The following guest post is from Kate Sahota, one of the people involved in Warwickshire County Council’s Open Data project (which we blogged about last month).

How it all began

It seems the key to triggering a successful open data project is to show the people that matter something shiny, like an iPhone, with a real example of what open data can achieve.

Jim Morton began a small covert operation to start opening up Warwickshire’s data under the guise of developing an iPhone application with news, events, jobs and location information from Warwickshire County Council (WCC). This was launched in January 2010 and by the middle of May 2010 had already been downloaded nearly 2,000 times.

Using the success of the iPhone project and the increasing number of good open data examples (e.g. http://data.gov.uk) we were able to kick-off our own open data project to create opendata.warwickshire.gov.uk. The business case and main benefits driving the project are:

  • Transparency for the public
  • Enhancing public communications
  • Improving service delivery and enabling citizens to self-serve
  • Contributing towards new ways of running public services
  • Improving external contribution to WCC
  • Enabling mash-ups of disparate sources of information to create new ways of looking at information
  • Enabling 3rd sector organisations or individuals to develop applications aggregating data across organisational boundaries
  • Reducing workload in areas like Freedom of Information (FOI), the Observatory and Public Relations
  • Reinforcing our efforts to resolve data and information issues

The Project

Using the Identify, Represent and Expose principles outlined in Jeni Tennison’s blog, a small team of four (Jim Morton, Steve Woodward, Terry Rich-Whitehead and myself) began working on getting information out of the organisation and building a technical solution and set of standards that fitted with our ongoing work to introduce open, non-proprietary standards across our ICT architecture. We were keen to ensure the project made use of cloud technologies to deal with any scaling/demand issues.

The open data site has been written using Ruby on Rails and is hosted on external platform-as-a-service provider Heroku. The database managing the sets of data includes a standard XML schema for the metadata associated with each dataset. It was very important to ensure this schema was aligned to that used by data.gov.uk to enable us to easily extract our metadata for inclusion in their data catalogue. The application will soon be open-sourced to enable other authorities to easily build their own open data site.
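As a rough illustration of what a per-dataset metadata record could look like, here is a minimal Python sketch. It is not the schema used by the site or by data.gov.uk: the element names (dataset, title, publisher, format, licence) and the example values are assumptions chosen purely for illustration.

```python
import xml.etree.ElementTree as ET


def dataset_metadata_xml(meta):
    """Serialise a flat dict of dataset metadata to an XML string.

    The element names come straight from the dict keys; they are
    hypothetical and do not reproduce any real catalogue schema.
    """
    root = ET.Element("dataset")
    for field, value in meta.items():
        child = ET.SubElement(root, field)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Illustrative values only.
example = {
    "title": "Council election results",
    "publisher": "Warwickshire County Council",
    "format": "CSV",
    "licence": "Creative Commons",
}
xml_text = dataset_metadata_xml(example)
```

Keeping every dataset's metadata in one fixed set of fields is what would make it straightforward to export the records into another catalogue that expects the same fields.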

We began at the end of January 2010, working 2 days a week, and by mid-April we had unofficially launched the site with over a dozen datasets. By the time we officially launched a couple of weeks later there were nearly 30, and this number is growing by the week.

Tips and Tricks

  • Start with quick and easy datasets – the sooner you can get datasets up and build sample applications to demonstrate the purpose and benefits of open data, the more likely you are to encourage other people to give you their data
  • For those reluctant to open up their data, there is one key question you need to ask: “If someone requested this information under FOI, would we have to give it to them?” If their answer is “Yes”, then you have a very strong starting point for persuading them to give you their data
  • Ensure you have a good process for feeding back any issues with the data
  • Use a standard [preferably open! -- ed.] Creative Commons license to cover usage of the data rather than trying to write your own

The Competition

As part of our work to publicise the initiative, we are running a “Hack Warwickshire” competition between Monday 17th May 2010 and Friday 25th June 2010. This competition is challenging everyone to come up with new and innovative uses for our data and web services. The winner of the competition will be the proud owner of a brand new Apple iPad.

Warwickshire County Council pinged us earlier this week to let us know about the launch of their new open data site!

warwick-open-data

The site hosts a range of data sets - available in CSV or XML. For example there are details about education in the region, including:

There is also a selection of data on Warwickshire councillors such as the council election results for 4th June 2009, 5th May 2005 and 7th June 2001. There is a blog and a strategy blog associated with the main website giving the latest news on the latest datasets as they are added.

The most recent blog post explains that data will soon be available on areas such as school exclusions, traffic, car parking, council buildings and Warwickshire County Council finance. There are also plans to allow site visitors to post notes about the data and make requests for new data or changes, plus a showcase for web sites and applications that make use of the data.

So congratulations to Warwickshire County Council for the new release! We hope other local authorities are encouraged to follow suit.

The following press release is reproduced with permission from Adrian Pohl and Felix Ostrowski, who are both at the North Rhine-Westphalian Library Service Center and who are both members of the Open Knowledge Foundation’s Working Group on Open Bibliographic Data - launched earlier this month. We’ve added a koeln-library-data package to the bibliographic data group on CKAN.

Cologne-based libraries and the Library Centre of Rhineland-Palatinate (LBZ) in cooperation with the North Rhine-Westphalian Library Service Center (hbz) are the first German libraries to adopt the idea of Open Access for bibliographic data by publishing their catalog data for free public use. The University and Public Library of Cologne (USB), the Library of the Academy of Media Arts Cologne, the University Library of the University of Applied Science of Cologne and the LBZ are taking the lead by releasing their data. The Public Library of Cologne has announced that it will follow shortly. The release of bibliographic data forms a basis for linking that data with data from other domains in the Semantic Web.

Libraries have been involved with the Open Access movement for a long time. The objective of this movement is to provide free access to knowledge to everybody via the internet. Until now, only a few libraries have done so with their own data. Rolf Thiele, deputy director of the USB Cologne, states:

Libraries appreciate the Open Access movement because they themselves feel obliged to provide access to knowledge without barriers. Providing this kind of access for bibliographic data, thus applying the idea of Open Access to their own products, has been disregarded until now. Up to this point, it was not possible to download library catalogues as a whole. This will now be possible. We are taking a first step towards a worldwide visibility of library holdings on the internet.

The library of the European Organization for Nuclear Research (CERN) has already published its data under a public domain license in January.

Public data is placed in the public domain

The publication of the data enables anybody to download, modify and use it for any purpose. “In times in which publishers and some library organisations see data primarily as a source of capital, it is important to stick up for the traditional duty of libraries and librarians. Libraries have always strived to make large amounts of knowledge accessible to as many people as possible, with the lowest restrictions possible,” said Silke Schomburg, deputy director of the hbz. “Furthermore libraries are funded by the public. And what is publicly financed should be made available to the public without restrictions,” she continued.

Cooperation and data exchange between libraries have been firmly established in the library world for more than 100 years. Freely supplying bibliographic data should not only further enhance cooperation among libraries but enable subsequent use by non-library institutions. “In the course of the internet’s development it became clear that many services can be greatly enhanced by catalog data. The German Wikipedia for example has been enriched with German National Library data for a long time. Such enrichment is often hindered and constricted by the data’s half open character,” Schomburg notes.

Data for the Semantic Web

The North Rhine-Westphalian Library Service Center has recently begun evaluating ways to transform data from library catalogs so that it can become a part of the emerging Semantic Web. The liberalization of bibliographic data provides the legal background to perform this transformation in a cooperative, open, and transparent way. Currently there are discussions with other member libraries of the hbz library network to publish their data. Moreover, “Open Data” and “Semantic Web” are topics that are gaining attention in the international library world.

Further information and links to the published datasets are available at:

Clear Climate Code, and Data

January 28th, 2010

The following guest post is by David Jones who is, among other things, a curator of the climate data group on CKAN (the OKF’s open source registry of open data) and co-founder of Clear Climate Code (which we blogged about back in 2008).

Clear Climate Code have been working on ccc-gistemp, a project to reimplement NASA’s GISTEMP in clear Python. GISTEMP is a global historical temperature analysis; it produces, amongst other things, graphs like this that tell you whether the Earth is getting warmer or cooler:

Official GISTEMP global anomaly.

Because this graph is important for studying the world’s climate (and determining the signature of global warming), there is a lot of public discussion about where this data comes from. The raw data underlying the graph is surface weather station temperature records. The raw data is processed to produce the data for the graph:

gistemp

The box in the middle, labelled “GISTEMP”, is a process that converts the raw station records into the data for the graph on the right, which is the global temperature anomaly. There are descriptions of this process available, for example Hansen and Lebedeff, 1987. A description is one thing, but it might not tell you everything you need to know. Perhaps the description is sufficiently clear and accurate for you to reproduce the process, perhaps not. The ultimate authority on the process is the source code that implements it, because it’s the source code that is executed in order to produce the processed data. So if you want to know exactly what the process involves, you have to get hold of the source code.

In effect it is the source code that adds value to the raw data to produce processed data. So in a sense, the value of the processed data is embodied in the source code. That’s what makes the source code important.
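To make the idea of a temperature anomaly concrete, here is a toy Python sketch. It is emphatically not the GISTEMP algorithm and not code from ccc-gistemp: it simply subtracts each station's own baseline mean and then takes an unweighted average across stations. The function names, the baseline period and the station values are all invented for illustration.

```python
from statistics import mean


def anomalies(yearly_temps, base_start, base_end):
    """Return {year: temperature minus baseline mean} for one station."""
    base = [t for y, t in yearly_temps.items() if base_start <= y <= base_end]
    if not base:
        raise ValueError("no data in the baseline period")
    baseline = mean(base)
    return {y: t - baseline for y, t in yearly_temps.items()}


def combined_anomaly(stations, base_start, base_end):
    """Average the per-station anomalies year by year (unweighted)."""
    per_year = {}
    for temps in stations:
        for year, a in anomalies(temps, base_start, base_end).items():
            per_year.setdefault(year, []).append(a)
    return {year: mean(vals) for year, vals in sorted(per_year.items())}


# Invented toy records for two stations, in degrees Celsius.
station_a = {1951: 10.0, 1952: 10.2, 1953: 10.4}
station_b = {1951: 20.0, 1952: 20.0, 1953: 20.6}
result = combined_anomaly([station_a, station_b], 1951, 1952)
```

Working with anomalies rather than absolute temperatures is what lets a cool station and a warm station be averaged together meaningfully. The real analysis involves many more steps, which is exactly why the source code matters.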

The source code for GISTEMP is written mostly in Fortran by scientists at NASA, and is available from them. This source code is the working code used by the NASA scientists; it is not necessarily the best source code for explaining how the process works (to an interested and competent member of the general public). There is the question of whether NASA, a publicly funded body, should be paying someone to write code that makes a better tool for communicating with the public (for example by writing better documentation, or writing it in a more exemplary style). I am not going to address that question. The source code NASA use is the source code we have right now.

Our goal at Clear Climate Code is to take this code and produce a new version that is clearer, but does the same thing. We have taken great steps forward towards this goal: We have recently released a version which is all in Python and which reproduces NASA’s results exactly. We think much of this code is already a great deal clearer than the starting material, but we continue to make it clearer. Of course we would welcome your support. If you want to help, please join our mailing list, or you can follow our progress at our blog and on twitter.

The reasons Clear Climate Code chose Python as the implementation language for ccc-gistemp are: accessibility, clarity, and familiarity. By accessible I mean that there is a large community of Python programmers, but also there are several tutorials and other materials for learning Python should you be motivated. Python is used to teach undergraduates programming. Python is relatively clear; it’s deliberately designed to be free of the clutter that imperils other programming languages. It’s certainly possible for people who are not professional programmers to create small programs in Python, and examine and modify existing Python programs. And lastly, it’s familiar; Nick Barnes and I already knew Python when we started the project. This seems like a trivial consideration, but in fact Clear Climate Code is an unpaid project and it’s pretty easy to come up with reasons to do something else instead, so the fact that we already knew Python was important.

Hopefully Clear Climate Code illustrates how both code and data are central to the public understanding of science. For an issue like global warming it is absolutely crucial that the public are involved. CKAN’s climate data group is a place where non-specialists can access scientists’ data more easily, and hopefully use it to innovate, do their own hobby science, or create visualisations to better communicate with the public. I’m hoping to add more data sources to the climate data group in the near future. If you’re interested in adding more data to this group, please get in touch.

Data.gov.uk goes public today, and we’re very proud that it is using CKAN, our open source registry of open data, to list official UK government datasets (as we announced in October):

We’ve been working closely with the Cabinet Office team to get this out the door, and over 2500 datasets have been released via the site!

In the Cabinet Office press release, Sir Tim Berners-Lee says:

Making public data available for re-use is about increasing accountability and transparency and letting people create new, innovative ways of using it. Government data should be a public resource. By releasing it, we can unlock new ideas for delivering public services, help communities and society work better, and let talented entrepreneurs and engineers create new businesses and services.

The new launch has received lots of press coverage - even making the front page of the BBC news website! Below is a selection:

Data.gov.uk

There has recently been a flurry of activity in the Open Street Map community to improve maps of Haiti to assist humanitarian aid organisations responding to the recent earthquake.

In particular mappers and developers are scouring satellite images to identify collapsed and damaged buildings/bridges, spontaneous refugee camps, landslides, blocked roads and other damaged infrastructure - to help NGOs and international organisations respond more effectively to the crisis.

They have issued a call for assistance:

On January 12 2010, a 7.0 earthquake struck Port-au-Prince. The OpenStreetMap community can help the response by tracing Yahoo imagery and other data sources, and collecting existing data sets below. If you have connections with expat Haitian communities, consider getting in touch to work with them to enter place names, etc.

On Wednesday Mikel Maron wrote to the OSM talk list asking for help. Yesterday several companies authorised the OSM community to use their images.

There have been specific requests for up to date mapping information from humanitarian organisations on the ground. For example, on Wednesday, Nicolas Chavent of the Humanitarian OpenStreetMap Team wrote to the OSM talk list:

I am relaying a mapping requirement grounded in Haiti from GIS practitioners mapping there at the United Nations Office of Coordination of Humanitarian Affairs (UNOCHA): “NEED to map any spontaneous camps appearing in the imagery with size in area”

Recently generated data from Open Street Map has been used in maps by ITHACA (Information Technology for Humanitarian Assistance, Cooperation and Action) and the World Food Programme.

Yesterday evening Mikel Maron reported there had been over 400 edits since the earthquake. At the time of writing it looks like this has now more than doubled to over 800 edits since 12th January.

The following two images - before and after the earthquake - give you an impression of how much the OSM community have been doing!

haiti.osm.pre-event

haiti.osm.20090114180900

For more see:

We are seeking an Editor for Open Text Book, one of the highest ranked sites on the web for finding textbooks that you can freely use, reuse and redistribute:

This is a volunteer position requiring a one to two day a month commitment. If you are interested in contributing to the world of open education in general and open text books in particular just get in touch.

Open Text Book Editor


More Information

The Open Knowledge Foundation is looking for an Editor for its Open Text Book project. The project was launched in 2007 after Steve Coast of Open Street Map donated the domain name to us. It aims to be a curated one stop shop for open textbooks - that is, textbooks anyone is free to access, redistribute, reuse and build upon.

Recently there has been a sharp rise in interest in open textbooks. Earlier this month, a bill was proposed to make all Federally funded textbooks in the State of California available under an open license. Last year saw the start of a student-led campaign to make textbooks open - which is currently supported by over 2000 college professors. There is now a plethora of open textbook projects around the world - at different educational levels, for a variety of different subjects. It’s an exciting time for open textbooks!

The Open Text Book project aims to be, in the first instance, a simple registry to make it easy to locate open textbooks from many different sources. We have also begun to archive copies of some of the books in a repository. There is plenty of room for expanding the project in the future.


Responsibilities

We anticipate the Editor will spend one to two days a month on the project. This is a volunteer position and the Editor can be based anywhere in the world. The Editor will be responsible for:

  • Adding new textbooks to the registry on a monthly basis, and curating the repository of mirrored textbooks;
  • Checking the legal status of the textbooks to see that they are compliant with the Open Knowledge Definition;
  • Attending virtual meetings with the Working Group on Open Textbooks;
  • Giving input on the design of the Open Text Book website, and on the future of the Open Text Book project.


Get in touch!

If you are interested in the position, please get in touch, and let us know:

  • Your name, affiliation, and website (if you have one!)
  • Why you think you’d make a good Editor
  • Your ideas about the future of the Open Text Book project

If you know anyone who you think might be interested to hear about the position - please point them to this post! You can also help spread the word by microblogging the following Identi.ca and Twitter posts:

Open Text Book

OpenFlights

OpenFlights is a site for “flight logging, mapping, stats and sharing”.

We’re very pleased to hear they’ve just released their data under the Open Database License (ODbL):

One of OpenFlights’ most popular features is our dynamic airport and airline route mapping, and today, we’re proud to release the underlying data in an easy-to-use form, up to date for October 2009. Behold 56749 routes between 3310 airports on 669 airlines spanning the globe.

The data can be downloaded from our Data page and is free to use under the Open Database License.
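Summary figures like the route, airport and airline counts quoted above can be derived from a route list with a few lines of Python. The sketch below assumes a simplified, hypothetical three-column CSV layout (airline, source airport, destination airport); the real OpenFlights files may be laid out differently, and the sample rows are invented.

```python
import csv
import io


def summarise_routes(csv_text):
    """Count routes, distinct airports and distinct airlines.

    Assumes each row is: airline, source airport, destination airport.
    This column layout is an assumption made for the example.
    """
    routes = 0
    airports = set()
    airlines = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        airline, source, destination = row[0], row[1], row[2]
        routes += 1
        airlines.add(airline)
        airports.update((source, destination))
    return {"routes": routes, "airports": len(airports), "airlines": len(airlines)}


# Invented sample rows, not taken from the OpenFlights data.
sample = "BA,LHR,JFK\nBA,JFK,LHR\nAF,CDG,JFK\n"
summary = summarise_routes(sample)
```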

See also the OpenFlights package on CKAN:

Fields of Gold

Farm Subsidy have recently released a short film called Fields of Gold: Lifting the Veil on Europe’s Farm Subsidies.

The film tells the story of a campaign to open up data about where money from the Common Agricultural Policy goes - from national Freedom of Information requests from the likes of Jack Thurston and Nils Mulvad, to the construction of FarmSubsidy.org, a website which hosts cleaned up and aggregated European CAP data. It looks at the history of European farming policies, as well as news headlines resulting from the disclosure of where money goes - putting the data into context.

Some of it was shot at the European Open Data Summit (you can catch a glimpse of the European Open Data Inventory on CKAN at around 5:45!) - and there is an emphasis on the potential of new forms of collaboration between journalists and data analysts. As an example, it looks at an investigative report by the International Herald Tribune, which built on Farm Subsidy’s findings.

The film discusses the value and importance of making data open. Journalist Brigitte Alfter argues that the public have a right to know where public funds are spent. European policy analyst David Osimo talks about how making data open allows it to be aggregated, analysed and visualised by third parties - which can facilitate richer and more meaningful exploration. Finally the film alludes to Siim Kallas’s broader drive towards transparency in European institutions, and talks about how Farm Subsidy paves the way for more open access to official European datasets.

Farm Subsidy

UK post box by Andrew Dunn

Where is your nearest postbox, and when is the post collected from it? Now you can get open data showing the locations and collection times of over 116,000 postboxes in the UK. You can browse relevant datasets on CKAN at:

The story behind this data reads like an inverted version of The Little Red Hen. Instead of nobody helping out, and nobody eating the bread except the Little Red Hen, numerous people have helped to request, reformat, clean up and add to this data - and now, as it’s open, anyone can re-use it!

Last year Tom Taylor made a Freedom of Information request using the What Do They Know? service (developed by the good folks at mySociety) resulting in the publication of PDF documents containing information about UK postboxes.

Edward Betts of the Open Library cleaned and re-published this data in Tab Separated Value (TSV) format:

Abi Broom and Peter Chamberlin made further FOI requests using the What Do They Know? service, resulting in the publication of more postbox locations and collection times:

Unfortunately the geographic information provided by the Royal Mail was not very detailed, so Matthew Somerville has developed a service to locate them more accurately:

Help locate unlocated postboxes – the Royal Mail supplied a list of every postbox’s location, but unfortunately, it did not have useful co-ordinates, only postcodes or sub-postcodes and some textual data. So I wrote this site: look up the postboxes near you by entering the first half of your postcode, locate one whose location you know on the map, pick which postbox you’ve located, and submit. The pages also include postbox last collection times, if we know them.
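The lookup described above, finding postboxes by the first half of a postcode, can be sketched in Python over tab-separated data. This is not the code behind the site: the column names (ref, postcode, description) and the sample rows are hypothetical, and the helper assumes a full postcode ends in a three-character second half.

```python
import csv
import io
from collections import defaultdict


def outward_code(postcode):
    """Return the first half of a UK postcode, e.g. 'CV34' from 'CV34 4RL'."""
    parts = postcode.strip().upper().split()
    if not parts:
        return ""
    if len(parts) >= 2:
        return parts[0]  # spaced form, full or partial: 'CV34 4RL', 'CV34 4'
    cleaned = parts[0]
    # Unspaced full postcodes are 5 to 7 characters; drop the last three.
    return cleaned[:-3] if len(cleaned) > 4 else cleaned


def index_by_outward_code(tsv_text):
    """Group postbox rows by the first half of their postcode."""
    index = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        index[outward_code(row["postcode"])].append(row)
    return index


# Invented sample rows with hypothetical column names.
sample = (
    "ref\tpostcode\tdescription\n"
    "A1\tCV34 4RL\tMarket Place\n"
    "A2\tCV34 4\tHigh Street\n"
    "A3\tCV359XX\tVillage Green\n"
)
index = index_by_outward_code(sample)
nearby = index["CV34"]
```

Grouping by the first half of the postcode narrows the candidates to a small area, after which a person who knows the neighbourhood can pin each box to an exact spot on the map.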

See the uk-locating-postboxes package:

If you live in the UK, you too can help improve the data by confirming the locations of your local postboxes!

Do you know of datasets containing the locations of postboxes in other countries? If so, please let us know by adding them to CKAN, or by leaving a comment below!