There have recently been several posts about what features are desirable in government data catalogues.
The Sunlight Foundation has announced that it plans to build on data.gov to allow “community participation so that people can submit their own data sources” (including support for adding data that is not open, such as data with noncommercial restrictions).
They’ve also been working on a list of Data Consumer Requirements – which includes things like:
- Downloadable data sets should be available for regular time periods (i.e., by month, year).
- Proprietary data formats, and non-malleable formats should be avoided wherever possible (i.e., Excel, PDF, etc.).
In addition to data.gov (which was launched back in May), the last few months have seen the launch of several other prominent catalogues for government data, including:
- New Zealand’s Opengovt.org.nz
  - .. an attempt to collate the many different datasets available through the New Zealand Government Departments and Local Bodies
- The USA’s IT Dashboard
  - The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time.
Many of the issues being discussed are things we’ve thought about in relation to CKAN – our registry of (collections of) open data and open content.
Here are a few suggestions for those building catalogues for (open) government data, based on our experience developing CKAN:
- Make the catalogue itself open!
  - By using a legal tool such as CC0, the PDDL or the ODbL to make your data catalogue’s metadata open (even if some of the data it describes isn’t), you ensure that the fruits of your hard work can be integrated with the work of others! Also, by making the code open source you allow others to re-use and build on it.
  - All of CKAN’s code and data are available under an open license – which lets other projects like Infochimps use them.
- Let others download the catalogue data in bulk (not just via an API)
  - Create a regular dump of the metadata in your catalogue describing the data – so that your work can be built upon.
  - CKAN’s data dump is updated daily.
- Include information on how to get the data, and how it can be used
  - In addition to basic details such as title and description, each entry should make clear how to get the data and how it can be used. If the data is in the public domain, make this explicit (or use a legal tool, such as CC0 or the PDDL). If it is available under the terms of a license, make this explicit and include the text or a link.
  - Each entry on CKAN has a license field, with a drop-down menu of common open content/data licenses and tools, as well as licenses for Free/Open Source Software. There is also a free text field for any further details.
- Make it versioned!
  - If you are going to allow people to add items to or edit the catalogue, you might consider making it versioned like a wiki. This allows others to see the changes that have been made to each item – which can be useful for reverting edits and otherwise keeping track of user contributions.
  - You can see the history of changes for each item on CKAN. Furthermore, CKAN’s code (and its domain model) are versioned.
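To make the last three suggestions a little more concrete, here is a minimal Python sketch of a catalogue entry that carries an explicit license field, keeps every revision wiki-style, and can be dumped in bulk as JSON. This is purely illustrative and is not how CKAN itself is implemented: the `Entry` and `Revision` classes, the `dump_catalogue` function, the field names and the example URL are all assumptions made up for this example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Revision:
    """One saved state of a catalogue entry (hypothetical structure)."""
    number: int
    author: str
    metadata: Dict[str, str]


@dataclass
class Entry:
    """A catalogue entry that keeps every revision, wiki-style."""
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, author: str, **changes: str) -> Revision:
        # Start from the latest metadata so unchanged fields carry over.
        current = dict(self.revisions[-1].metadata) if self.revisions else {}
        current.update(changes)
        rev = Revision(len(self.revisions) + 1, author, current)
        self.revisions.append(rev)
        return rev

    def revert(self, author: str, to_number: int) -> Revision:
        # A revert is stored as a new revision, so no history is lost.
        old = self.revisions[to_number - 1].metadata
        rev = Revision(len(self.revisions) + 1, author, dict(old))
        self.revisions.append(rev)
        return rev

    @property
    def latest(self) -> Dict[str, str]:
        return self.revisions[-1].metadata


def dump_catalogue(entries: List[Entry]) -> str:
    """Serialise the whole catalogue, history included, for bulk download."""
    return json.dumps([asdict(e) for e in entries], indent=2, sort_keys=True)


if __name__ == "__main__":
    entry = Entry("example-dataset")
    entry.edit("alice", title="Example dataset", license="CC0",
               download_url="https://example.org/data.csv")
    entry.edit("bob", license="unknown")    # an unhelpful edit
    entry.revert("alice", to_number=1)      # restore the first revision
    print(entry.latest["license"])
    print(dump_catalogue([entry]))
```

Storing a revert as a new revision, rather than deleting the unwanted one, is one possible design choice: it keeps the full trail of user contributions visible, and the JSON dump then carries that history along with the current metadata.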
What features do you think are important in catalogues for open government data? We’d love to hear what you think!