Open Bibliographic Data: The State of Play
March 6th, 2008
Given the public role of libraries and the fact that bibliographic metadata (i.e. the material in library catalogues) doesn’t seem that exciting from a commercial point of view, you might think that, of all the types of data out there, bibliographic data would be the most open. You might even think, given the public-spiritedness of librarians, that this is the kind of area where not only could the data be openly available but it would be (in nice little bzipped or gzipped dumps …).
In fact the situation is quite the opposite. Most libraries appear to implicitly or explicitly exert rights over their data with some libraries licensing access to their catalogue data for substantial sums of money. The following lists some of the examples (both closed and open) that we know of:
- Library of Congress: public domain in the US (or at least free) but copyrighted outside the US. See [1] and the comments in the fred2.0 README, which state:
These data are works of the United States Government and as such are not subject to copyright within the United States. (17 U.S.C §105).
The Library of Congress has copyrighted these data for use outside the United States. Contact the LC for permission prior to use or distribution of this data outside the United States. [http://www.loc.gov/cds/mds.html]
- fred2.0 (fred2.0 CKAN package): an excellent example of the effort to make material available, but unfortunately it has the same restrictions as the Library of Congress data (from which the material is sourced).
- British Library: closed (and apparently gets sold for substantial sums).
- OCLC/Worldcat: closed. See the OCLC CKAN page.
- Barton/Simile: semi-open. Sourced from OCLC. Originally taken down but now back under CC non-commercial. See [1] for further discussion.
- OpenLibrary: in theory open (though there is no formal license or dump as yet, and some material may have been sourced from LoC, making it suspect outside of the US).
- isbndb.com: not really full bibliographic data, and its status is uncertain (see the isbndb.com CKAN page).
- LibraryThing: closed. It does not seem to make its data available, and its sources would likely make this problematic (from the about page):
LibraryThing uses Amazon and libraries that provide open access to their collections with the Z39.50 protocol. The protocol is used by a variety of desktop programs, notably bibliographic software like EndNote. LibraryThing appears to be the first mainstream web use.
As we continue to search for open sources of bibliographic data we’d love to hear from anyone who knows of examples not already on this list.
[1] http://www.bookism.org/open/2007/04/02/open-data-what-would-kilgour-think/
Pleiades: Lots of Ancient Geodata Released!
November 12th, 2007
We’ve written about the Pleiades project a couple of times before:
Organized by the Ancient World Mapping Center at the University of North Carolina at Chapel Hill, U.S.A., Pleiades brings together a global community of scholars, students and enthusiasts to expand and enhance continually the information originally brought together by the Classical Atlas Project (1988-2000) to support the publication of the Barrington Atlas of the Greek and Roman World (R.J.A. Talbert, ed., Princeton, 2000).
Last month they released the first batch of their data, and what a great job they’re doing. The material is impeccably laid out, in particular:
- They’ve ensured there’s a proper open license on each collection of material (in this case a CC Attribution license)
- They’ve made the material available in bulk as well as through a search facility
More information about the available datasets, as well as links, can be found on the Pleiades site or on the CKAN Pleiades package page. This really is a perfect example of what an open knowledge project can be, so a big well done to the Pleiades team for the work so far (and long may it continue!).
AMEE - an exemplary open service
October 2nd, 2007
The people behind AMEE, the ‘world’s energy meter’ (which we blogged about back in May), have been busy forging ahead into new areas of open service development. As well as ensuring AMEE conforms to the draft Open Service Definition (in short, open data plus open software), they’ve recently published a Memorandum of Understanding with terms and pricing information under a Creative Commons Attribution ShareAlike license.
The MOU specifies a broad range of rates - from free to thousands of pounds per month - that vary depending on the size, nature and estimated bandwidth requirements of the client. This is a pioneering example of the compatibility of open standards and commercial viability in web services. The AMEE team has had interest from over 60 different organisations since they launched the platform in June - from Defra to the RSA, from a national energy company to an international investment company. Last week they contracted with Torchbox, their first web agency, and today they announced a partnership with EEDA, the East of England Development Agency.
AMEE’s commitment to “sharing and collaboration” is particularly appropriate in the context of carbon footprinting - where relevant data is held by many different parties. In a screencast the developers succinctly state:
We believe sharing is the key to scaling. That to really, really scale, we need to share as much as we can.
They’ve initiated two threads asking for advice and comments on their licensing and access mechanisms for the AMEE code and data. It would be great if members of the open knowledge community could pitch in with advice!
DBpedia 2.0
September 10th, 2007
DBpedia recently released the new version of their dataset. The project aims to extract structured information from Wikipedia so that this can be queried like a database. On their blog they say:
The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.
As well as improving the quality of the data, the new release includes coordinates for geographical locations and a new classificatory schema based on WordNet synonym sets. It is also extensively linked with many other open datasets, including: “Geonames, Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP Bibliography and Project Gutenberg datasets”.
This is probably one of the largest open data projects currently out there - and it looks like they have done an excellent job at integrating structured data from Wikipedia with data from other sources. (For more on this see the W3C SWEO Linking Open Data project - which exists precisely in order to link more or less open datasets together.)
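To give a feel for what "queried like a database" means for RDF data, here is a minimal sketch in plain Python: each fact is a (subject, predicate, object) triple, and a query is a pattern in which None acts as a wildcard. The triples are hand-written in the style of DBpedia identifiers rather than taken from the dataset, and real DBpedia queries would go through SPARQL rather than a helper like the hypothetical `match` below.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Illustrative, hand-written triples (not an extract of the DBpedia dataset).
TRIPLES = [
    ("dbpedia:Berlin", "rdf:type", "dbpedia-owl:Place"),
    ("dbpedia:Berlin", "geo:lat", "52.52"),
    ("dbpedia:Albert_Einstein", "rdf:type", "foaf:Person"),
    ("dbpedia:Albert_Einstein", "dbpedia-owl:birthPlace", "dbpedia:Ulm"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# "Find every subject typed as a person."
people = [t[0] for t in match(TRIPLES, p="rdf:type", o="foaf:Person")]
```

Linking datasets, as the Linking Open Data project does, amounts to letting the object of one triple be a subject described in another dataset, so the same pattern matching can follow links across sources.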
The Open Library and Versioned Data
August 8th, 2007
The Internet Archive has recently launched a beta version of The Open Library. A demo is available online, as is the Open Library book. It is inspired by the idea of a “library that makes all the published works of humankind available to everyone in the world”. Initially it will consist of a collaboratively built catalogue with some collections of open books (scans of public domain content and works made available under open Creative Commons licenses). The project is being produced under the aegis of Brewster Kahle’s Open Content Alliance, and the code is being developed by Aaron Swartz.
What’s interesting is that, like the OKF, they want to be able to version data in a ‘wiki’-like manner (they also want all their data to be open). To this end, it appears they have been modifying Swartz’s infogami wiki software to support structured data. We’ve been interested in the Collaborative Development of Data for a while, and have been working on a Python ‘versioned domain model’ (vdm) package to allow ‘versioning’ of domain objects (and domain models) in a way similar to the way Subversion allows versioning of filesystem trees. The package README includes links to demo code snippets:
http://p.knowledgeforge.net/ckan/svn/vdm/trunk/README.txt
The ‘vdm’ package was used in developing the Comprehensive Knowledge Archive Network and does full revisioning of all data attributes and references:
http://www.ckan.net/
http://www.ckan.net/revision/
Having had a look at the overview of the OL system it looks like it is doing something similar. It’d be fantastic to join efforts and share ideas about this!
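The core idea of versioning domain objects the way Subversion versions a tree can be sketched in a few lines: every commit produces a new numbered revision, and any earlier revision can still be read back. The classes below (`Revision`, `VersionedStore`) are hypothetical illustrations, not the vdm API; see the README above for the package's real interface.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List

State = Dict[str, Dict[str, Any]]  # object id -> attribute dict


@dataclass
class Revision:
    """One committed change set: a log message plus a full snapshot of state."""
    number: int
    message: str
    state: State


@dataclass
class VersionedStore:
    """Hypothetical store keeping every revision of a set of domain objects."""
    revisions: List[Revision] = field(default_factory=list)

    def head(self) -> State:
        # Latest committed state, or an empty state before the first commit.
        return self.revisions[-1].state if self.revisions else {}

    def commit(self, changes: State, message: str) -> int:
        # Deep-copy the current state so earlier revisions are never mutated.
        state = copy.deepcopy(self.head())
        for obj_id, attrs in changes.items():
            state.setdefault(obj_id, {}).update(attrs)
        number = len(self.revisions) + 1
        self.revisions.append(Revision(number, message, state))
        return number

    def at(self, number: int) -> State:
        # Revision numbers start at 1, as in Subversion.
        return self.revisions[number - 1].state


store = VersionedStore()
store.commit({"pleiades": {"license": "unknown"}}, "add package")
store.commit({"pleiades": {"license": "CC-BY"}}, "set license")
```

Storing a full snapshot per revision keeps the sketch short; a real system would more likely store per-attribute changes and reconstruct state on demand.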
AMEE: The Open CO2 Emissions Platform
May 22nd, 2007
One of the highpoints of XTech last week was Gavin Starks’s presentation about AMEE (Avoiding Mass Extinction Engine). AMEE is “a platform for collaboration on Climate Change and Energy Efficiency”. It combines a whole bunch of CO2 emissions data (including data from the UK government) with modelling code and assumptions to provide a generic CO2 footprint calculator.
What is particularly exciting about it however, particularly from an open knowledge/data point of view, is that:
- All the code is open (GPL)
- All the data (apparently ~70TB of it) is open (CC by-sa)
- They’ve provided a nice ‘Knowledge’ API in the form of a RESTful data service
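At its simplest, a footprint calculator of the kind described above multiplies each activity (kWh of electricity, km driven, and so on) by an emission factor and sums the results. The sketch below assumes exactly that simple model; the activity names and factor values are hypothetical placeholders, not AMEE's data, and AMEE's real models and assumptions are considerably richer.

```python
from typing import Dict

# Hypothetical emission factors in kg CO2 per unit of activity.
# Round placeholder numbers for illustration only, not AMEE's figures.
EMISSION_FACTORS: Dict[str, float] = {
    "electricity_kwh": 0.5,
    "gas_kwh": 0.2,
    "car_km": 0.2,
}


def footprint_kg(activities: Dict[str, float],
                 factors: Dict[str, float] = EMISSION_FACTORS) -> float:
    """Sum (activity amount x emission factor) over all activities."""
    total = 0.0
    for name, amount in activities.items():
        if name not in factors:
            # Refuse to guess: an unknown activity has no factor to apply.
            raise KeyError(f"no emission factor for {name!r}")
        total += amount * factors[name]
    return total


household = {"electricity_kwh": 3000, "gas_kwh": 12000, "car_km": 8000}
```

Because the factors are plain data kept separate from the code, anyone can check or swap them, which is the point Gavin makes below about open data and calculations being there for others to validate.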
Talking with Gavin at XTech showed just how well he understands the benefits of the open approach, particularly the many-minds principle (‘the coolest thing to do with your data will be thought of by someone else’). As he pointed out, a big difference between this and other similar (proprietary) projects is that all the data and calculations are there for others to check over and validate.
AMEE is as close to a perfect open knowledge exemplar as we are ever likely to get, on one of the most important and compelling topics in the world today. So here’s to Gavin and the rest of the AMEE team for all their work so far.
New Version (v0.4) of Open Economics Released
April 18th, 2007
This is the fourth release of the Open Economics project and the first that has been deemed ‘worthy’ of a full release announcement. The Open Economics project provides data storage and visualization for economics data as well as associated web services and assorted modelling code. The project home page is: http://www.okfn.org/econ/ while the open economics web interface is currently available at: http://www.openeconomics.net/ (though note that we plan to move to a dedicated domain in the near future).
To see some of the features of the web interface in action check out:
- http://www.openeconomics.net/current_value/?year=1900: today’s value of a pound/dollar from 1900
- http://www.openeconomics.net/store/: data store browser with javascript graphing
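A common way to compute a "current value" like the one above is to scale the historical amount by the ratio of a price index today to the index in the original year: value_now = amount × P_now / P_year. The sketch below assumes that method, which may not be exactly what the web interface does, and its index values are hypothetical round numbers rather than figures from the data store.

```python
from typing import Dict

# Hypothetical price index (arbitrary base), for illustration only.
# A real calculation would use a historical price series from the data store.
PRICE_INDEX: Dict[int, float] = {1900: 2.0, 1950: 6.0, 2007: 100.0}


def current_value(amount: float, year: int, now: int = 2007,
                  index: Dict[int, float] = PRICE_INDEX) -> float:
    """Express `amount` from `year` in `now` money: amount * P_now / P_year."""
    return amount * index[now] / index[year]


one_pound_1900 = current_value(1.0, 1900)
```

With these made-up numbers, prices rose fifty-fold between 1900 and 2007, so one pound in 1900 comes out as 50 pounds today.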
Work first started on an Open Economics project back in late 2004 with some basic modelling code. Since then, especially with work over the last year or so, it has expanded considerably to be both a resource in its own right and another experiment into what a knowledge package would look like. At present it consists of 3 components:
- A python library for building economics models
- A set of data (under trunk/data)
- A web interface for accessing the data store, visualizing the data and providing various simple ‘web services’
Finally, we should mention that the project is looking for contributors. Areas in which assistance would be valuable include:
- Uploading and creating data
- Improving code (python)
- Setting up a project blog/website
- Improving web frontend to services and data store
Release Announcement
A new version of Open Economics is now out. Get it either:
Directly from the Python Package Index with easy_install:
$ easy_install econ
From Subversion:
$ svn co http://p.knowledgeforge.net/econ/svn/tags/econ-0.4
Changelog
- Change to use pylons (2007-03-31)
- Convert from kid to genshi templates (2007-03-29)
- Current value working again (scipy does not conflict with plain wsgi)
- Several new datasets
- Clean up and improve web user interface
- view ‘action’ gains a limit argument (2007-04-03)
- Improvements to data bundle package (e.g. uuids). (2007-04-05)
About Open Economics
An open set of economics related tools, data and services.
Project home page: http://www.okfn.org/econ/
PlanningAlerts.com — Opening up UK Planning Application Data
April 11th, 2007
Back at the Civic Info forum in November Richard Pope presented his initial work on scraping planning application data from local council websites. This was a classic case where the original providers of the data did not make it available in an open form that was easy to use and reuse (it was often just difficult to even find).
At the time, Richard had only got round to writing a web scraper for a couple of the councils around where he lived, and only redistributed the information in the form of email alerts. However, encouraged by the feedback at the forum, he launched a dedicated site in December: http://www.planningalerts.com/. Aided by a growing team of volunteers, the project now has more than 1000 subscribers and covers more than 100 local councils.
Even better from an open data point of view is that, while the initial service simply focused on allowing people to sign up for email alerts, the project now makes all of its data openly available both via a web api and as a straight data dump.
This is a wonderful example of creating a new repository of open data by gathering diverse sources together, cleaning them up (using standards such as GeoRSS) and then making it all available in an open, accessible and reusable manner. So a big well done to Richard and all the other contributors to the project.
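Part of what makes a GeoRSS feed easy to reuse is that, in the GeoRSS Simple encoding, a location is just a `georss:point` element holding "latitude longitude" inside an ordinary RSS item. The sketch below pulls those points out with Python's standard library; the sample feed and its item are hypothetical, and PlanningAlerts' actual feed layout may differ.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GEORSS_NS = "http://www.georss.org/georss"

# A hand-written example feed; the planning application is hypothetical.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Planning applications</title>
    <item>
      <title>Single-storey rear extension</title>
      <georss:point>51.5 -0.12</georss:point>
    </item>
    <item>
      <title>Application with no location</title>
    </item>
  </channel>
</rss>"""


def points(feed_xml: str) -> List[Tuple[str, float, float]]:
    """Return (title, lat, lon) for every RSS item carrying a georss:point."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        point = item.find(f"{{{GEORSS_NS}}}point")
        if point is None or not point.text:
            continue  # skip items without a usable location
        lat, lon = (float(v) for v in point.text.split())
        result.append((item.findtext("title", default=""), lat, lon))
    return result
```

Items without a point are skipped rather than treated as errors, since a scraped source may not always provide a location.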
£8.4 Million Grant to University of Manchester to Expand Semi-Open Data Repository
February 27th, 2007
According to a press release yesterday the University of Manchester received a further 8.4 Million GBP of funding from the ESRC to continue and expand its MIMAS service which provides students and researchers with free access to social science data:
The billions of data items managed by the School of Social Sciences and Manchester Information and Associated Services (MIMAS) give researchers access to the census and many national household surveys for free.
They are also a key source for data held by the International Monetary Fund (IMF), World Bank and Organisation for Economic Co-operation and Development (OECD) among others.
…
Keith Cole, Deputy Director of MIMAS, said: “Tens of thousands of users have used these facilities which support research both nationally and internationally.
“What makes this significant is that whereas many researchers have paid for this sort of information, our work enables them to access it for free.
“The data is free at the end point of use and chimes with the Guardian’s free our data campaign.
“The campaign wants the Government to abandon copyright on essential national data, making it freely available to anyone.”
This is good news and chimes with the comments made by John Sheridan (head of e-services at OPSI) regarding access to government data back at the Civic Information forum we ran in November (John will also be coming to speak at Open Knowledge 1.0).
One note of caution though: this project is not in fact providing Open/Open Access Data. It appears the data will only be free for access and not redistribution or reuse (that is incorporation into other datasets). Furthermore, this access is only free to researchers and students (and likely only those in the UK). Thus, while this change should be warmly welcomed as a step in the right direction there is still a long way to go to achieve truly open data.
