Opening up library records at the Open Library
Open Library is a wiki-editable library catalog with an open source backend, and a project of the Internet Archive. We like to describe the project as “a web page for every book,” and our vision is that people will one day use Open Library as their first port of call for information about books on the web.
The catalog data is available for bulk download, and also accessible through our API. The project was born back in 2007, collecting catalog records from a variety of sources like the Library of Congress, University of Toronto and Amazon.
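As a small illustration of that API access, here is a minimal sketch of reading a single edition record as JSON. The endpoint shape (`https://openlibrary.org/books/<OLID>.json`) is the public one; the specific OLID in the commented usage line is just an example, not a record this post refers to.

```python
import json
from urllib.request import urlopen

def edition_url(olid):
    """Build the JSON URL for an edition record, given its Open Library ID."""
    return "https://openlibrary.org/books/%s.json" % olid

# Fetching requires network access, e.g.:
# record = json.load(urlopen(edition_url("OL7353617M")))
# print(record["title"])
```

The same pattern works for authors and works, which live under their own URL prefixes.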
In May this year, we released a redesign of the site. The main features of the redesign included a UI overhaul from the ground up, a rewritten search engine (SOLR 1.4), new Subject pages (designed for serendipitous browsing of inter-related subject headings), and the introduction of the “Work” concept (an umbrella metadata level used to describe general information about a book that sits apart from its publishing history). We had been sitting on an absolute gold mine of data — over 20 million edition records — but it was hard to see the aggregate; the “landscape” of it. We interlinked records as much as possible (edits to profiles to works to subjects to authors etc.) and tried to redesign the catalog specifically to be bounced around (in addition to searching for something specific, of course).
We paid a lot of attention to making it clearer that this library catalog was alive, and editable, and that everyone can help make corrections or improvements. All contributions are also requested (required?) to be made under CC0. In my mind, the best part of that decision is that it paves the way for ongoing contributions to be shared without restriction, which is a key to future data proliferation.
So far, so good! We’re getting about 140,000 edits per month now, and we’ve been very happy with their overall quality. Lots of brand new edition records and covers, and some editors are beginning to shine through in specific areas of interest. We’re also working on smoothing out the process of getting records back out of Open Library, having just released a new version of our Author RDF template, with more to come.
Another major design decision we made was that, instead of trying to fight over the One True Identifier for a book, we would begin to collect identifiers. Today, you can hit the Open Library API with ISBN-10/13, LCCN, Internet Archive IDs, or OLIDs, and find records. By adding as many identifiers as we can find to our records, the idea is that Open Library could begin to serve as a sort of “concordance service,” where you could hit it with, say, a Hathi Trust ID, and get a list of all the other IDs Open Library knows about for any one edition. To date, we’ve written about 4 million Good Reads IDs into the system, and are working right now on a similar volume for LibraryThing.
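The identifier lookup described above can be sketched with the public Books API, which accepts “bibkeys” of different types in one query. The endpoint and parameters below are the public ones; the specific ISBN and LCCN values are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def books_api_url(*bibkeys):
    """Build a Books API query for one or more 'TYPE:value' keys,
    e.g. 'ISBN:0451526538', 'LCCN:93005405', 'OLID:OL7353617M'."""
    query = urlencode({
        "bibkeys": ",".join(bibkeys),
        "format": "json",
        "jscmd": "data",  # expanded data view, including identifiers
    })
    return "https://openlibrary.org/api/books?" + query

# With network access, a concordance-style lookup looks like:
# results = json.load(urlopen(books_api_url("ISBN:0451526538")))
# for key, record in results.items():
#     print(key, record.get("identifiers", {}))
```

Each matching record comes back keyed by the bibkey you asked with, so one request can resolve several different identifier types at once.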
We want to keep enhancing the interconnection between Open Library records, and from them out into the broader web. The more connections a record has, the more networked it is, and the easier it will be to retrieve in the future.
I must admit, I’m still getting my head around the library world. A friend of the Open Library, Karen Coyle, describes library metadata as “diabolically rational”, so I’m trying to find ways we might be able to humanize it somewhat; to allow for some randomness, and accidental discovery along the way. Turns out loads of people love books, love reading them, and can describe them.
Having such an open system also makes it easy for people external to the project to poke around in it. This is good. We’re beginning to collaborate actively too, with nice people at the UC Berkeley iSchool, *openmargin, Librivox, Good Reads, LibraryThing, the New York Times, The Guardian, Wikipedia and a ton of others. While collaboration is always fun, it’s also great to hear when external development Just Happens™ because anyone anywhere in the world can access the guts of Open Library.