Milestone for Open Bibliographic Data: British Library Release 3 Million Records
This release represents a milestone for open bibliography as it represents the first substantial corpus of bibliographic data to be released in an open form by a national library.
As reported in the announcement post:
We have initially received a dataset consisting of approximately 3 million records, which is now available as a CKAN package. This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases. In addition, we are developing sample access methods onto the data, which we will post about later this week.
The data has been loaded into a Virtuoso store that is queriable through the SPARQL Endpoint and the URIs that we have assigned each record use the ORDF software to make them dereferencable, supporting perform content auto-negotiation as well as embedding RDFa in the HTML representation.
The data contains some 3 million individual records and some 173 million triples. Indexing the data was a very CPU intensive process taking approximately three days. Transforming and loading the source data took about five hours.
To get an idea of the shape of the data, let us consider a sample resource, http://bnb.bibliographica.org/entry/GB8102507 …