

OutOfCopyright.eu makes Public Domain Calculators available for the entire European Union

Theodora Middleton - August 15, 2011 in Bibliographic, Bibliographica, External, Featured Project, Public Domain, Public Domain Works, WG Public Domain

The following guest post is by Maarten Zeinstra from KnowledgeLand. Maarten is a member of the OKF Working Group on the Public Domain.

Works that have fallen into the public domain after their term of copyright protection has elapsed can be freely used by everybody. In theory, this means that these works can be reused by anyone for any purpose, including commercial exploitation. In theory, public domain status also increases access to our shared knowledge and culture and encourages economic activities that do not take place while works are protected by copyright. In turn, the commercial exploitation of public domain works (for example, out-of-copyright books) tends to increase their accessibility.

In practice, however, determining whether a work has passed into the public domain can prove very difficult. This is especially true when attempting to determine the public domain status of content in multiple jurisdictions. As part of the EuropeanaConnect project, KnowledgeLand and the Institute for Information Law at the University of Amsterdam have developed public domain calculators to determine whether a certain work or other subject matter vested with copyright or neighbouring rights (related rights) has fallen into the public domain. These public domain calculators have been developed for 30 countries (the European Union plus Switzerland, Iceland and Norway) and are available at www.outofcopyright.eu.

Users can consult the calculators (and the underlying research published at outofcopyright.eu) to determine the copyright status of works in all these countries. This is the first time that this question has been systematically researched across all European jurisdictions.
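To make the calculation concrete, here is a minimal Python sketch of the simplest rule such a calculator has to encode: under the general EU rule, copyright in a literary work lasts for the author's life plus 70 years, counted from 1 January of the year after the author's death. This is an illustrative assumption only. The real calculators also handle co-authored and anonymous works, neighbouring rights and national exceptions, none of which appears here, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Assumed rule: author's life + 70 years, with the term counted from
# 1 January of the year after death (the general EU rule for literary works).
# National exceptions, co-authorship, anonymous works and neighbouring
# rights are deliberately ignored in this sketch.
TERM_YEARS = 70


def public_domain_from(death_year: int, term: int = TERM_YEARS) -> date:
    """First day on which the work is free of copyright under the simple rule."""
    # Protection ends on 31 December of (death_year + term),
    # so the work enters the public domain on the following 1 January.
    return date(death_year + term + 1, 1, 1)


def is_public_domain(death_year: Optional[int], on: date) -> Optional[bool]:
    """True/False under the simple rule, or None when the death year is unknown."""
    if death_year is None:
        # Many catalogue records lack a death date; the status is undetermined.
        return None
    return on >= public_domain_from(death_year)


if __name__ == "__main__":
    # An author who died in 1940 is protected until the end of 2010.
    print(public_domain_from(1940))                   # 2011-01-01
    print(is_public_domain(1941, date(2011, 8, 15)))  # False
```

Returning None rather than False for records without a death year matters once the calculation is run over large catalogue datasets, where many records are incomplete.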

The results of this research of national copyright laws show a complex, semi-harmonized field of legislation across Europe that makes it unnecessarily difficult to unlock the cultural, social and economic potential of works in the public domain. Identification of works as being in the public domain needs to be made easier and less resource-intensive by simplifying and harmonizing the rules of copyright duration and territoriality.

Outofcopyright continues to adjust and refine its calculators. It is also researching how to make calculation possible using large datasets such as Bibliographica, DBpedia and the Europeana datasets on cultural objects in Europe.

We encourage everyone interested in the public domain to try the calculators, comment on them and re-use the published research. All research and other material on Outofcopyright is available under the terms of a Creative Commons Attribution-ShareAlike license and the software powering the calculators can be reused under the terms of the EUPL license.

Report from JISC Open Bibliography

Theodora Middleton - July 12, 2011 in Bibliographic, Bibliographica, WG Open Bibliographic Data

The following post is the majority of the final report from our Open Bibliography Working Group’s collaborative Open Bibliography project with JISC. Further information is available in the original report post. Congratulations to all involved on the successful completion of the project!

Bibliographic data has long been understood to contain important information about the large-scale structure of scientific disciplines and the influence and impact of individual authors and journals. Instead of a relatively small number of privileged data owners being able to manage and control large bibliographic data stores, we want to enable an individual researcher to browse millions of records, view collaboration graphs, submit complex queries, and make selections and analyses of data, all on their laptop while commuting to work. The software tools for such easy processing are not yet adequately developed, so for the last year we have been working to improve them: primarily by acquiring open datasets upon which the community can operate, and secondarily by demonstrating what can be done with these open datasets.

Our primary product is Open Bibliographic data

Open Bibliography is a combination of Open Source tools, Open specifications and Open bibliographic data. Bibliographic data is subject to a process of continual creation and replication. The elements of bibliographic data are facts, which in most jurisdictions cannot be copyrighted; there are few technical and legal obstacles to widespread replication of bibliographic records on a massive scale – but there are social limitations: whether individuals and organisations are adequately motivated and able to create and maintain open bibliographic resources.

Open bibliographic datasets

  • Cambridge University Library: this dataset consists of MARC 21 output in a single file, comprising around 180,000 records. Availability: get the data.
  • British Library: the British National Bibliography contains about 3 million records, covering every book published in the UK since 1950. Availability: get the data; query the data.
  • International Union of Crystallography: crystallographic research journal publications metadata from Acta Cryst E. Availability: get the data; query the data; view the data.
  • PubMed Central: the PMC Medline dataset contains about 19 million records, representing roughly 98% of PMC publications. Availability: get the data; view the data.
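The Cambridge dataset is a single file of MARC 21 records. As a rough illustration of what working with such a file involves, the sketch below splits the file on the MARC record terminator and decodes each record's leader, directory and subfields using only the Python standard library. It assumes UTF-8 records (leader position 9 set to 'a'); many older files use the MARC-8 encoding, which needs a proper converter such as those in dedicated MARC libraries. The function names are hypothetical, and build_record exists only to produce test input.

```python
# ISO 2709 / MARC 21 structural characters.
FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"


def iter_records(data):
    """Yield each raw record from a file of concatenated MARC records."""
    for chunk in data.split(RECORD_TERMINATOR):
        if chunk.strip():
            yield chunk + RECORD_TERMINATOR


def parse_record(raw):
    """Return {tag: [value, ...]}; control fields are strings, data fields
    are {subfield_code: [text, ...]} dictionaries."""
    leader = raw[:24]
    base = int(leader[12:17].decode("ascii"))  # base address of the data
    directory = raw[24:raw.index(FIELD_TERMINATOR)]
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]  # tag (3) + length (4) + start (5)
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = int(entry[7:12].decode("ascii"))
        body = raw[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
        if tag < "010":
            value = body.decode("utf-8", errors="replace")  # control field
        else:
            value = {}
            # Skip the two indicator characters, then split on the delimiter.
            for sub in body[2:].split(SUBFIELD_DELIMITER)[1:]:
                code = sub[:1].decode("ascii")
                value.setdefault(code, []).append(sub[1:].decode("utf-8", errors="replace"))
        fields.setdefault(tag, []).append(value)
    return fields


def build_record(fields):
    """Assemble a minimal UTF-8 MARC record from (tag, value) pairs (test helper)."""
    directory, data = b"", b""
    for tag, value in fields:
        if isinstance(value, str):
            body = value.encode("utf-8")
        else:
            indicators, subfields = value
            body = indicators.encode("ascii") + b"".join(
                SUBFIELD_DELIMITER + code.encode("ascii") + text.encode("utf-8")
                for code, text in subfields)
        body += FIELD_TERMINATOR
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)
    leader = f"{base + len(data) + 1:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR
```

Splitting on the terminator keeps the sketch short; a streaming reader that uses the five-digit record length at the start of each leader would avoid loading the whole file into memory.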

Open bibliographic principles

In working towards acquiring these open bibliographic datasets, we have clarified the key principles of open bibliographic data and set them out for others to reference and endorse. We have already collected over 100 endorsements, and we continue to promote these principles within the community. Anyone battling with issues surrounding access to bibliographic data can use these principles and the endorsements supporting them to strengthen arguments in favour of open access to such metadata.

Products demonstrating the value of Open Bibliography

OpenBiblio / Bibliographica

Bibliographica is an open catalogue of books with integrated bibliography tools, for example allowing you to create your own collections and work with Wikipedia. Search our instance to find metadata about anything in the British National Bibliography. More information is available about the collections tool and the Wikipedia tool.

Bibliographica runs on the open source openbiblio software, which is designed for others to use – so you can deploy your own bibliography service and create open collections. Other significant features include native RDF linked data support, queryable SOLR indexing and a variety of data output formats.

Visualising bibliographic data

Traditionally, bibliographic records have been seen as a management tool for physical and electronic collections, whether institutional or personal. In bulk, however, they are much richer than that because they can be linked, without violation of rights, to a variety of other information. The primary objective axes are:

  • Authors. As well as using individual authors as nodes in a bibliographic map, we can map co-occurrences of authors (collaborations).
  • Authors’ affiliation. Most bibliographic references now allow direct or indirect identification of the authors’ affiliation, especially the employing institution. We can use heuristics to determine where the bulk of the work might have been done (e.g. first authorship, commonality of themes in related papers, etc.). Disambiguation of institutions is generally much easier than for authors, as there are fewer of them and there are also high-quality sites on the web (e.g. Wikipedia for universities). In general, therefore, we can geo-locate all the components of a bibliographic record.
  • Time. The time of publication is well recorded, and although it may not always indicate when the work was done, the pressure of modern science means that in many cases bibliography provides a fairly accurate snapshot of current research (i.e. with a delay of perhaps one year).
  • Subject. Although we cannot rely on access to abstracts (most are closed), the title is Open and in many subjects gives high precision and recall. Currently, our best examples are in infectious diseases, where terms such as malaria, plasmodium etc. are regularly and consistently used.
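The collaborations axis can be computed directly from author lists. A minimal sketch, assuming each record is a dictionary with an "authors" list of name strings (a hypothetical shape, not the actual format of any of the datasets above), counts how often each pair of authors appears on the same record; the counts can then serve as edge weights in a collaboration graph. It matches names exactly, so it inherits the author disambiguation problem mentioned above.

```python
from collections import Counter
from itertools import combinations


def collaboration_counts(records):
    """Count how often each unordered pair of authors shares a record."""
    pairs = Counter()
    for record in records:
        # Deduplicate and sort so (A, B) and (B, A) count as the same edge.
        authors = sorted(set(record.get("authors", [])))
        pairs.update(combinations(authors, 2))
    return pairs
```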

With these components, it is possible to create a living map of scholarship, and we show three examples carried out with our bibliographic sets.

This is a geo-temporal bibliography from the full Medline dataset. Bibliographic records have been extracted by year and geo-spatial co-ordinates located on a grid. The frequency of publications in each grid square is represented by vertical bars. (Note: Only a proportion of the entries in the full dataset have been used and readers should not draw serious conclusions from this prototype). (A demonstration screencast is available at http://vimeo.com/benosteen/medline; the full interactive resource is accessible with Firefox 4 or Google Chrome, at http://benosteen.com/globe.)
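The gridding step can be sketched as a simple binning function. Assuming each record has already been geo-located to a latitude and longitude in degrees and dated to a year (the hard part, not shown), the function counts publications per (year, grid row, grid column); counts of this kind are what the vertical bars represent. The 5-degree cell size and the record fields are illustrative assumptions.

```python
import math
from collections import Counter


def grid_counts(records, cell_degrees=5.0):
    """Count publications per (year, row, col) cell on a regular lat/lon grid."""
    counts = Counter()
    for rec in records:
        lat, lon, year = rec.get("lat"), rec.get("lon"), rec.get("year")
        if lat is None or lon is None or year is None:
            continue  # record could not be geo-located or dated
        # Shift to non-negative ranges: row 0 starts at latitude -90,
        # column 0 starts at longitude -180.
        row = math.floor((lat + 90) / cell_degrees)
        col = math.floor((lon + 180) / cell_degrees)
        counts[(year, row, col)] += 1
    return counts
```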

This example shows a citation map of papers recursively referencing Wakefield’s paper on the adverse effects of MMR vaccination. A full analysis requires not just the act of citation but its sentiment, and initial inspection shows that the immediate papers had a negative sentiment, i.e. they were critical of the paper. Wakefield’s paper was eventually withdrawn, but the other papers in the map still exist. Note that recursive citation can often build a false sense of value for a distantly cited object.
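Building such a map amounts to a breadth-first search over the reversed citation graph. The sketch below assumes citations are available as a mapping from each paper to the papers it references (a hypothetical input format) and returns every paper that cites the target directly or indirectly, together with its distance from it. The distance is useful given the caveat about distant citation; citation sentiment, which a full analysis needs, is not modelled.

```python
from collections import deque


def citing_closure(cites, target):
    """Return {paper: depth} for every paper citing target directly or indirectly.

    cites maps each paper to the list of papers it references.
    """
    # Reverse the edges: for each paper, who cites it?
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)
    depth = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, d = queue.popleft()
        for citer in cited_by.get(node, []):
            if citer not in seen:  # first visit is the shortest path in BFS
                seen.add(citer)
                depth[citer] = d + 1
                queue.append((citer, d + 1))
    return depth
```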

This is a geo-temporal bibliographic map for crystallography. The IUCr’s Open Access articles are an excellent resource as their bibliography is well-defined and the authors and affiliations well-identified. The records are plotted here on an interactive map where a slider determines the current timeslice and plots each week’s publications on a map of the world. Each publication is linked back to the original article. (The full interactive resource is available at .)

These visualisations show independent publications, but when the semantic facets on the data have been extracted it will be straightforward to aggregate by region, by date and to create linkages between locations.

Open bibliography for Science, Technology and Medicine

We have made further efforts to advocate for open bibliographic data by writing a paper on the subject of Open Bibliography for Science, Technology and Medicine. In addition to submitting it for publication to a journal, we have made the paper available as a prototype of the tools we are now developing. Although this work came somewhat after the main development of this project, these examples show where it is taking us: with large collections available, and agreement on what to expect in terms of open bibliographic data, we can now support the individual user in new ways.

Uses in the wider community

To demonstrate further applications of our main product, we have identified other projects making use of the data we have made available. These act as demonstrations of how others could make use of open bibliographic data and the tools we (or others) have developed on top of it.

Public Domain Works is an open registry of artistic works that are in the public domain. It was originally created with a focus on sound recordings (and their underlying compositions) because a term extension for sound recordings was being considered in the EU. However, it now aims to cover all types of cultural works, and the British National Bibliography data queryable via provides an exemplar for books. The Public Domain Works team have built on our project output to create another useful resource for the community – which could not exist without both the open bibliographic data and the software to make use of it.

The Bruce at Brunel project was also able to make use of the output of the JISC Open Bibliography project; in their work to develop faceted browse for reporting, they required large, high-quality datasets to operate on, and we were able to provide the open Medline dataset for this purpose. This illustrates a clear advantage of having such open data: it informs further developments elsewhere. Additionally, in sharing these datasets we can receive feedback on the usefulness of the conversions we provide.

A further example involves the OKF Open Data in Science working group; Jenny Molloy is organising a hackathon as part of the SWAT4LS conference in December 2011, with the aim of generating open research reports using bibliographic data from PubMedCentral, focussing on malaria research. It is designed to demonstrate what can be done with open data, and this example highlights the concept of targeted bibliographic collections: essentially, reading lists of all the relevant publications on a particular topic. With open access to the bibliographic metadata, we can create and share these easily, and as required.
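A targeted collection of this kind can start as little more than a title filter since, as noted earlier, titles are open and terms such as malaria and plasmodium are used consistently. A minimal sketch, assuming records are dictionaries with "title" and "year" fields (a hypothetical shape), selects records whose titles contain a word starting with any of the given terms and lists the newest first. A real reading list would also need deduplication and curation.

```python
import re


def reading_list(records, terms):
    """Records whose title has a word starting with any term, newest first."""
    if not terms:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")", re.IGNORECASE)
    hits = [r for r in records if pattern.search(r.get("title", ""))]
    return sorted(hits, key=lambda r: r.get("year", 0), reverse=True)
```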

Additionally, with easy access to such useful datasets comes serendipitous development of useful tools. For example, one of our project team developed a simple tool over the course of a weekend for displaying relevant reading lists for events at the Edinburgh International Science Festival. This again demonstrates what can be done if only the key ingredient – the data – is openly available, discoverable and searchable.

Benefits of Open Bibliography products

Anyone with a vested interest in research and publication can benefit from these open data and open software products – academic researchers from students through to professors, as well as academic administrators and software developers, are better served by having open access to the metadata that helps describe and map the environments in which they operate. The key reasons and use cases which motivate our commitment to open bibliography are:

  1. Access to Information. Open Bibliography empowers and encourages individuals and organisations of various sizes to contribute, edit, improve, link to and enhance the value of public domain bibliographic records.
  2. Error detection and correction. A community supporting the practice of Open Bibliography will rapidly add means of checking and validating the quality of open bibliographic data.
  3. Publication of small bibliographic datasets. It is common for individuals, departments and organisations to provide definitive lists of bibliographic records.
  4. Merging bibliographic collections. With open data, we can enable referencing and linking of records between collections.
  5. A bibliographic node in the Linked Open Data cloud. Communities can add their own linked and annotated bibliographic material to an open LOD cloud.
  6. Collaboration with other bibliographic organisations. Open bibliographic data supports collaboration with reference manager and identifier systems such as Zotero, Mendeley and CrossRef, as well as with academic libraries and library organisations.
  7. Mapping scholarly research and activity. Open Bibliography can provide definitive records against which publication assessments can be collated, and by which collaborations can be identified.
  8. An Open catalogue of Open scholarship. Since the bibliographic record for an article is Open, it can be annotated to show the Openness of the article itself; thus bibliographic data can be openly enhanced to show to what extent a paper is open and freely available.
  9. Cataloguing diverse materials related to bibliographic records. We see the opportunity to list databases, websites, review articles and other information which the community may find valuable, and to associate such lists with open bibliographic records.
  10. Use and development of machine learning methods for bibliographic data processing. Widespread availability of open bibliographic data in machine-readable formats should rapidly promote the use and development of machine-learning algorithms.
  11. Promotion of community information services. Widespread availability of open bibliographic web services will make it easier for those interested in promoting the development of scientific communities to develop and maintain subject-specific community information.
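As one small example of the automated checking anticipated in point 2, ISBNs carry a check digit, so malformed identifiers in an open dataset can be flagged mechanically. The sketch below validates ISBN-10 (weights 10 down to 1, sum divisible by 11, with X standing for 10) and ISBN-13 (alternating weights 1 and 3, sum divisible by 10). It checks only the checksum, not whether the ISBN was ever assigned.

```python
DIGITS = "0123456789"


def isbn_is_valid(isbn):
    """Check the check digit of an ISBN-10 or ISBN-13, ignoring hyphens and spaces."""
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) == 10:
        if not all(c in DIGITS for c in chars[:9]) or chars[9] not in DIGITS + "X":
            return False
        values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
        # Weights 10, 9, ..., 1; a valid ISBN-10 sums to a multiple of 11.
        return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0
    if len(chars) == 13 and all(c in DIGITS for c in chars):
        # Alternating weights 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
        return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars)) % 10 == 0
    return False


if __name__ == "__main__":
    for candidate in ("978-0-306-40615-7", "978-0-306-40615-8", "0-8044-2957-X"):
        print(candidate, isbn_is_valid(candidate))  # True, False, True
```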

Sustaining Open Bibliography

Using these products

The products of this project add strength to an ecosystem of ongoing efforts towards large scale open bibliographic (and other) collections. We encourage others to use tools such as the OpenBiblio software, and to take our visualisations as examples for further application. We will maintain our exemplars for at least one year from publication of this post, whilst the software and content remain openly available to the community in perpetuity. We would be happy to hear from members of the community interested in using our products.

Further collaborations and future work

We intend to continue to build on the output of this project; after the success of liberating large bibliographic collections and clarifying open bibliographic principles, the focus is now on managing personal and small collections. Collaborative efforts with the Bibliographic Knowledge Network project have begun, and continuing development will make the aforementioned releases of large-scale open bibliographic datasets directly relevant and beneficial to people in the academic community, by providing a way for individuals, departments or research groups to easily manage, present and search their own bibliographic collections.

Via collaboration with the Scholarly HTML community we intend to follow conventions for embedding bibliographic metadata within HTML documents whilst also enabling collection of such embedded records into BibJSON, thus allowing embedded metadata whilst also providing additional functionality similar to that demonstrated already, such as search and visualisation. We are also working towards ensuring compatibility between ScHTML and Schema.org, affording greater relevance and usability of ScHTML data.
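The exact Scholarly HTML embedding conventions are not described here, so the sketch below uses a stand-in: the widely used citation_* meta tags (citation_title, citation_author and so on) that many journal pages carry. It collects those tags with Python's standard html.parser and emits a BibJSON-style record with a title, a list of author objects carrying a name, a year and an identifier list. Treat both the input convention and the exact output keys as assumptions to be checked against the ScHTML and BibJSON documentation.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect (name, content) pairs from <meta name="citation_..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name, content = a.get("name"), a.get("content")
            if name and name.startswith("citation_") and content is not None:
                self.meta.append((name, content))


def html_to_bibjson(html):
    """Map citation_* meta tags onto a BibJSON-style dictionary (assumed keys)."""
    parser = CitationMetaParser()
    parser.feed(html)
    record = {"type": "article", "author": []}
    for name, content in parser.meta:
        if name == "citation_title":
            record["title"] = content
        elif name == "citation_author":
            record["author"].append({"name": content})
        elif name == "citation_publication_date":
            record["year"] = content[:4]  # assumes the date starts with YYYY
        elif name == "citation_doi":
            record.setdefault("identifier", []).append({"type": "doi", "id": content})
    return record
```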

Success in these ongoing efforts will enable us to support large scale open bibliographic data, providing a strong basis for open scholarship in the future. We hope to attract further support and collaboration from groups that realise the importance of Open Source code, Open Data and Open Knowledge to the future of scholarship.

Project partners


Open Bibliographic Data Challenge

Mark MacGillivray - February 10, 2011 in Bibliographic, Bibliographica, Campaigning, News, Open Data

What can you do with open access to data? What great ideas do you have for utilising open access to bibliographic catalogues? Or what example prototypes can you come up with? We want to find out!

  • 2 x £50 prizes for great ideas using bibliographic data
  • 2 x £500 prizes for building prototype apps using open bibliographic data

The Open Bibliographic Data challenge is currently up and running, with further details available at http://openbiblio.net/challenge

Data is not always as open as we would like, which restricts our ability to share and collaborate; but one good way to increase the opportunities to work together is to demonstrate just how much we can do with data that is openly available to us, providing proofs of concept that inspire others.

The JISC openbib project recently announced releases of large bibliographic datasets from Cambridge University Library, IUCr and the British Library.

In addition, the BL dataset comprising the British National Bibliography has been uploaded into a triple store with search at http://bnb.bibliographica.org.

We would like to get more people involved so that together we can develop new and better ways to use this data.

Milestone for Open Bibliographic Data: British Library Release 3 Million Records

Mark MacGillivray - November 23, 2010 in Bibliographica, News, WG Open Bibliographic Data

The JISC funded OpenBib project, of which OKF is a partner, announced last week in collaboration with the British Library the release of 3 million open bibliographic records to the community.

This release is a milestone for open bibliography, as it is the first substantial corpus of bibliographic data to be released in an open form by a national library.

As reported in the announcement post:

We have initially received a dataset consisting of approximately 3 million records, which is now available as a CKAN package. This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases. In addition, we are developing sample access methods onto the data, which we will post about later this week.

The data has also been loaded into Bibliographica so that it can be searched. For those who like RDF there is a SPARQL endpoint, and there is also an ISBN lookup service. More from the announcement post:

The data has been loaded into a Virtuoso store that is queryable through the SPARQL endpoint, and the URIs that we have assigned to each record use the ORDF software to make them dereferenceable, supporting content auto-negotiation as well as embedding RDFa in the HTML representation.

The data contains some 3 million individual records and some 173 million triples. Indexing the data was a very CPU-intensive process, taking approximately three days. Transforming and loading the source data took about five hours.

To get an idea of the shape of the data, let us consider a sample resource, http://bnb.bibliographica.org/entry/GB8102507
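For readers who want to try the access methods described above, here is a minimal client-side sketch using only the Python standard library. It builds, but does not send, two requests: a SPARQL query against an endpoint supplied by the caller (the post does not give the endpoint's URL, so the example uses a placeholder), and a content-negotiated request for the sample resource asking for Turtle instead of HTML. The 2010 service may no longer be online; the point is the shape of the requests.

```python
from urllib.parse import urlencode
from urllib.request import Request

SAMPLE_ENTRY = "http://bnb.bibliographica.org/entry/GB8102507"

# Every triple with the sample entry as subject; LIMIT keeps the result small.
ENTRY_TRIPLES_QUERY = f"""
SELECT ?p ?o WHERE {{
  <{SAMPLE_ENTRY}> ?p ?o .
}} LIMIT 50
"""


def sparql_request(endpoint, query):
    """GET request in the style of the SPARQL protocol, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rdf_request(resource_uri, media_type="text/turtle"):
    """Ask for an RDF serialisation of a resource via HTTP content negotiation."""
    return Request(resource_uri, headers={"Accept": media_type})


if __name__ == "__main__":
    # Placeholder endpoint: substitute the real SPARQL endpoint URL.
    req = sparql_request("http://example.org/sparql", ENTRY_TRIPLES_QUERY)
    print(req.get_header("Accept"))
    # urllib.request.urlopen(req) would send it; omitted here on purpose.
```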

Workshop on Open Bibliographic Data and the Public Domain

Jonathan Gray - August 17, 2010 in Bibliographica, Events, OKF Projects, Open Data, Public Domain, Public Domain Works, WG Open Bibliographic Data, WG Public Domain, Working Groups

We are pleased to announce a one day workshop on Open Bibliographic Data and the Public Domain. Details are as follows:

Here’s the blurb:

This one day workshop will focus on open bibliographic data and the public domain. In particular it will address questions like:

  • What is the role of freely reusable metadata about works in calculating which works are in the public domain in different jurisdictions?
  • How can we use existing sources of open data to automate the calculation of which works are in the public domain?
  • What data sharing policies in libraries and cultural heritage institutions would support automated calculation of copyright status?
  • How can we connect databases of information about public domain works with digital copies of public domain works from different sources (Wikipedia, Europeana, Project Gutenberg, …)?
  • How can we map existing sources of public domain works in different countries/languages more effectively?

The day will be very much focused on productive discussion and ‘getting things done’ rather than presentations. Sessions will include policy discussions about public domain calculation under the auspices of Communia (a European thematic network on the digital public domain), as well as hands-on coding sessions run by the Open Knowledge Foundation. The workshop is a satellite event to the 3rd Free Culture Research Conference on 8-9 October.

If you would like to participate, you can register at:

If you have ideas for things you’d like to discuss, please add them at:

To take part in discussion on these topics before and after this event, please join:

ORDF – the OKFN RDF Library

Rufus Pollock - July 2, 2010 in Bibliographica, Technical

Some months ago we started looking at how we might use an RDF store instead of a SQL database behind data-driven websites, of which OKF has several. The reasons have to do with making the data reusable in a better way than ad hoc JSON APIs allow.

As we tend to program in Python and use the Pylons framework, this led us to consider some alternatives such as RDFAlchemy and SuRF. Both build on top of RDFLib and try to present a programming interface reminiscent of SQL-ORM middleware such as SQLObject and SQLAlchemy. They assume a single database-like storage for the RDF data and in some cases make assumptions about the form of the data itself.

One important thing that they do not directly handle is customised indexing — and triplestores vary widely in terms of how well certain types of queries will perform, if they are supported at all. Overall, using RDFAlchemy or SuRF didn’t seem like much of a gain over using RDFLib directly. So we started writing our own middleware which we’ve named ORDF (OKFN RDF Library).

Code and documentation are at http://ordf.org/

ORDF Features and Structure

Key features of ORDF:

  • Open-source and Python-based (builds on RDFLib)
  • Clean separation of functionality such as storage, indexing and web frontend
  • Easy pluggability of different storage and indexing engines (all those supported by RDFLib, 4store, simple-disk using pairtree, etc.)
  • Extensibility via messaging (we use RabbitMQ)
  • Built-in RDF “revisioning”: every set of changes to the RDF store is kept in a “changeset”. This enables provenance, roll-back and change reporting “out-of-the-box”

To illustrate how this works, here’s a diagram showing a write operation in ORDF using most of the features described above. Below we go into detail describing how it all works.

Write operations in ORDF diagram

Forward Compatibility with RDFLib

The ORDF middleware solves several problems. The first, and most mundane, is to paper over the significant API changes between versions 2.4.2 and 3.0.0 of RDFLib. RDFLib moved a lot of things around, and this tends to break code because statements like from rdflib import Graph need to be changed to from rdflib.graph import Graph. So the first thing ORDF does is let you write from ordf.graph import Graph, which will work no matter which version of RDFLib you have installed. This matters because the changes in 3.0.0 go deeper than renaming some modules: some software, such as the FuXi reasoner and anything that uses the SPARQL query language, will not work well with the new version. ORDF therefore acts as a forward-compatibility layer, so software developed with ORDF should continue to work once newer versions of RDFLib stabilise.
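The idea behind ordf.graph, a single import that works across RDFLib versions, is a common compatibility pattern. The helper below is a generic, hypothetical version of it, not ORDF's actual code: it tries a list of "module:attribute" paths in order and returns the first one that imports. The RDFLib paths in the usage comment are the ones named in the post; whether each path yields the Graph class itself in a given RDFLib release should be checked.

```python
import importlib


def import_first(*candidates):
    """Return the object named by the first importable 'module:attribute' path."""
    errors = []
    for candidate in candidates:
        module_name, _, attr = candidate.partition(":")
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{candidate} ({exc})")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Usage in a compatibility module, newest layout first (paths from the post):
# Graph = import_first("rdflib.graph:Graph", "rdflib:Graph")
```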

Pylons Support

Only slightly less mundane, ORDF includes some code that should be common to web applications that use the Pylons framework to access ORDF facilities: controllers for obtaining copies of graphs in various serialisations and for implementing a SPARQL endpoint.
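Serving several serialisations from one controller comes down to choosing a media type from the client's Accept header. A framework-independent sketch (not ORDF's or Pylons' API) that honours q-values and simple wildcards could look like the following; the list of offered media types is an assumption, and the sketch ignores some finer precedence rules of HTTP content negotiation.

```python
OFFERED = ("text/html", "text/turtle", "application/rdf+xml")  # first entry is the default


def parse_accept(header):
    """Return [(media_type, q), ...] from an Accept header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q, position))
    # Higher q first; ties keep the order the client listed them in.
    prefs.sort(key=lambda item: (-item[1], item[2]))
    return [(media, q) for media, q, _ in prefs]


def choose_media_type(accept, offered=OFFERED):
    """Pick the best offered media type, or None (HTTP 406) if nothing fits."""
    if not accept:
        return offered[0]
    prefs = parse_accept(accept)
    refused = {media for media, q in prefs if q <= 0}
    for media, q in prefs:
        if q <= 0:
            continue
        if media in offered:
            return media
        if media.endswith("/*"):
            prefix = "" if media == "*/*" else media[:-1]  # "text/*" -> "text/"
            for candidate in offered:
                if candidate.startswith(prefix) and candidate not in refused:
                    return candidate
    return None
```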

Indices and Message Queues

Then we have indexes and queueing. Named graphs, the moral equivalent of objects in the SQL-ORM world, are stored in more than one place to facilitate different kinds of queries:

  • The pairtree filesystem index, which is good for retrieving a graph if you know its name and simply stores it as a file in a specialised directory hierarchy on the disk. This is not good for querying but is pretty much future-proof — at least as long as it is possible to read a file from the disk.
  • An RDFLib-supported store, which is suitable for small to medium-sized datasets, does not depend on any external software and allows SPARQL queries over the data for graph traversal operations.
  • The 4store quad-store, which fulfils a similar role for larger datasets and also allows SPARQL queries, but requires an additional piece of software running (possibly on a cluster for very large datasets) and is somewhat harder to set up.
  • A Xapian full-text search index, which allows free-form queries over text strings, something that no triplestore does very well.

There are plans for further storage back-ends, specifically using Solr as well as other triplestores such as Jena and Virtuoso.

A key element of this indexing architecture is that it is distributed. While you can configure all of these index types into a single running program, and it is common to do so for development, in reality some indexing operations are expensive and you don’t necessarily want the client program sitting and waiting while they are done synchronously. So there is also a pseudo-index that sends graphs to a RabbitMQ messaging server, and for each index a daemon is run that listens to a queue on a fan-out exchange.
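The fan-out arrangement can be imitated in a single process to show its shape. In the sketch below, Python's queue and threading modules stand in for RabbitMQ: publishing a graph puts a copy on every bound queue, and one daemon thread per index consumes its own queue at its own pace. The index names come from the list above, but the "indexing" is a placeholder, and none of this is ORDF's real code.

```python
import queue
import threading


class FanoutExchange:
    """In-process stand-in for a fan-out exchange: every bound queue gets every message."""

    def __init__(self):
        self._queues = []

    def bind(self):
        q = queue.Queue()
        self._queues.append(q)
        return q

    def publish(self, message):
        for q in self._queues:
            q.put(message)


def index_daemon(name, inbox, log):
    """Consume graphs until a None sentinel arrives; real indexing would go here."""
    while True:
        graph = inbox.get()
        if graph is None:
            break
        log.append((name, graph["name"]))  # placeholder for the (slow) index update


def save_and_index(graph_names, index_names=("pairtree", "4store", "xapian")):
    """Publish graphs to one daemon per index, then shut the daemons down."""
    exchange = FanoutExchange()
    log = []
    daemons = [threading.Thread(target=index_daemon, args=(n, exchange.bind(), log))
               for n in index_names]
    for d in daemons:
        d.start()
    for g in graph_names:
        exchange.publish({"name": g})  # publish does not wait for any indexing
    exchange.publish(None)  # sentinel: stop every daemon
    for d in daemons:
        d.join()
    return sorted(log)
```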

Introducing a layer of message queueing also makes it possible to support inferencing: the process of deriving new information or statements from the given data. This operation is considerably more computationally expensive than mere indexing. It is accomplished by using two queues. When a graph is saved, it is first put on a queue conventionally called reason. The FuXi reasoner listens to that queue, computes new statements (it is what the literature calls a production-rule or forward-chaining system), and then puts the resulting, augmented graph onto a queue called index, and thence to the indexers.
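The reason-then-index pipeline can be sketched with two queues and a toy forward-chaining step. The reasoner below is not FuXi: it applies just two RDFS entailment rules (rdfs11, transitivity of subClassOf, and rdfs9, an instance of a subclass is an instance of the superclass) repeatedly until no new triples appear, which is the defining behaviour of forward chaining. Triples are plain tuples of prefixed-name strings, an illustrative simplification, and the naive loops are quadratic.

```python
import queue

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply rdfs9 and rdfs11 until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in facts:
                # rdfs11: s subClassOf o, o subClassOf o2  =>  s subClassOf o2
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: x type s, s subClassOf o  =>  x type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        new -= facts
        if not new:
            return facts
        facts |= new


def reasoner_step(reason_q, index_q):
    """Take one graph from the reason queue, augment it, pass it on to index."""
    graph = reason_q.get()
    index_q.put(forward_chain(graph))
```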

Ontology Logic

Until recently there was only one ontology-specific behaviour coded into ORDF, and that was the ChangeSet. It is still important: it provides low-level, per-statement provenance and change history information, and it is built into the system. A save operation on a graph is accomplished by obtaining a change context, adding one or more graphs to it, and then committing the changes. Before the graphs are sent out for indexing, reasoning, queueing or whatnot, a copy of the previous version of the graphs is obtained (usually from pairtree storage) and the differences are calculated. These differences, along with some metadata, make up the ChangeSet, which is saved and indexed along with the graphs themselves. This accomplishes what we call Syntactic Provenance, because it operates at the level of individual statements.
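The core of a ChangeSet, the difference between the previous and the new version of a graph, is a pair of set differences once a graph is treated as a set of triples. The sketch below shows that idea plus the roll-back it enables. The class and field names are hypothetical and much simpler than ORDF's ChangeSet, which carries more metadata and is saved and indexed alongside the graphs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeSet:
    graph_name: str
    added: frozenset
    removed: frozenset
    reason: str = ""
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def diff_graphs(name, previous, current, reason=""):
    """Statements only in the new version are added; only in the old, removed."""
    previous, current = set(previous), set(current)
    return ChangeSet(name, frozenset(current - previous), frozenset(previous - current), reason)


def apply_changeset(triples, cs):
    """Replay a change on top of the old version."""
    return (set(triples) - cs.removed) | cs.added


def revert_changeset(triples, cs):
    """Roll back: undo the additions and restore the removals."""
    return (set(triples) - cs.added) | cs.removed
```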

Lately several more modules have been added to support other vocabularies. The work on the Bibliographica project led to the introduction of the OPMV vocabulary for Semantic Provenance. This is used to describe the way a data record from an external source (in this case MARC data) is transformed by a Process into a collection of statements or graph, and the way other graphs are derived from this first one. This is a distinct problem from Syntactic Provenance since it deals with the relationships between entities or objects and not simply add/remove operations on their attributes.

Another addition has been the ORE Aggregation vocabulary, which is also used in Bibliographica. In our system, since distinct entities or objects are stored as named graphs, we want to avoid duplicating data where it should not be duplicated. For example, a book might have an author, and users are ultimately interested in seeing the author’s name when they are looking at data about the book. But we do not want to store the author’s details in the book’s graph, because then, if someone notices an error, it must be corrected both in the author’s graph and in their book’s. It is better to keep such changes in one place. So what we actually do is create an aggregation. The aggregation contains (points at, aggregates) the book and author graphs and also includes a pointer to some information on how to display it.
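The benefit of aggregating rather than copying can be shown with plain dictionaries standing in for named graphs. In this hypothetical sketch the book graph holds only a reference to the author graph, the aggregation lists both graph names plus a display hint, and the summary is assembled at read time, so correcting the author's name in one place is reflected wherever the book is shown. Real ORE aggregations are RDF resources; the structure here is only an analogy.

```python
def aggregate(*graph_names, display=None):
    """An aggregation only points at graphs; it copies none of their data."""
    return {"aggregates": list(graph_names), "display": display}


def view(agg, store):
    """Look up every aggregated graph at read time."""
    return {name: store[name] for name in agg["aggregates"]}


def book_summary(agg, store):
    parts = view(agg, store)
    book = parts[agg["aggregates"][0]]  # assumption: the book graph is listed first
    author = parts[book["author"]]  # follow the reference instead of a copied name
    return f"{book['title']} by {author['name']}"


if __name__ == "__main__":
    store = {
        "book/1": {"title": "An Example Book", "author": "author/7"},
        "author/7": {"name": "Smtih, J."},
    }
    agg = aggregate("book/1", "author/7", display="book-summary")
    print(book_summary(agg, store))  # An Example Book by Smtih, J.
    store["author/7"]["name"] = "Smith, J."  # one correction, in one graph
    print(book_summary(agg, store))  # An Example Book by Smith, J.
```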

More to come on the concrete implementation of ontology-specific behaviour, MARC processing and Aggregations in a follow-up post on Bibliographica.

Next Steps

There is much more ontology-specific work to be done. First on the list is a Python implementation of the Fresnel vocabulary, which is used to describe how to display RDF data in HTML. It is more a set of instructions than a templating language, and we have already written an implementation in JavaScript. It is crucial, however, that websites built with ORDF do not rely on JavaScript for presentation, and that they rely on custom templates as little as possible.
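To show why a declarative display description suits server-side rendering, here is a deliberately simplified, hypothetical "lens" in Python: an ordered list of properties and labels that decides which values of a resource are shown and in what order, rendered to an HTML table without JavaScript. It borrows only the idea behind Fresnel's lenses; it does not implement the Fresnel vocabulary, and the property names are illustrative.

```python
from html import escape

# Hypothetical lens: which properties to show, in order, with display labels.
BOOK_LENS = [("dc:title", "Title"), ("dc:creator", "Author"), ("dc:date", "Date")]


def render(resource, lens):
    """Render the lens-selected values of a {property: [values]} dict as HTML."""
    rows = []
    for prop, label in lens:
        for value in resource.get(prop, []):
            rows.append(f"<tr><th>{escape(label)}</th><td>{escape(value)}</td></tr>")
    # Properties not named in the lens are simply not displayed.
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```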

ORDF is now stable enough to start using in other projects, at least within the OKF family. A first and fairly easy case will be updating the RDF interface to CKAN to use it — fitting as ORDF actually started out as a refactor of that very codebase.

Bibliographica, an Introduction

Rufus Pollock - May 20, 2010 in Bibliographica, News, OKF Projects

It’s time to talk a bit about Bibliographica, a new project of the Open Knowledge Foundation.

Bibliographica is designed to make it easier for scholars and researchers to share and collect information about work in their field. It provides an open source software platform to create and share semantically rich information about publications, authors and their works.

As readers of the Open Knowledge Foundation blog will know, we have a long-standing interest in open bibliographic data – from our efforts starting in 2005 to build a database of public domain works, our coordination of the response to the Library of Congress’ Future of Bibliographic Control (2007) and the recent creation of a new working group on open bibliographic data in March this year.

Bibliographica itself is a long-held dream of Jonathan Gray, OKF’s Community Coordinator – a commons of open data surrounding scholarly communications. Thanks to collaboration and support from IDEA Lab at the University of Edinburgh, the dream is a bit closer to reality.

The primary “technical” features of Bibliographica are:

  • Rich (FRBR-based) domain model
  • Semantic web and linked open data at the core, providing very flexible metadata and easy integration of external material
  • Wiki-like revisioning of all changes enabling easier and freer collaboration
  • Software and a Service
  • Designed to be installed and run by others
  • Distributed: separate nodes can be run, with pull (and push) of data between them
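Two of the features above, wiki-like revisioning and pulling data between nodes, fit together naturally if every edit is kept as a changeset. The following is only a sketch of that general idea under stated assumptions: `Changeset`, `Node` and `pull()` are hypothetical names, not Bibliographica’s design, and conflict handling between nodes is deliberately left out.

```python
# Minimal sketch of wiki-like revisioning with pull between nodes.
# Assumption: Changeset, Node and pull() are hypothetical illustrations, not
# Bibliographica's actual design; conflict handling is deliberately left out.
import uuid
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Changeset:
    change_id: str
    author: str
    message: str
    add: frozenset[Triple]
    remove: frozenset[Triple]


@dataclass
class Node:
    log: list[Changeset] = field(default_factory=list)  # full history, oldest first

    def commit(self, author, message, add=(), remove=()):
        cs = Changeset(str(uuid.uuid4()), author, message, frozenset(add), frozenset(remove))
        self.log.append(cs)
        return cs

    def state(self, upto=None):
        """Replay the first `upto` changesets (all of them by default)."""
        triples: set[Triple] = set()
        for cs in self.log[:upto]:
            triples -= cs.remove
            triples |= cs.add
        return triples

    def pull(self, other: "Node") -> int:
        """Append changesets that `other` has and this node lacks; return how many."""
        known = {cs.change_id for cs in self.log}
        new = [cs for cs in other.log if cs.change_id not in known]
        self.log.extend(new)
        return len(new)


TITLE = "http://purl.org/dc/terms/title"
work = "http://example.org/work/1"
a, b = Node(), Node()
a.commit("alice", "add title", add={(work, TITLE, "Draft Titel")})
a.commit("alice", "fix typo", remove={(work, TITLE, "Draft Titel")},
         add={(work, TITLE, "Draft Title")})

print(b.pull(a))               # 2
print(b.state() == a.state())  # True
print(b.state(upto=1))         # the earlier revision, typo included
```

Keeping history as an append-only log means any earlier revision can be rebuilt by replaying a prefix, and syncing nodes reduces to exchanging changesets the other side has not seen.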

But what needs of users does Bibliographica aim to satisfy?

Easy collaboration by scholars and librarians in creating bibliographies and enhancing catalogues

Often the people who know most about what is published in a given field are the researchers who are active in that field. Bibliographica will enable scholars to directly collaborate on annotated bibliographic indexes for their subject area. A revisioned (wiki-like) approach to adding metadata allows for more open collaboration, and a semantic web base means support for rich metadata with a good standard structure.

We think that letting researchers directly add or edit details about publications in their field — which they can then export, publish, or do whatever they like with — is a good way to keep this information accurate and up to date.

Easy creation of publication lists for different uses

Bibliographica will provide, either directly or via integration with existing tools, an easy way to create and annotate lists of publications: a reading list for an undergraduate course, a bibliography for a book or article that you are writing, or a detailed list of works about a given person.

Open software and service so anyone can run their own copy

Bibliographica will be a fully open service. All the code will be open source and by default all the data will be openly licensed. Just as projects like WordPress allow anyone to set up their own copy (rather than depending on a centralised and possibly proprietary third-party service), so institutions and groups of researchers will be able to set up and run their own instance of Bibliographica, which they can customise and extend.

More sophisticated data models and searches

We plan to harness the specialised knowledge of researchers in particular domains to richly annotate information in the database, so that it can provide (good) answers to questions like: “What was published on Nietzsche in English between 1950 and 1975?”

Linked Data vocabularies allow a wide range of statements to be made about a work or author. We’re starting with the Dublin Core and SKOS vocabularies, and defining some of our own for expressing the kinds of things that can be said about works or authors.
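To show how Dublin Core statements could answer the example question about Nietzsche, here is a minimal sketch over plain Python triples. The works, the subject URI, the language codes and the use of plain year strings for `dcterms:issued` are all hypothetical; only the `dcterms:subject`, `dcterms:language` and `dcterms:issued` terms are real. In an actual triple store the same filter would more naturally be written as a SPARQL query.

```python
# Minimal sketch: "What was published on Nietzsche in English between 1950 and 1975?"
# over Dublin Core statements. Assumption: all works, URIs, language codes and the
# year-string values for dcterms:issued are hypothetical illustrations.
DCT = "http://purl.org/dc/terms/"
NIETZSCHE = "http://example.org/person/nietzsche"

triples = {
    ("http://example.org/work/a", DCT + "subject", NIETZSCHE),
    ("http://example.org/work/a", DCT + "language", "en"),
    ("http://example.org/work/a", DCT + "issued", "1961"),
    ("http://example.org/work/b", DCT + "subject", NIETZSCHE),
    ("http://example.org/work/b", DCT + "language", "de"),
    ("http://example.org/work/b", DCT + "issued", "1965"),
    ("http://example.org/work/c", DCT + "subject", NIETZSCHE),
    ("http://example.org/work/c", DCT + "language", "en"),
    ("http://example.org/work/c", DCT + "issued", "1980"),
}


def values(triples, subject, prop):
    """All objects of (subject, prop, ?)."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


def published_on(triples, topic, language, start, end):
    """Works about `topic`, in `language`, issued in a year within [start, end]."""
    works = {s for (s, p, o) in triples if p == DCT + "subject" and o == topic}
    return sorted(
        w for w in works
        if language in values(triples, w, DCT + "language")
        and any(start <= int(year) <= end for year in values(triples, w, DCT + "issued"))
    )


print(published_on(triples, NIETZSCHE, "en", 1950, 1975))
# ['http://example.org/work/a']
```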

Once a substantial amount of such information has been collected, it will become possible to use inferencing techniques to answer more subtle and interesting questions than the usual bibliographic metadata alone could answer.
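One of the simplest inferences of this kind follows chains of `skos:broader` links, so that a question about a wide topic also finds works catalogued under narrower concepts. Note that in SKOS `skos:broader` is not itself declared transitive; the transitive reading is what `skos:broaderTransitive` captures, and the sketch below computes that closure explicitly. The concept scheme and works are hypothetical examples; `skos:broader` and `dcterms:subject` are real terms.

```python
# Minimal sketch of one simple inference: following skos:broader chains so that a
# query for a broad concept also finds works tagged with narrower concepts.
# Assumption: the concepts and works below are hypothetical examples.
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
EX = "http://example.org/concept/"

triples = {
    (EX + "nietzsche-studies", SKOS_BROADER, EX + "german-philosophy"),
    (EX + "german-philosophy", SKOS_BROADER, EX + "philosophy"),
    ("http://example.org/work/a", DCT_SUBJECT, EX + "nietzsche-studies"),
    ("http://example.org/work/b", DCT_SUBJECT, EX + "philosophy"),
}


def narrower_closure(triples, concept):
    """The concept plus everything (transitively) narrower than it."""
    result, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in triples:
            if p == SKOS_BROADER and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def works_about(triples, concept):
    """Works whose subject is the concept or any narrower concept."""
    concepts = narrower_closure(triples, concept)
    return sorted(s for (s, p, o) in triples if p == DCT_SUBJECT and o in concepts)


print(works_about(triples, EX + "german-philosophy"))
# ['http://example.org/work/a']
print(works_about(triples, EX + "philosophy"))
# ['http://example.org/work/a', 'http://example.org/work/b']
```

Neither work is tagged "german-philosophy" directly, yet the first query finds one; that is the kind of answer plain bibliographic metadata would miss.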

Get involved

We’ll be writing more in the coming weeks about the roadmap for the Bibliographica service and some of the specifics around the use of Linked Data to describe scholarly communications.

It would be great to hear from those of you who’d like to get involved: helping to refine the data models, suggesting vocabularies we should be reusing, contributing research resources to the version at bibliographica.org, or testing out your own instance of the Bibliographica software.

Please get in touch, or join us on the Open Knowledge Foundation’s open-bibliography mailing list.

