Melanie Dulong de Rosnay recently published an excellent paper on open data in the life sciences in Nature Precedings entitled Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness. From the abstract:

Molecular biology data are subject to terms of use that vary widely between databases and curating institutions. This research presents a taxonomy of contractual and technical restrictions applicable to databases in life science. It builds upon research led by Science Commons demonstrating why open data and the freedom to integrate facilitate innovation and how this openness can be achieved. The taxonomy describes technical and legal restrictions applicable to life science databases, and its metadata have been used to assess terms of use of databases hosted by Life Science Resource Name (LSRN) Schema. While a few public domain policies are standardized, most terms of use are not harmonized, difficult to understand and impose controls that prevent others from effectively reusing data. Identifying a small number of restrictions allows one to quickly appreciate which databases are open. A checklist for data openness is proposed in order to assist database curators who wish to make their data more open to make sure they do so.

Shirley Fung has published a directory of open datasets examined in the paper, and details of their re-usability on Molecular Biology Databases.

For each dataset, they provided basic metadata, including:

  • The name and URL of the database,
  • URL of the download page and URL of the terms of use,
  • Extracts of the terms of use for further review and comments,
  • Values for technical accessibility and legal accessibility features [...]

They then looked at various technical and legal restrictions for accessing, acquiring and re-using the material - including bulk downloadability, registration, password protection, terms and conditions, and licensing - asking the following questions:

  • Is there a link to download the whole database?
  • Is it possible to access the data through a batch feature?
  • Is it possible to access the data through a query-based system?
  • Finally, is registration compulsory before downloading or accessing data in the ways described above?
  • Does the database have a policy?
  • Are there any restrictions on the right to reformatting and redistributing?
  • Which restrictions?
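
As a rough illustration, the answers to these questions could be recorded per database and combined into a quick openness check. This is a minimal sketch: the field names and the `looks_open` rule are assumptions made for this example, not the taxonomy or scoring used in the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpennessAssessment:
    """Answers to the questions above for one database (hypothetical field names)."""
    name: str
    has_full_download: bool = False      # link to download the whole database?
    has_batch_access: bool = False       # access through a batch feature?
    has_query_access: bool = False       # access through a query-based system?
    registration_required: bool = False  # registration compulsory before access?
    has_policy: bool = False             # does the database have a policy?
    reuse_restrictions: List[str] = field(default_factory=list)  # which restrictions?


def looks_open(a: OpennessAssessment) -> bool:
    """Assumed rule of thumb: bulk or batch access, no compulsory
    registration, a stated policy, and no restrictions on reuse."""
    bulk_accessible = a.has_full_download or a.has_batch_access
    return (
        bulk_accessible
        and not a.registration_required
        and a.has_policy
        and not a.reuse_restrictions
    )


open_db = OpennessAssessment("Example DB", has_full_download=True, has_policy=True)
closed_db = OpennessAssessment(
    "Restricted DB",
    has_query_access=True,
    has_policy=True,
    reuse_restrictions=["no redistribution"],
)
print(looks_open(open_db), looks_open(closed_db))
```

A stricter or looser rule could be swapped in without changing the record itself, which is the main reason to keep the answers and the judgement separate.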

This is very similar to the work we have been doing with ckan.net, which aims to provide basic metadata for knowledge packages, including:

  • url
  • title
  • download url
  • tags
  • license/legal status
  • unstructured text field with a description of the resource and details about its openness
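
To make the shape of such a record concrete, here is a minimal sketch of a package entry with the fields listed above. The class and attribute names are assumptions chosen for illustration and are not CKAN's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """One registry entry with the basic metadata fields listed above
    (hypothetical names, not CKAN's own schema)."""
    url: str
    title: str
    download_url: str = ""
    tags: List[str] = field(default_factory=list)
    license: str = ""   # license / legal status
    notes: str = ""     # free-text description and details about openness


def missing_fields(pkg: Package) -> List[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [name for name, value in vars(pkg).items() if not value]


pkg = Package(url="http://example.org/db", title="Example DB")
print(missing_fields(pkg))
```

A helper like `missing_fields` shows which entries still need work, for example a missing download link or an unrecorded legal status.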

Furthermore, CKAN uses certain tags to indicate any technical or legal restrictions on the packages that are listed. For technical access, these tags cover bulk downloads, registration, password protection, and access through an API.

For legal terms, the tags cover non-commercial restrictions and cases where the terms of re-use are not clear.

There are also several ‘todo’ tags to indicate where it might be useful to write to the knowledge publisher or distributor to clarify something, to split up the entry into multiple entries, or to otherwise work on the registry.

Documenting the legal and technological issues involved in accessing and re-using knowledge takes significant work. It would be fantastic if this could be made easier by sharing the results of this kind of research. CKAN is intended to be a community-driven resource that aids the discovery of (open) knowledge in the first instance, supports its automatic installation in the longer term, and ultimately supports its re-use by providing multiple download links, multiple formats, big datasets broken down into smaller components and so on.

The MBDB is a fantastic project and we hope that in future we can put our heads together with Melanie, Shirley and others to improve the discoverability (and re-usability) of open data in the life sciences!

