Open Knowledge Foundation

CKAN and Finding Open Data in the Life Sciences

July 29, 2008, by Jonathan Gray

Melanie Dulong de Rosnay recently published an excellent paper on open data in the life sciences in Nature Precedings, entitled "Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness". From the abstract:

Molecular biology data are subject to terms of use that vary widely between databases and curating institutions. This research presents a taxonomy of contractual and technical restrictions applicable to databases in life science. It builds upon research led by Science Commons demonstrating why open data and the freedom to integrate facilitate innovation and how this openness can be achieved. The taxonomy describes technical and legal restrictions applicable to life science databases, and its metadata have been used to assess terms of use of databases hosted by Life Science Resource Name (LSRN) Schema. While a few public domain policies are standardized, most terms of use are not harmonized, difficult to understand and impose controls that prevent others from effectively reusing data. Identifying a small number of restrictions allows one to quickly appreciate which databases are open. A checklist for data openness is proposed in order to assist database curators who wish to make their data more open to make sure they do so.

Shirley Fung has published a directory of the open datasets examined in the paper, with details of their re-usability, on Molecular Biology Databases (MBDB).

For each dataset, they provided basic metadata, including:

  • The name and URL of the database,
  • URL of the download page and URL of the terms of use,
  • Extracts of the terms of use for further review and comments,
  • Values for technical accessibility and legal accessibility features […]

They then looked at various technical and legal restrictions for accessing, acquiring and re-using the material – including bulk downloadability, registration, password protection, terms and conditions, and licensing – asking the following questions:

  • Is there a link to download the whole database?
  • Is it possible to access the data through a batch feature?
  • Is it possible to access the data through a query-based system?
  • Finally, is registration compulsory before downloading or accessing data in the ways described above?
  • Does the database have a policy?
  • Are there any restrictions on the right to reformatting and redistributing?
  • Which restrictions?
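
As a rough illustration, the answers to the questions above could be recorded per database and turned into a short list of points worth a closer look. This is only a sketch: the class name, field names and wording of the flagged points are assumptions made for this example, and it is not the taxonomy or scoring used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessAnswers:
    """Answers to the access questions for one database (hypothetical field names)."""
    whole_database_download: bool
    batch_access: bool
    query_access: bool
    registration_required: bool
    has_policy: bool
    redistribution_restricted: bool


def flagged_points(a: AccessAnswers) -> List[str]:
    """Return the answers that suggest a technical or legal restriction.

    Which answers count as a restriction is an assumption of this sketch.
    """
    points = []
    if not a.whole_database_download:
        points.append("no link to download the whole database")
    if not a.batch_access:
        points.append("no batch access")
    if not a.query_access:
        points.append("no query-based access")
    if a.registration_required:
        points.append("registration is compulsory")
    if not a.has_policy:
        points.append("no stated policy")
    if a.redistribution_restricted:
        points.append("reformatting or redistribution is restricted")
    return points


# Made-up answers for illustration only.
answers = AccessAnswers(
    whole_database_download=True,
    batch_access=False,
    query_access=True,
    registration_required=True,
    has_policy=True,
    redistribution_restricted=False,
)
for point in flagged_points(answers):
    print(point)
```

The last question ("Which restrictions?") is free text in the original list, so it is left out of this sketch.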

This is very similar to the work we have been doing with ckan.net, which aims to provide basic metadata for knowledge packages, including:

  • url
  • title
  • download url
  • tags
  • license/legal status
  • unstructured text field with a description of the resource and details about its openness
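
To make the shape of such a record concrete, here is a minimal sketch of the fields listed above as a Python data class. The class and attribute names are hypothetical and chosen for this example; they are not CKAN's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PackageRecord:
    """One registry entry with the basic metadata fields listed above."""
    url: str
    title: str
    download_url: str
    tags: List[str] = field(default_factory=list)
    license: str = "not-specified"   # license/legal status
    notes: str = ""                  # free-text description and openness details

    def has_tag(self, tag: str) -> bool:
        """Return True if the record carries the given tag."""
        return tag in self.tags


# A made-up entry for illustration only.
record = PackageRecord(
    url="http://example.org/db",
    title="Example molecular biology database",
    download_url="http://example.org/db/dump.tar.gz",
    tags=["access-bulk", "license-nc"],
    license="noncommercial",
    notes="Hypothetical entry; bulk download available, noncommercial terms.",
)
print(record.has_tag("access-bulk"))
```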

Furthermore, CKAN uses certain tags to indicate any technical or legal restrictions on the packages that are listed. For technical access, the tags cover bulk downloads, registration, password protection, and access through an API:

  • http://ckan.net/tag/read/access-nobulk
  • http://ckan.net/tag/read/access-bulk
  • http://ckan.net/tag/read/access-registration
  • http://ckan.net/tag/read/access-api
  • http://ckan.net/tag/read/access-password
  • http://ckan.net/tag/read/access-www

For legal terms, tags cover noncommercial restrictions and cases where the terms of re-use are not clear:

  • http://ckan.net/tag/read/license-issues
  • http://ckan.net/tag/read/license-nc
  • http://ckan.net/tag/read/license-noncommercial
  • http://ckan.net/tag/read/license-not-specified
  • http://ckan.net/tag/read/license-todo
  • http://ckan.net/tag/read/license-unknown

There are also several ‘todo’ tags to indicate where it might be useful to write to the knowledge publisher or distributor to clarify something, to split up the entry into multiple entries, or to otherwise work on the registry:

  • http://ckan.net/tag/read/todo-breakdown
  • http://ckan.net/tag/read/todo-contact
  • http://ckan.net/tag/read/todo-list-datasets
  • http://ckan.net/tag/read/todo-split
  • http://ckan.net/tag/read/todo-split-up
  • http://ckan.net/tag/read/todo-splitup
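
Because the tags listed above share a prefix convention (access-, license-, todo-), they are easy to group mechanically. The following sketch assumes only the URL pattern shown in the lists; the function names are made up for this example, and it does not call any CKAN API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def tag_name(tag_url: str) -> str:
    """Return the last path segment of a tag URL, e.g. 'access-bulk'."""
    return urlparse(tag_url).path.rstrip("/").split("/")[-1]


def group_tags(tag_urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group tag names by the part before the first hyphen."""
    groups = defaultdict(list)
    for url in tag_urls:
        prefix, _, rest = tag_name(url).partition("-")
        groups[prefix].append(rest)
    return dict(groups)


example_urls = [
    "http://ckan.net/tag/read/access-bulk",
    "http://ckan.net/tag/read/access-registration",
    "http://ckan.net/tag/read/license-nc",
    "http://ckan.net/tag/read/todo-split-up",
]
print(group_tags(example_urls))
```

Splitting on the first hyphen only keeps multi-part names such as todo-split-up intact within their group.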

There is significant work involved in documenting the legal and technological issues involved in accessing and re-using knowledge. It would be fantastic if this could be made easier by sharing the results of this kind of research. CKAN is intended to be a community-driven resource to aid the discovery of (open) knowledge in the first instance, its automatic installation in the longer term, and ultimately to support its re-use by providing multiple download links, multiple formats, big datasets broken down into smaller components and so on.

The MBDB is a fantastic project and we hope that in future we can put our heads together with Melanie, Shirley and others to improve the discoverability (and re-usability) of open data in the life sciences!

Jonathan Gray

Dr. Jonathan Gray is Lecturer in Critical Infrastructure Studies at the Department of Digital Humanities, King’s College London, where he is currently writing a book on data worlds. He is also Cofounder of the Public Data Lab, and Research Associate at the Digital Methods Initiative (University of Amsterdam) and the médialab (Sciences Po, Paris). More about his work can be found at jonathangray.org and he tweets at @jwyg.

Posted in: Metadata, Open Data, Open/Closed

  • Creative Commons License

    This work is licensed under a Creative Commons Attribution 4.0 International License.
