CKAN and Finding Open Data in the Life Sciences

July 29, 2008, by Jonathan Gray

Melanie Dulong de Rosnay recently published an excellent paper on open data in the life sciences in Nature Precedings entitled Check Your Data Freedom: A Taxonomy to Assess Life Science Database Openness. From the abstract:

Molecular biology data are subject to terms of use that vary widely between databases and curating institutions. This research presents a taxonomy of contractual and technical restrictions applicable to databases in life science. It builds upon research led by Science Commons demonstrating why open data and the freedom to integrate facilitate innovation and how this openness can be achieved. The taxonomy describes technical and legal restrictions applicable to life science databases, and its metadata have been used to assess terms of use of databases hosted by Life Science Resource Name (LSRN) Schema. While a few public domain policies are standardized, most terms of use are not harmonized, difficult to understand and impose controls that prevent others from effectively reusing data. Identifying a small number of restrictions allows one to quickly appreciate which databases are open. A checklist for data openness is proposed in order to assist database curators who wish to make their data more open to make sure they do so.

Shirley Fung has published a directory of the open datasets examined in the paper, with details of their re-usability, on Molecular Biology Databases (MBDB).

For each dataset, they provided basic metadata, including:

  • The name and URL of the database,
  • URL of the download page and URL of the terms of use,
  • Extracts of the terms of use for further review and comments,
  • Values for technical accessibility and legal accessibility features […]

They then looked at various technical and legal restrictions for accessing, acquiring and re-using the material – including bulk downloadability, registration, password protection, terms and conditions, and licensing – asking the following questions:

  • Is there a link to download the whole database?
  • Is it possible to access the data through a batch feature?
  • Is it possible to access the data through a query-based system?
  • Finally, is registration compulsory before downloading or accessing data in the ways
    described above?
  • Does the database have a policy?
  • Are there any restrictions on the right to reformatting and redistributing?
  • Which restrictions?
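
The questions above map naturally onto a small record per database. The sketch below is one hypothetical way to encode the answers and pull out the barriers to re-use; the class, field names and example entry are our own illustration, not the paper's or MBDB's format, and treating a missing policy as a barrier is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseAssessment:
    """Answers to the openness questions for one database (hypothetical field names)."""
    name: str
    bulk_download: bool          # Is there a link to download the whole database?
    batch_access: bool           # Can the data be accessed through a batch feature?
    query_access: bool           # Can the data be accessed through a query-based system?
    registration_required: bool  # Is registration compulsory before access?
    has_policy: bool             # Does the database have a policy?
    # Which restrictions apply to reformatting and redistributing, if any?
    redistribution_restrictions: list[str] = field(default_factory=list)


def barriers(a: DatabaseAssessment) -> list[str]:
    """List the answers that stand in the way of re-use.

    Batch and query access are recorded but not scored here; treating a
    missing policy as a barrier is an assumption of this sketch.
    """
    found = []
    if not a.bulk_download:
        found.append("no whole-database download")
    if a.registration_required:
        found.append("registration required")
    if not a.has_policy:
        found.append("no stated policy")
    found.extend(f"restriction: {r}" for r in a.redistribution_restrictions)
    return found


# Hypothetical example entry.
db = DatabaseAssessment(
    name="ExampleDB",
    bulk_download=True,
    batch_access=False,
    query_access=True,
    registration_required=True,
    has_policy=True,
    redistribution_restrictions=["noncommercial use only"],
)
print(barriers(db))  # ['registration required', 'restriction: noncommercial use only']
```

A database with no barriers listed would count as open under this simplified reading.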

This is very similar to the work we have been doing with ckan.net, which aims to provide basic metadata for knowledge packages, including:

  • url
  • title
  • download url
  • tags
  • license/legal status
  • unstructured text field with a description of the resource and details about its openness
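
As a rough illustration, a package entry with these fields could be modelled as below. The field names (`download_url`, `notes` and so on) are illustrative assumptions rather than CKAN's actual schema, and the example resource is hypothetical; the helper simply reports which fields still need filling in.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePackage:
    """Basic metadata for one package; field names are illustrative, not CKAN's own schema."""
    url: str
    title: str
    download_url: str = ""
    tags: list[str] = field(default_factory=list)
    license: str = ""  # license/legal status
    notes: str = ""    # free-text description, including details about openness


def missing_fields(pkg: KnowledgePackage) -> list[str]:
    """Return the names of optional fields that are still empty."""
    optional = ("download_url", "tags", "license", "notes")
    return [name for name in optional if not getattr(pkg, name)]


pkg = KnowledgePackage(
    url="http://example.org/genome-db",  # hypothetical resource
    title="Example genome database",
    tags=["access-bulk"],
)
print(missing_fields(pkg))  # ['download_url', 'license', 'notes']
```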

Furthermore, CKAN uses certain tags to indicate technical or legal restrictions on the packages it lists. For technical access, these cover bulk downloads (or their absence), registration, password protection, access through an API and access via the web:

  • http://ckan.net/tag/read/access-nobulk
  • http://ckan.net/tag/read/access-bulk
  • http://ckan.net/tag/read/access-registration
  • http://ckan.net/tag/read/access-api
  • http://ckan.net/tag/read/access-password
  • http://ckan.net/tag/read/access-www
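
Because the access restrictions are encoded in tag names, a package's tag list can be read mechanically. The sketch below is an assumption-laden example: the human-readable descriptions and the choice of which tags count as barriers are our interpretation of the tag names, not definitions published by CKAN.

```python
# Readings of the access-* tags listed above; the wording is our interpretation.
ACCESS_TAGS = {
    "access-bulk": "bulk download available",
    "access-nobulk": "no bulk download",
    "access-api": "accessible through an API",
    "access-www": "accessible through a website",
    "access-registration": "registration required",
    "access-password": "password protected",
}

# Tags treated as technical barriers to re-use (an assumption of this sketch).
BARRIER_TAGS = {"access-nobulk", "access-registration", "access-password"}


def describe_access(tags: list[str]) -> tuple[list[str], list[str]]:
    """Return (descriptions, barriers) for a package's access-* tags; other tags are ignored."""
    descriptions = [ACCESS_TAGS[t] for t in tags if t in ACCESS_TAGS]
    barriers = sorted(t for t in tags if t in BARRIER_TAGS)
    return descriptions, barriers


print(describe_access(["access-api", "access-registration", "genomics"]))
# (['accessible through an API', 'registration required'], ['access-registration'])
```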

For legal terms, tags cover noncommercial restrictions and cases where the terms of re-use are not clear:

  • http://ckan.net/tag/read/license-issues
  • http://ckan.net/tag/read/license-nc
  • http://ckan.net/tag/read/license-noncommercial
  • http://ckan.net/tag/read/license-not-specified
  • http://ckan.net/tag/read/license-todo
  • http://ckan.net/tag/read/license-unknown
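
Several of these tags overlap (`license-nc` and `license-noncommercial` appear to say the same thing), so a consumer of the registry might group them before acting on them. This is a minimal sketch under that assumption; the grouping of `license-issues` and `license-todo` with the "unclear" tags is our own reading.

```python
# Grouping of the license-* tags listed above; which tag belongs to which group is our reading.
NONCOMMERCIAL_TAGS = {"license-nc", "license-noncommercial"}
UNCLEAR_TAGS = {"license-issues", "license-not-specified", "license-todo", "license-unknown"}


def legal_flags(tags: list[str]) -> list[str]:
    """Return legal warnings for a package based on its license-* tags."""
    tagset = set(tags)
    flags = []
    if tagset & NONCOMMERCIAL_TAGS:  # either spelling of the noncommercial tag
        flags.append("noncommercial only")
    if tagset & UNCLEAR_TAGS:
        flags.append("terms of re-use unclear")
    return flags


print(legal_flags(["license-nc", "license-noncommercial"]))  # ['noncommercial only']
print(legal_flags(["license-unknown", "proteomics"]))        # ['terms of re-use unclear']
```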

There are also several ‘todo’ tags to indicate where it might be useful to write to the knowledge publisher or distributor to clarify something, to split up the entry into multiple entries, or to otherwise work on the registry:

  • http://ckan.net/tag/read/todo-breakdown
  • http://ckan.net/tag/read/todo-contact
  • http://ckan.net/tag/read/todo-list-datasets
  • http://ckan.net/tag/read/todo-split
  • http://ckan.net/tag/read/todo-split-up
  • http://ckan.net/tag/read/todo-splitup
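
The last three tags look like spelling variants of a single "split this entry" marker. Assuming that is what they mean, a small normalisation step could fold them together; the alias table below is hypothetical and covers only those three spellings.

```python
# Spelling variants of the "split this entry" tag listed above (the mapping is an assumption).
TODO_ALIASES = {
    "todo-split-up": "todo-split",
    "todo-splitup": "todo-split",
}


def normalise_tags(tags: list[str]) -> list[str]:
    """Replace variant tags with a canonical one, dropping duplicates but keeping first-seen order."""
    result: list[str] = []
    for tag in tags:
        canonical = TODO_ALIASES.get(tag, tag)
        if canonical not in result:
            result.append(canonical)
    return result


print(normalise_tags(["todo-splitup", "todo-contact", "todo-split"]))
# ['todo-split', 'todo-contact']
```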

Documenting the legal and technological issues involved in accessing and re-using knowledge takes significant work. It would be fantastic if this could be made easier by sharing the results of this kind of research. CKAN is intended to be a community-driven resource that will, in the first instance, aid the discovery of (open) knowledge; in the longer term, enable its automatic installation; and ultimately support its re-use by providing multiple download links, multiple formats, big datasets broken down into smaller components and so on.

The MBDB is a fantastic project and we hope that in future we can put our heads together with Melanie, Shirley and others to improve the discoverability (and re-usability) of open data in the life sciences!

Jonathan Gray

Dr. Jonathan Gray is Lecturer in Critical Infrastructure Studies at the Department of Digital Humanities, King’s College London, where he is currently writing a book on data worlds. He is also Cofounder of the Public Data Lab; and Research Associate at the Digital Methods Initiative (University of Amsterdam) and the médialab (Sciences Po, Paris). More about his work can be found at jonathangray.org and he tweets at @jwyg.

Posted in: Metadata, Open Data, Open/Closed



This work is licensed under a Creative Commons Attribution 4.0 International License.
