The Power of Open Data
September 1st, 2010
The following guest post is from David Bollier, independent policy strategist, journalist, and author of Viral Spiral. It was originally posted at the On the Commons blog.
Science has always recognized the power of sharing in developing new knowledge. But in the search for treatments and cures for diseases like Alzheimer’s and Parkinson’s, the sprawling bodies of highly diverse research data are not easily shared. Either they are considered proprietary resources for making money, or they are hidden in academic databases that others may not know about, often inaccessible because of incompatible software formats. No single researcher really has the resources or incentive to develop an overarching regime to enable cooperation and sharing. And so dozens of academics, nonprofits and pharmaceutical companies have continued their research in relative isolation.
“Companies were caught in a prisoner’s dilemma,” a researcher at the University of Pennsylvania recently told the New York Times. “They all wanted to move the field forward, but no one wanted to take the risks of doing it.”
But ten years ago, Dr. Neil S. Buckholtz, who oversees dementia research at the National Institutes of Health, realized that the sharing of research data was a collective action problem that might be solved through concerted leadership. He helped instigate a plan by which the NIH stepped up to serve as an “honest broker” between the pharmaceutical industry and academics. The goal was to ensure that all research would be shared openly and freely, and published on the Internet immediately, so that anyone could use it. You could publish a research paper and you could develop new treatments, but no one would own the data. Researchers would even be free to make mistakes or misguided interpretations — because who is to say at the outset that something is necessarily incorrect?
Seven years ago, the NIH persuaded scientists from the FDA, the drug industries, medical-imaging companies, academia and nonprofit groups to cooperate in an ambitious scheme to affirmatively share their findings with each other. As the Times reports (August 13, 2010), the sharing of data is now starting to show results. Scientists studying Alzheimer’s disease routinely share their findings about “biological markers” that indicate the progression of the disease. This has led to recent scientific papers suggesting the value of PET scans and tests of spinal fluids as ways to make early diagnoses of Alzheimer’s.
The conventional business response to such radical ideas of “sharing” is that no company would have adequate incentive to invest in risky research unless they could be assured exclusive ownership of the results, in order to create a revenue-generating “product” (i.e., medical treatment or drug). But the drug industry has had to concede that diseases such as Alzheimer’s are just too scientifically complicated for any single research entity to tackle; the most fruitful way forward is to pursue an “open source” approach that places the basic building-blocks of knowledge into the commons – while sanctioning the private patenting of more refined medical innovations that build on the fruits of the commons.
It’s so common-sensical that it seems faintly ridiculous that a story of this sort should merit lead-story treatment in the New York Times.
Open Context
July 13th, 2010
The following guest blog is from Open Context’s Project Lead Eric Kansa and Editor Sarah Whitcher Kansa, who are both members of the Open Knowledge Foundation’s Working Group on Open Data in Archaeology.
About Open Context
Open Context is a free, open access resource for the electronic publication of primary field research from archaeology and related disciplines. We developed it to help scholars and students easily find and reuse field science data and media. The system makes data searchable and citable, with robust archival support from the California Digital Library. The Alexandria Archive Institute, an independent 501(c)(3) non-profit organization, maintains Open Context and provides editorial oversight for Open Context content. The project has been funded by the William and Flora Hewlett Foundation, the National Endowment for the Humanities (NEH), and the Institute of Museum and Library Services (IMLS).
Key Features of Open Context:
- Data publication (with peer review, if desired) of datasets, images, maps, and related items.
- Stable URL to every project and individual item within a project. Projects, items, and groups of items can be cited elsewhere and permanently linked to print publications.
- Citation provided for every project and item.
- Project and Person information to provide the user with more in-depth knowledge about the author and background of the study.
- Faceted navigation that enables users to compose analytically precise searches and queries through a simple point and click interface.
- Web services, including Atom feeds, so that content can be syndicated and visualized elsewhere on the Web.
- Creative Commons licenses so that datasets are legally free for reuse.
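As a rough illustration of how syndication through an Atom feed can work, here is a minimal Python sketch that reads entry titles and links from a feed. The sample feed, its titles, and the example.org URLs are placeholders invented for this sketch; they are not real Open Context records, and the layout of Open Context's actual feeds is not described here.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed: titles and URLs are placeholders only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example project feed</title>
  <entry>
    <title>Shell specimen 1</title>
    <id>http://example.org/items/1</id>
    <link rel="alternate" href="http://example.org/items/1"/>
    <updated>2010-07-01T00:00:00Z</updated>
  </entry>
  <entry>
    <title>Shell specimen 2</title>
    <id>http://example.org/items/2</id>
    <link rel="alternate" href="http://example.org/items/2"/>
    <updated>2010-07-02T00:00:00Z</updated>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return a list of (title, link) pairs, one per Atom entry."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href") if link is not None else None
        entries.append((title, href))
    return entries


for title, href in list_entries(SAMPLE_FEED):
    print(title, href)
```

A site that wants to display items published elsewhere could run something like this against a fetched feed and render the resulting pairs as links.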
Open Data Publication and Archiving with Open Context
Open Context emphasizes publication to work with familiar patterns in scholarly communication and encourage data dissemination in the research community. To this end, Open Context does not disseminate raw data but instead relies on editorial supervision to add description, documentation, and structure to researcher-contributed content. This transforms raw data into a more polished and intelligible product that is still as detailed and comprehensive as the original field documentation.
In development since late 2006, Open Context now hosts over 180,000 items, including nearly 5,000 media items, from 35 archaeological sites around the world. The current rate of publication is about one project per month, and we hope to increase that rate as our publication tools become more streamlined. While Open Context contains mainly archaeological content, it can also accommodate content from other field-based sciences (public health, conservation biology, geological sciences, etc.), so please feel free to get in touch if you have data you would like to publish.
To see some of this at work, check out the recently-published Aegean Prehistory Project, featuring data on shells recovered from three archaeological sites in the Aegean. Canan Çakırlar published these data as an online appendix to the printed publication of her Ph.D. dissertation. In addition to an overview of her project and a link to where the printed publication can be purchased, Canan also has a “person page” with information about her work, publications, etc. Her data has been drawn (via an Atom feed) into BoneCommons, a Web resource for the worldwide zooarchaeology community. Thus, Canan’s work can be found via a search engine, Open Context, BoneCommons, or any other place that draws her content from Open Context. This makes for maximum exposure of her work to her colleagues, as well as its discovery by others for uses beyond archaeomalacology.
How Open is Open Context?
Open Context requires use of Creative Commons licenses (or the CC-Zero public domain dedication). Open Context also makes all data, including structured data (in XML, JSON, and CSV formats) freely available with no login barrier.
However, Open Context does permit use of license variants that restrict commercial use. While this restriction does inhibit interoperability, some stakeholder communities, especially indigenous groups, have deep historical and political concerns regarding commercial uses of cultural heritage materials. These concerns represent complex ethical challenges, and they highlight how the ideal of “openness” needs to be weighed against other ethical considerations.
Incentives and Guidance for Openness
The National Science Foundation recently announced additional requirements for grant-seekers to develop meaningful “Data Access” plans for their proposals. Many researchers will have little background in or understanding of how best to meet this requirement. To offer guidance for the researcher community, Open Context now offers guidelines for researchers to prepare data for online publication. In addition to these, we have developed an online estimation tool, which helps scholars budget appropriately for data sharing, guides them through licensing choices, and offers tips regarding good practices in data sharing. The estimation tool then returns texts to researchers that can be used in their NSF Data Access plans.

Climate Change, Climate Sceptics and Open Data
December 5th, 2009

With the United Nations Climate Change Conference in Copenhagen starting on Monday, it is of vital importance that there is consensus on the scientific evidence about climate change, in order to inform debates about the best course of action for the international community. Sharing the same basic picture about the climate, global warming and the impact of human sources of carbon dioxide (regardless of the details of this picture, regardless of differences in opinion about the most appropriate course of action in response to it) is surely a critical prerequisite to effective and fruitful negotiations.
The recent illegally obtained emails from the University of East Anglia’s Climatic Research Unit (so-called ‘Climategate’) and the subsequent accusations of secrecy and malpractice from climate change sceptics have provoked debate in the media about the openness and availability of datasets related to climate change.
Partly in response to accusations of secrecy and falsification of key datasets from sceptics, the UK Met Office announced today they will be publishing new climate datasets. Earlier the Telegraph reported:
Sceptics alleged that emails stolen from the Climatic Research Unit at the university show scientists were willing to manipulate data to show global warming.
They also complain that the raw data for the climate models was not made available to the public.
To try to restore public confidence the Met Office is talking to other meteorological organisations around the world about recreating the model using the same raw data but more modern computers.
The whole process will also use any new information and be more open to the public.
This evening, the BBC reported:
Meanwhile, the Met Office said it would publish all the data from weather stations worldwide, which it said proved climate change was caused by humans.
Its database is a main source of analysis for the IPCC.
It has written to 188 countries for permission to publish the material, dating back 160 years from more than 1,000 weather stations.
As UEA said in an announcement from the end of November, over 95% of the CRU climate data is already available and permission to publish the remaining data will have to be sought from each of the relevant National Meteorological Services (NMSs) around the world on a case by case basis. Professor Davies of UEA suggests there are partly commercial reasons for this:
We are grateful for the necessary support of the Met Office in requesting the permissions for releasing the information but understand that responses may take several months and that some countries may refuse permission due to the economic value of the data.
An editorial piece in Nature from a couple of days ago suggests:
Researchers are barred from publicly releasing meteorological data from many countries owing to contractual restrictions. Moreover, in countries such as Germany, France and the United Kingdom, the national meteorological services will provide data sets only when researchers specifically request them, and only after a significant delay. The lack of standard formats can also make it hard to compare and integrate data from different sources. Every aspect of this situation needs to change: if the current episode does not spur meteorological services to improve researchers’ ease of access, governments should force them to do so.
Mike Hulme of UEA and Jerome Ravetz of Oxford University argue in a recent BBC article that climate scientists will have to become better at engaging the public in their research:
While there will always be a unique function for expert scientific reviewers to play in authenticating knowledge, this need not exclude other interested and motivated citizens from being active.
These demands for more openness in science are intensified by the embedding of the internet and Web 2.0 media as central features of many people’s social exchanges.
In particular they suggest that scientists should respond to demands that:
- To be validated, knowledge must also be subject to the scrutiny of an extended community of citizens who have legitimate stakes in the significance of what is being claimed
- And to be empowered for use in public deliberation and policy-making, knowledge must be fully exposed to the proliferating new communication media by which such extended peer scrutiny takes place.
Roger Pielke, Professor of Environmental Studies at the University of Colorado, argues in a recent interview in the Washington Post that:
More openness, more transparency, more diversity, and more attention to the social construction of expertise is needed.
While it is important to remember, as Cameron Neylon notes, that proper interpretation of climate change data requires significant background knowledge and a thorough grounding in relevant scientific literature and tools, nevertheless it is clear that there is an increasing demand from interested non-expert non-scientists to access and reuse climate data. The Times recently published two pieces analysing and refuting a climate change sceptic’s interpretation of the publicly available HADCRU data. Another blogger points out that public environmental datasets allow non-expert members of the public to explore the evidence and draw different conclusions about climate change - and argues that the peer review process will act as a quality filter for their research.
In response to the demand for data, Real Climate (who were also hacked, and who provide two excellent posts on the CRU hack and background context) have published a very useful list of public climate datasets as well as a blog post asking the climate science community for further suggestions.
All of this interest in public sources of climate data reminded us of our Open Environmental Data project, which we started two years ago this autumn. The project aimed to answer the question:
- What environmental data is out there, and how open is it?
It also aimed to document legislation and policy relevant to environmental data in different jurisdictions.
We have picked up this work again by starting a climate data group on CKAN, our open source registry of open data:
We have started to go through available public sources of climate data, looking at:
- Whether datasets are open as in the Open Knowledge Definition - i.e. whether they explicitly say that they can be used by anyone, for any purpose, without restriction (except perhaps attribution, integrity or sharealike requirements).
- Whether or not there are facilities to download raw data in bulk - i.e. whether they easily allow users to directly download all the data in open, machine readable formats.
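The two checks above can be sketched as a small audit routine. This is only an illustration under stated assumptions: the `Dataset` record, the licence labels in `OPEN_LICENSES`, the format list, and the three example entries are all hypothetical and do not reflect how CKAN stores or evaluates packages.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: an illustrative, incomplete set of licence labels treated as open.
OPEN_LICENSES = {"CC0", "PDDL", "CC-BY", "CC-BY-SA", "ODbL"}
# Assumption: formats counted as machine readable for this sketch.
MACHINE_READABLE = {"csv", "json", "xml"}


@dataclass
class Dataset:
    name: str
    license_id: Optional[str]   # None when the source states no licence
    bulk_download: bool         # True if all the data can be fetched at once
    formats: Tuple[str, ...]    # file formats on offer, lower case


def is_explicitly_open(ds):
    """Check 1: the dataset carries an explicit licence from the open set."""
    return ds.license_id in OPEN_LICENSES


def has_bulk_raw_data(ds):
    """Check 2: bulk download exists in at least one machine-readable format."""
    return ds.bulk_download and any(f in MACHINE_READABLE for f in ds.formats)


def audit(datasets):
    """Map each dataset name to its (open, bulk) results."""
    return {ds.name: (is_explicitly_open(ds), has_bulk_raw_data(ds)) for ds in datasets}


# Hypothetical entries, invented for illustration.
EXAMPLES = [
    Dataset("station-temps", "CC0", True, ("csv",)),
    Dataset("ice-cores", None, True, ("csv", "pdf")),
    Dataset("sea-level", "CC-BY-NC", False, ("pdf",)),
]

for name, (open_ok, bulk_ok) in audit(EXAMPLES).items():
    print(name, "open:", open_ok, "bulk:", bulk_ok)
```

Note that a dataset with no stated licence fails the first check even when it is freely downloadable, and a non-commercial licence does not count as open under the Open Knowledge Definition.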
Environmental data is an excellent case of where sharing is the key to scaling. Research institutions must share data with each other in order to build up as detailed a picture as possible, incorporating as much evidence as possible from around the world. As much of this research is publicly funded, and due to increasing public interest, there are now strong arguments for extending this sharing from sharing between research institutions to sharing with the public.
Furthermore, often access is not enough. Datasets need to be combined with other datasets, or reused in visual representations. Hence there are arguments for making data open as in the Open Knowledge Definition, which means that anyone can reuse and redistribute it for any purpose. This also allows for innovation in the ways in which the data can be presented to the public by third parties, including not-for-profit organisations and companies - such as through the creation of new web services to allow the data to be explored.
There are currently 38 data sources listed, over half of which are fully open. However many datasets are still not explicitly legally open, and many of them have restrictions on how they can be reused. There are still plenty of datasets to add! We’ve been in touch with the folks at Real Climate, and they’ve been supportive of the project and encouraged us to reuse and build on their list of data sources.
In order to mark the occasion of the Copenhagen Conference, over the next few weeks we will be continuing to add publicly available climate data to CKAN. By better documenting existing open environmental data, we hope to make some small contribution to laying the groundwork for the shared picture about the state of our climate that we currently need.
If you are interested in contributing to the climate data group - please either drop us a line, or get stuck in and register a package!
Open Knowledge Conference (OKCon) 2010: Call for Proposals
November 10th, 2009
The Open Knowledge Conference (OKCon) 2010 Call for Proposals is now open!
We would be grateful for help in circulating the call to relevant lists and communities! You can reuse or point to:
- This blog post
- Main CFP page
- Plain text announce (wrapped at 72 characters)
- Identi.ca post
- Twitter post
Open Knowledge Conference (OKCon) 2010: Call for Proposals
- where: London, UK
- when: Saturday 24th April, 2010
- www: http://www.okfn.org/okcon/
- last year: http://www.okfn.org/okcon/2009/
- cfp: http://www.okfn.org/okcon/cfp/ (due: Jan 31st 2010)
- hashtag: #okcon
Introduction
OKCon, now in its fifth year, is the interdisciplinary conference that brings together individuals from across the open knowledge spectrum for a day of presentations and workshops.
Open knowledge promises significant social and economic benefits in a wide range of areas from governance to science, culture to technology. Opening up access to content and data can radically increase access and reuse, improving transparency, fostering innovation and increasing societal welfare.
This is a time of great change. In addition to high profile initiatives such as Wikipedia, OpenStreetMap and the Human Genome Project, there is enormous growth among open knowledge projects and communities at all levels. Moreover, in the last year, governments across the world have begun opening up huge amounts of their data.
And it doesn’t stop there. In academia, open access to both publications and data has been gathering momentum, and similar calls to open up learning materials have been heard in education. Furthermore this gathering flood of open data and content is the creator and driver of massive technological change. How can we make this data available, how can we connect it together, how can we use it to collaborate and share our work?
Join us to discuss all of this and more!
Topics
We welcome proposals on any aspect of creating, publishing or reusing content or data that is open in accordance with opendefinition.org. Topics include but are not limited to:
Technology
- Semantic Web and Linked Data in relation to open knowledge
- Platforms, methods and tools for creating, sharing and curating open knowledge
- Light-weight, adaptive interaction models
- Open, decentralized social network applications
- Open geospatial data
Law, Society and Democracy
- Open Licensing, Legal Tools and the Public Domain
- Open government data and content (public sector information)
- Open knowledge and international development
- Opening up access to the law
Culture and Education
- Open educational tools and resources
- Business models for open content
- Incentives and rewards for open-knowledge contributors
- Open textbooks
- Public domain digitisation initiatives
Science and Research
- Opening up scientific data
- Supporting scientific workflows with open knowledge models
- Open models for scientific innovation, funding and publication (’open-access’)
- Tools for analysing and visualizing open data
- Open knowledge in the humanities
Important Dates
- Submission deadline: January 31st 2010
- Notification of acceptance: March 1st
- Camera-ready papers due: March 31st
- OKCon: April 24th 2010
Submission Details
We are accepting three types of submissions:
- Full papers of 5-10 pages describing novel strategies, tools, services or best-practices related to open knowledge.
- Extended talk abstracts of 2-4 pages focusing on novel ideas, ongoing work and upcoming research challenges.
- Proposals for short talks and demonstrations
OKCon will implement an open submission and reviewing process. To make a submission visit:
Depending on the assessment of the submissions by the programme committee and external reviewers, submissions will be accepted either as full, short or lightning/poster presentations.
Proceedings of OKCon will be published at CEUR-WS.org. If you want your submission to be included in the conference proceedings you have to prepare a manuscript of your submission according to the LNCS Style.
Programme Committee
- Sören Auer, AKSW/Universität Leipzig
- Christopher Corbin, UK Advisory Board on Public Sector Information (APPSI)
- Adnan Hadzi and Andrea Rota, Department of Media and Communications, Goldsmiths College, University of London
- Claudia Müller-Birn, Carnegie Mellon University
- Peter Murray-Rust, University of Cambridge
- Rufus Pollock, Open Knowledge Foundation and Emmanuel College, University of Cambridge
- John Wilbanks, Science Commons
New mailing list for open knowledge in development
June 1st, 2009
We’ve just launched a new mailing list for those interested in open knowledge in development:
As you may have seen we had a session on Open Knowledge for Development at OKCon 2009. There was also discussion of the value of sharing knowledge for development at the 5th Communia Workshop - particularly from Pierre Guillaume Wielezynski of the World Food Programme and Richard Owens of WIPO.
We encourage you to join - whether you’re interested in:
- visually representing development related open data (a la OKF Advisory Board member Hans Rosling)
- sharing development information or making it easier to find and re-use (a la Aidinfo or PublishWhatYouFund)
- sharing practical information for development, e.g. on sanitation or construction (a la Appropedia or Akvo)
- open textbooks and open resources for education in developing countries
- or in any other open knowledge that’s related to development!
Also please consider passing this on to relevant colleagues:
- Short URL for this post: http://ur1.ca/50fr
- Identi.ca post: http://identi.ca/notice/4786130
- Twitter post: http://twitter.com/okfn/status/1991186271
Open Knowledge Conference (OKCon) 2009: Saturday 28th March
March 22nd, 2009
Open Knowledge Conference (OKCon) 2009 will take place next Saturday 28th March - less than a week away!
- where: Centre for Advanced Spatial Analysis, UCL, London
- when: 28th March 2009, 1030-1830
- home: http://www.okfn.org/okcon/
- register: http://www.okfn.org/okcon/register/
If you plan to attend, and haven’t registered yet - we encourage you to book your ticket now as space is limited!
OKCon 2009: speakers and sessions
There will be two main sessions on ‘open knowledge for development’ and ‘open data and the semantic web’. In addition there will be plenty of open space - which currently includes talks ranging from visualising historical data to public domain fashion. Speakers will include:
- Rufus Pollock, Open Knowledge Foundation
- Jordan Hatcher, Open Data Commons
- Leigh Dodds, Talis
- Jeni Tennison, London Gazette + RDFa
- Tom Scott, BBC
- Mark Charmer, AKVO
- Simon Parrish, Aidinfo
- Vinay Gupta, Appropedia
- David Bollier, OnTheCommons + Author of Viral Spiral
- Sebastian Hellmann , DBpedia
- Hugh Glaser, University of Southampton
- Adnan Hadzi, Goldsmiths + Deptford.tv
- John Dalziel, The Computus Engine
- Julian Todd, Public Whip
- Andrea Rota, Liquid Culture + Yuk Hui, Goldsmiths
- Harry Halpin, University of Edinburgh + W3C
- Humphrey Southall, A Vision of Britain
Full details are available at the programme page and the provisional open space schedule.
Working Group on Open Data in Science
March 13th, 2009
We are pleased to announce the launch of a new Working Group on Open Data in Science. In the first instance, the group will aim to:
- Act as a central point of reference and support for people who think they are interested in open data in science.
- Identify practices of early adopters, collecting data and developing guides.
- Act as a hub for the development of low cost, community driven projects around open data in science.
We are currently working on:
- a prize for open data in science
- a service to request that a given dataset be made open or to request clarification about whether or not it can be re-used
- case studies on the benefits of open data in different domains
The Working Group has the following founding members:
- Jonathan Gray, Open Knowledge Foundation
- Andrew Gruen, University of Cambridge
- Tim Hubbard, Wellcome Trust Sanger Institute
- Jenny Molloy, University of Cambridge
- Peter Murray-Rust, University of Cambridge
- Cameron Neylon, Science and Technology Facilities Council
- Michael Nielsen
- Rufus Pollock, Open Knowledge Foundation
- John Wilbanks, Science Commons
If you’re interested in participating in the work of the group, please get in touch on the main open-science mailing list!
Open Everything Berlin + CC Salon Berlin
February 27th, 2009
After the success of open everything Berlin last December (see documentation), the newthinking network and CC Salon Berlin teamed up to put on another event in Berlin last night:
- CC Salon Berlin and openeverything focus - Feb. 26 (CC Blog)
- openeverything focus + CC Salon (Michelle Thorne’s blogpost)
I was invited to speak - and gave an overview of the Open Knowledge Foundation, our projects, events, the background and rationale behind the Open Knowledge Definition, and a quick walkthrough of CKAN.
After me was Sebastian Moleski from Wikimedia Deutschland talking about the large donation of images from the German Federal Archives to Wikimedia Commons:
Starting on Thursday Dec 4, 2008, Wikimedia Commons witnessed a massive upload of new images. We received nearly 100,000 files from a donation from the German Federal Archives. These images are mostly related to the history of Germany (including the German Democratic Republic) and are part of a cooperation between Wikimedia Germany and the Federal Archives.
These images are licensed Creative Commons Attribution ShareAlike 3.0 Germany License (CC-BY-SA). Wikimedia Germany and the Federal Archives have signed a cooperation agreement that, among other things, asserts that the Federal Archives owns sufficient rights to be able to grant this kind of license.
The donation received good press coverage (see articles in the New York Times, and Spiegel Online) and is an outstanding example of a cultural heritage institution making material available under an open license. (The other high-profile example is Flickr Commons. There’s an interesting blog post comparing the two here.)
To demonstrate CKAN in action, I created a commons-bundesarchiv entry for the collection.
Another interesting project I learned about was Valkaama - a movie where all the source material is openly licensed. (The creators have also been working on an Open Source Film Definition.)
If you’re in or around Berlin and interested in participating in similar events in the future, there’s a list of future events on the Open Everything Berlin Mixxt Network. Also, if you want to stay in touch with people interested in all things open, see:
- open knowledge Berlin local group
- ok-berlin mailing list
Public Interest Information Policy in Germany
February 17th, 2009
I was recently asked to write a piece for Berlin-based think tank Das Progressive Zentrum on public interest information policy in Germany:
- Wem gehört das Wissen? Informationspolitik in Deutschland (Shorter German version)
- Public Interest Information Policy in Germany (Longer English version)
The piece finishes with three policy suggestions:
- Support legislation as well as licensing and pricing policies that support public re-usability of Public Sector Information. The creation of a national register of PSI assets, and the commissioning of a country-wide and cross-sector report would help to inform appropriate activity in this area.
- Support mandates for open access to publicly funded research. These should target higher education institutions, as well as funding bodies and umbrella organisations.
- Keep the public domain in the public domain. Encourage publicly funded cultural heritage institutions to allow digital copies of their holdings to be re-used by the public. Encourage the adoption of intellectual property law and policy that takes account of public interest, as well as private interests.
What Obama can do to promote openness
January 20th, 2009
With the inauguration of US President-Elect Barack Obama later today - we thought we’d prepare a brief list of things he can do to promote openness in his new role.
- Open government data. Make core government data open (as in opendefinition.org) - so that it can be re-used in mashups, visually represented, used in semantic web applications and so on! This idea is currently in 5th place on the Obama CTO site with over 5,800 votes.
- Open access to publicly funded research. As suggested by Open Knowledge Foundation Advisory Board member, Peter Suber: “Require open access to the results of non-classified research funded by taxpayers. Extend the exemplary policy now in place at the NIH to all federal agencies.” Currently in 12th place on ObamaCTO with over 1,600 votes.
- Publish public information in a way which makes it easy to re-use. For example, publish in XML or Text/CSV, not PDF files from which data must be extracted. Allow direct, bulk downloading, rather than access through an API or piecemeal access via a web service. (For more on this see our post Give Us the Data Raw, and Give it to Us Now.) The Data Catalogue of Vivek Kundra’s Office in the District of Columbia is a great example of this.
- Legal and licensing clarity. Be clear about what can and can’t be done with public content and data - with explicit legal and licensing statements, terms of use, and so on. Be clear what is in the public domain and what is free for re-use as long as attribution is given. Be clear about what is not available for use - including material where copyright is held by third parties. Fine grained permissions - with clear terms for each document and dataset - are better than blanket statements, which require each case to be investigated individually!
- Make it open by default. Make public content and data - whether it’s government data, or publicly funded digitisation of cultural heritage artefacts - open by default. Though this is not appropriate for everything, consider allowing as much as possible to be re-used. Think of the ‘Principle of Many Minds’ - there are lots of interesting things that can be done with a given document or dataset that you may not have thought of!
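To make the point about re-usable formats concrete, here is a minimal Python sketch showing how little code it takes to work with data published as plain CSV, with no extraction step of the kind a PDF would need. The column names and figures in `SAMPLE_CSV` are invented for illustration and do not come from any real government dataset.

```python
import csv
import io

# Hypothetical bulk download: agencies and figures are invented placeholders.
SAMPLE_CSV = """agency,year,spending_usd
Example Agency A,2008,1200
Example Agency B,2008,800
Example Agency A,2009,1500
"""


def total_by_agency(csv_text):
    """Sum the spending_usd column for each agency in the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        agency = row["agency"]
        totals[agency] = totals.get(agency, 0) + int(row["spending_usd"])
    return totals


print(total_by_agency(SAMPLE_CSV))
```

The same few lines would work on a full bulk file read from disk, which is the practical advantage of raw, machine-readable publication.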

