The Power of Open Data
September 1st, 2010
The following guest post is from David Bollier, independent policy strategist, journalist, and author of Viral Spiral. It was originally posted at the On the Commons blog.
Science has always recognized the power of sharing in developing new knowledge. But in the search for treatments and cures for diseases like Alzheimer’s and Parkinson’s, the sprawling bodies of highly diverse research data are not easily shared. Either they are considered proprietary resources for making money, or they are hidden in academic databases that others may not know about, often inaccessible because of incompatible software formats. No single researcher really has the resources or incentive to develop an overarching regime to enable cooperation and sharing. And so dozens of academics, nonprofits and pharmaceutical companies have continued their research in relative isolation.
“Companies were caught in a prisoner’s dilemma,” a researcher at the University of Pennsylvania recently told the New York Times. “They all wanted to move the field forward, but no one wanted to take the risks of doing it.”
But ten years ago, Dr. Neil S. Buckholtz, who oversees dementia research at the National Institutes of Health, realized that the sharing of research data was a collective action problem that might be solved through concerted leadership. He helped instigate a plan by which the NIH stepped up to serve as an “honest broker” between the pharmaceutical industry and academics. The goal was to ensure that all research would be shared openly and freely, and published on the Internet immediately, so that anyone could use it. You could publish a research paper and you could develop new treatments, but no one would own the data. Researchers would even be free to make mistakes or misguided interpretations — because who is to say at the outset that something is necessarily incorrect?
Seven years ago, the NIH persuaded scientists from the FDA, the drug industries, medical-imaging companies, academia and nonprofit groups to cooperate in an ambitious scheme to affirmatively share their findings with each other. As the Times reports (August 13, 2010), the sharing of data is now starting to show results. Scientists studying Alzheimer’s disease routinely share their findings about “biological markers” that indicate the progression of the disease. This has led to recent scientific papers suggesting the value of PET scans and tests of spinal fluids as ways to make early diagnoses of Alzheimer’s.
The conventional business response to such radical ideas of “sharing” is that no company would have adequate incentive to invest in risky research unless they could be assured exclusive ownership of the results, in order to create a revenue-generating “product” (i.e., medical treatment or drug). But the drug industry has had to concede that diseases such as Alzheimer’s are just too scientifically complicated for any single research entity to tackle; the most fruitful way forward is to pursue an “open source” approach that places the basic building-blocks of knowledge into the commons – while sanctioning the private patenting of more refined medical innovations that build on the fruits of the commons.
It’s so common-sensical that it seems faintly ridiculous that a story of this sort should merit lead-story treatment in the New York Times.
Slides and notes from Data Driven Journalism event
August 30th, 2010
Last week I attended the Data-driven journalism event in Amsterdam (which we blogged about here) run by the European Journalism Centre (who interviewed me here).
My slides from the event are now up here:
Below are some lovely lofi graphical notes from Anna Lena Schiller:
It was a very well organised event and there were lots of interesting presentations and discussions. While many there were sold on the value of public bodies opening up datasets for others to use, there were more reservations about news organisations sharing datasets with each other and with the public. To address this, I’d like to start a brief document called:
- Why should journalists and media organisations consider opening up their data?
The document would refer to existing success stories (such as the Guardian Datablog datasets, NYT Linked Data, …), compelling reasons, evidence, etc. and would appeal to enlightened self-interest. I’ve started some very preliminary notes at:
I hope this is something we will be able to discuss and add to at the data journalism event in Berlin later this week!
Beginnings of an Object Description Mapper
August 21st, 2010
Data.gov.uk releases CKAN Drupal Module
August 21st, 2010
We’re delighted to see that the data.gov.uk folks have released the code for their CKAN Drupal module. As many will know, the OKF’s CKAN powers data.gov.uk as well as over a dozen other data catalogues around the world.
From the blog post:
As part of the government’s ongoing work around transparency, today we are releasing some of the custom software code we’ve developed – a CKAN module for Drupal. This is available for anyone to review, use, or modify. We’re excited to see how developers and colleagues across the world put this work to good use in their own applications and projects.
The code itself is attached to this blog post as a tar.gz file and contains one main package with two sub-packages within. This code release allows content to be synched from CKAN into Drupal. CKAN is the system we use as our “back end” to store information about all the data government has released. Drupal is a system to publish web content, and serves as our “front end”, which people can use to find our datasets and comment on them.
The main CKANPackage code creates a Drupal custom content type to represent data in the same way as CKAN. The first sub-package is the CKANImporter which imports packages from CKAN into Drupal and allows this to take place as a one-off batch import or as an update to the latest changes since a specified time. The second sub-package is CKANDatagovuk which correlates fields in CKAN with Drupal hooks.
The code release includes comments in the files to assist users with the functionality. You can of course contact us should you have any questions.
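To make the two import modes described above concrete, here is a minimal Python sketch of the idea: a one-off batch import versus an update limited to changes since a given time. It is not the module's code (the released module is written for Drupal) and it calls neither CKAN nor Drupal. The package records, the `modified` field and the `select_packages` and `sync` functions are all hypothetical stand-ins chosen for illustration.

```python
from datetime import datetime
from typing import Optional

# Hypothetical in-memory stand-ins: this sketch calls neither CKAN nor Drupal.
CKAN_PACKAGES = [
    {"name": "road-accidents", "title": "Road accidents", "modified": datetime(2010, 8, 1)},
    {"name": "school-spending", "title": "School spending", "modified": datetime(2010, 8, 15)},
    {"name": "air-quality", "title": "Air quality", "modified": datetime(2010, 8, 20)},
]


def select_packages(packages, since: Optional[datetime] = None):
    """Return every package (batch import) or only those changed after `since`."""
    if since is None:
        return list(packages)
    return [p for p in packages if p["modified"] > since]


def sync(packages, nodes, since: Optional[datetime] = None):
    """Copy the selected packages into `nodes`, keyed by package name.

    `nodes` stands in for the front-end content store.
    Returns the names that were created or updated.
    """
    touched = []
    for package in select_packages(packages, since):
        nodes[package["name"]] = {
            "title": package["title"],
            "modified": package["modified"],
        }
        touched.append(package["name"])
    return touched


drupal_nodes = {}
# One-off batch import: every package is copied.
batch = sync(CKAN_PACKAGES, drupal_nodes)
# Incremental update: only packages changed after 10 August 2010.
incremental = sync(CKAN_PACKAGES, drupal_nodes, since=datetime(2010, 8, 10))
print(batch)
print(incremental)
```

Keying the stored records by package name means a later update overwrites the earlier copy rather than duplicating it, which is one simple way to keep repeated imports safe to re-run.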
Data Journalism Meetup, Berlin, 1st September 2010
August 20th, 2010
We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows:
- When? 1st September 2010
- Where? Fjord Office, Friedrichstrasse 210, Berlin
- Register? You can register here!
Speakers will include:
- Martin Belam, The Guardian
- Jonathan Gray, The Open Knowledge Foundation
- Christian Heise, ZEIT Online
- Gerd Kamp, Deutsche Presse Agentur
- Georgi Kobilarov, Uberblic Labs
- John O’Donovan, BBC News
- Tom Scott, BBC Earth
- Ole Wintermann, Bertelsmann Foundation
From the blurb:
Data Journalism and the new and exciting possibilities that the Web of Data opens up for creators and consumers of news and media online will be the topic of this first meetup.
We have a brilliant lineup of speakers from media organisations like the BBC, The Guardian, the Deutsche Presse Agentur and the Bertelsmann Foundation coming to Berlin and talking about data journalism and the latest developments and projects in this field, and our friends from ZEIT Online will join the discussion.
The event takes place at the office of our friends at Fjord in the heart of Berlin. Starting at 2pm, you’ll hear talks followed by a panel discussion and an open space for working groups, and when the official programme ends at 7pm we’ll of course have drinks with all of you.
Language of all talks at the event will be English, but don’t be surprised to hear a bit of German here and there in conversations.
Vote Raw Data Now at SXSW panelpicker - ends 27 August
August 19th, 2010
Announcement below — voting ends 27 August
Raw Data Now: Building an Open Data Ecosystem
Rufus Pollock and Jordan Hatcher of the Open Knowledge Foundation have submitted a proposal for a workshop highlighting the great work of the Open Knowledge Foundation, including Where Does My Money Go?, Open Shakespeare, CKAN, the Open Definition, and Open Data Commons (among many many more great projects!). The panel will cover:
- What legal rights apply to databases?
- What tools are available to developers and data publishers involved in public sector data?
- How do I encourage public sector institutions to release data?
- If I’m in the public sector, what’s the best way for me to release my data?
- Why is open data different from open source or open content?
Voting is a key part of the SXSW selection process, so please vote for our panel.
===
Also, a plug for The Itinerant Poetry Librarian’s panel, which will very likely be of interest to OKFN folks into open bibliographic data and all things librarian: “They stopped coming?”: Librarians Don’t Cry They Re-View
Workshop on Open Bibliographic Data and the Public Domain
August 17th, 2010
We are pleased to announce a one day workshop on Open Bibliographic Data and the Public Domain. Details are as follows:
- Where? Rooms 108/108a, FU Berlin, Garystr. 21, 14195 Berlin
- When? 7th October 2010
- Registration? http://publicdomain.eventbrite.com/
- Hashtag? #pdobd
- Notes? http://okfnpad.org/pdobd
Here’s the blurb:
This one day workshop will focus on open bibliographic data and the public domain. In particular it will address questions like:
- What is the role of freely reusable metadata about works in calculating which works are in the public domain in different jurisdictions?
- How can we use existing sources of open data to automate the calculation of which works are in the public domain?
- What data sharing policies in libraries and cultural heritage institutions would support automated calculation of copyright status?
- How can we connect databases of information about public domain works with digital copies of public domain works from different sources (Wikipedia, Europeana, Project Gutenberg, …)?
- How can we map existing sources of public domain works in different countries/languages more effectively?
The day will be very much focused on productive discussion and ‘getting things done’ rather than presentations. Sessions will include policy discussions about public domain calculation under the auspices of Communia (a European thematic network on the digital public domain), as well as hands-on coding sessions run by the Open Knowledge Foundation. The workshop is a satellite event to the 3rd Free Culture Research Conference on 8-9th October.
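As a rough illustration of what automated public domain calculation from bibliographic metadata could look like, here is a deliberately simplified Python sketch. It assumes a single "author's life plus N years" rule, with protection running to the end of the calendar year in which the term completes. Real rules vary by jurisdiction and by type of work, so the `Work` record, the function names and the example terms are all hypothetical and are not taken from any tool discussed at the workshop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical bibliographic record; real metadata is far messier."""
    title: str
    author_death_year: Optional[int]  # None when the metadata does not say


def public_domain_from(work: Work, term_years: int) -> Optional[int]:
    """First calendar year the work is free under a life-plus-term rule.

    Assumes protection lasts until the end of the year in which the term
    completes. Returns None when the death year is unknown.
    """
    if work.author_death_year is None:
        return None
    return work.author_death_year + term_years + 1


def is_public_domain(work: Work, term_years: int, year: int) -> bool:
    """True only when the status can be calculated and the term has run out."""
    start = public_domain_from(work, term_years)
    return start is not None and year >= start


# Made-up records for illustration only.
novel = Work("Example novel", author_death_year=1939)
pamphlet = Work("Example pamphlet", author_death_year=None)

print(public_domain_from(novel, 70))       # life plus 70 years
print(public_domain_from(novel, 50))       # life plus 50 years
print(is_public_domain(novel, 70, 2009))
print(is_public_domain(pamphlet, 70, 2010))
```

The unknown-death-year case returns no answer rather than a guess, which is where better open metadata would make the biggest difference to any automated calculation.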
If you would like to participate, you can register at http://publicdomain.eventbrite.com/
If you have ideas for things you’d like to discuss, please add them at http://okfnpad.org/pdobd
To take part in discussion on these topics before and after this event, please join:
Gathering, Preserving and Reusing our Cultural Heritage - the OKFN Cultural Heritage Working Group
August 16th, 2010
The following post is from Ian Ibbotson, Coordinator of the new OKF Working Group on Cultural Heritage and developer at Knowledge Integration.
Knowledge of our cultural heritage is captured today by a massive variety of organisations and people. These range from traditional museums to hyperlocal websites and private collections of scanned photos shared on Flickr. This data represents some of the most distinctive and engaging content available to us. Yet like so much of our digital wealth, there are often barriers to describing, sharing and finding this rich content. Over the years the cultural sector has delivered some amazing data sharing projects, yet there are still issues to be addressed by the community, particularly in the contexts of open and linked data.
In order to try and better engage with the richness of the cultural sector, the OKFN has set up a Cultural Heritage working group and invites participation from anyone with an interest in the domain. Our goal is not to compete with existing technical or policy forums, but to promote sharing of information and best practice about the opening up of our cultural heritage. We’re aiming to do this by providing a forum in which anyone interested can exchange ideas or make announcements.
Over the coming months the working group will be finding its feet, but our initial focus is to try and map existing open cultural content (datasets, events, people, …) and to get that information catalogued in CKAN. Participants should feel free to use the WG to disseminate information about available open collections or upcoming events, to ask for advice or just to chat. Our hope is that the working group will be both a motivator and enabler of open cultural content, giving practitioners solid arguments in favor of open heritage and help in finding the tools to do it.
The open heritage working group is blogging at heritage.okfn.org, tweeting at http://twitter.com/openheritage and you can join the mailing list at http://lists.okfn.org/mailman/listinfo/open-heritage. We welcome input from anyone with ideas about the opening up of our cultural heritage.
B-Open: Open Data from Bristol City Council
August 16th, 2010
The following guest post is from Stephen Hilton, Programme Lead of the Connecting Bristol initiative.
Unusually perhaps, for a city council, we recognise and relish the fact that our city is a quirky, unorthodox, hot-bed of creative digital activity and activism. Bristol City Council has been promoting local e-democracy for the last decade. And it is this passion for digital transparency and engagement that has led us head first into the world of opengov and open data.
B-Open is a high-profile Bristol City Council open data initiative. Launched at the end of June 2010 by Barbara Janke, Council Leader and Dr Mark Wright, Executive Member for Transformation and Efficiency, the project seeks to place as much council data as possible into the public domain and catalyse the highly active digital community in the city to re-use and re-purpose it, generating new products, services and insight.
In partnership with the Bristol-based iShed, B-Open was launched as a strand of the nationally recognised collaborative innovation scheme, Media Sandbox. Through Media Sandbox, three £10k commissions are being awarded to develop ideas that use council data, meet the criteria “creative, smart, green, connected” and deliver public value.
An open data portal is being created to act as a focus for the community who are engaged in this work and the people who might benefit from the outcomes. Once developed, the B-Open portal will provide access to council data and, importantly, will also provide a showcase for the uses of the data that will emerge.
At the time of writing a wide range of high quality applications have been received, including data visualisations, mobile games and services relating to transport and mobility. Six projects have been shortlisted for interview and the three commissions will be announced at the end of the month.
For more information see:
Open Government Data Camp 2010, 18-19th November 2010
August 13th, 2010
The Open Knowledge Foundation is organising an international workshop on open government data, which will take place in London this autumn:
You can register at:
From the announcement:

What is it?
Basic details are as follows:
- What? A two day workshop for people interested in open government data.
- When? 18-19th November 2010
- Where? University of London Union, London, UK
- How much? Tickets cost £10 to help cover costs. You can sign up here!
- Hashtag? #ogdcamp2010
Tell me more…
It’s been a big year for open government data. Around the world governments and public bodies have been opening up official datasets for the public to reuse. There has been an explosion of new applications, competitions, hackdays and other initiatives from local authorities, central government departments, international bodies and others. This event will bring together movers and shakers from the world of open government data, including government representatives, policymakers, lawyers, technologists, academics, advocates, citizens, journalists and reusers.
What will happen?
There will be two days of discussions, drafting, planning and hacking. Crucially we hope to:
- Build consensus around key legal, technical and policy issues related to opening up government information.
- Strengthen the community of people working on different aspects of opening up official data around the world — from both inside and outside government. (Many people working on this area will not have met in person!)
- Encourage the exchange of experiences, expertise and ideas between those involved in leading open government data initiatives in different countries.
- Make things! We hope there will be plenty of space for developers to hack on things — from refining core bits and pieces of technology to rapid prototyping of new ideas.
What will the format be?
Presentations will be kept to a minimum. Each day will begin with a sprinkling of short talks followed by plenty of time to talk, plan and work on things.
Can I submit a presentation?
We are going to put out a call for short presentations (around 30 ten-minute slots) shortly. Details/links will be posted on the open-government discussion list.
Can I propose a session?
Yes please! Again, we’re going to brainstorm, plan and schedule sessions on the open-government discussion list — so head there if you have any cunning ideas!
What kinds of topics will be covered?
Possible sessions include:
- How can we encourage other countries to open up official information?
- Open government data in law and policy: obstacles and opportunities
- Promoting reuse: competitions, community engagement, the role of the media
- Finding open government data: catalogues, registries and metadata
- Raw Data Now! Technical aspects of opening up government data
- The role and value of linked data
- Open government data and data journalism
What kinds of outputs will there be?
Projected outputs include things like:
- First draft of an international ‘open data manual’ (organised as a ‘Book Sprint’)
- A set of key open government data principles
- A timeline of key developments for open government data around the world
- A fairly comprehensive list of official initiatives — including data catalogues and competitions
- A list of key examples of the reuse of open government data
- Launch of RawDataNow.com — illustrating what we mean by ‘raw data’ aimed at those who publish official information
- Brainstorming about projects which would make it easier for citizens to find, analyse and visually represent the data they are looking for
Who’s behind the event?
Open Government Data Camp was conceived and is being primarily organised by the Open Knowledge Foundation. The event is also supported by:
- Cabinet Office, UK
- EU LAPSI project, Turin, Italy
- EU LOD2 project, Leipzig, Germany
- Guardian, UK
- Sunlight Foundation, USA
Who is coming?
You can find a list of participants at:
If you add your name to the list, please don’t forget to register! (And vice versa: if you’ve registered, please also add your name to the pad page above…)
Can I sponsor the event?
Yes please! We are still actively seeking sponsorship for lunches, coffee, travel and accommodation for international participants and so on. If you think you might be interested, please contact jonathan dot gray at okfn dot org.
What countries will be represented?
We are currently expecting representation from:
- Argentina
- Australia
- Austria
- Belgium
- Brazil
- Canada
- Denmark
- Finland
- France
- Germany
- Hungary
- Iceland
- India
- Ireland
- Italy
- Luxembourg
- Netherlands
- New Zealand
- Norway
- Russia
- Spain
- Sweden
- Taiwan
- United Kingdom
- United States
Why do I have to pay?
The £10 ticket price is to help cover costs. If the ticket price is a problem, don’t hesitate to let us know. We won’t turn anyone away because they can’t afford to come!
