The possibilities of open data have been enthralling us for 10 years.
I came to it through wanting to make Government really usable, to build sites
But that excitement isn’t what matters in the end.
What matters is scale – which organisational structures will make this movement
Whether by creating self-growing volunteer communities, or by generating flows
This post quickly and provocatively goes through some that haven’t worked
(yet!) and some that have.
Ones that are working now
1) Form a community to enter new data. OpenStreetMap and MusicBrainz are two big examples. It works
because the community is the originator of the data. That said, neither has
dominated its industry as much as I thought they would have by now.
2) Sell tools to an upstream generator of open data. This is what
CKAN does for central Governments (and the new ScraperWiki CKAN tool helps with). It’s what mySociety does, when selling
FixMyStreet installs to local councils, thereby publishing their potholes as RSS feeds.
3) Use open data (quietly). Every organisation does this and never talks
about it. It’s key to long-established data resellers like Bloomberg. It is what most of
ScraperWiki’s professional services
customers ask us to do. The value to society is enormous and invisible. The
big flaw is that it doesn’t help scale supply of open data.
4) Sell tools to downstream users. This isn’t necessarily open data
specific – existing software like spreadsheets and Business Intelligence can be
used with open or closed data. Lots of open data is on the web, so tools like
the new ScraperWiki, which work well with web data, are particularly suited to it.
Ones that haven’t worked
5) Collaborative curation. ScraperWiki started as an audacious attempt to create an open data curation
community, based on editing scraping code in a wiki. In its original form
(now called ScraperWiki Classic) this didn’t scale.
Here are some reasons, in terms of open data models, why it didn’t.
a. It wasn’t upstream. Whatever provenance you give, people most trust data
that they get straight from its source. This can also be a partial upstream –
for example supplementing scraped data with new data manually gathered by
b. It wasn’t in private. Although in theory there’s lots to gain by wrangling
commodity data together in public, it goes against the instincts of most
c. There’s not enough existing culture. The free software movement built a rich
culture of collaboration, ready to be exploited some 15 years in by the open
source movement, and 25 years later by tools like GitHub. With a few
exceptions, notably OpenCorporates, there
aren’t yet open data curation projects.
6) General-purpose data marketplaces, particularly ones that are mainly
reusing open data, haven’t taken off. They might one day; however, I think
they need well-adopted higher-level standards for data formatting and syncing
first (perhaps something like dat, perhaps something based on CSV files).
Ones I expect more of in the future
These are quite exciting models which I expect to see a lot more of.
7) Give labour/money to upstream to help them create better data. This is
quite new. The only, and most excellent, example of it is the UK’s National
the Statute Law Database. They do the work with the help of staff seconded
from commercial legal publishers and other parts of Government.
It’s clever because it generates money for upstream, which people trust the most,
and which has the most ability to improve data quality.
8) Viral open data licensing. MySQL made lots of money this way, offering
proprietary dual licenses of GPLd software to embedded systems makers. In data
this could use OKFN’s Open Database License,
and organisations would pay when they wanted to mix the open data with their
own closed data. I don’t know of anyone actively using it, although Chris Taggart
from OpenCorporates mentioned this model to me years ago.
9) Corporations release data for strategic advantage. Companies are starting to
publish their own data where doing so helps them strategically. This is very new. Expect more of it.
What have I missed? What models do you see that will scale Open Data, and bring
its benefits to billions?