What “open data” means – and what it doesn’t

The following post is from Melanie Chernoff, Public Policy Manager for Red Hat. It was originally published on opensource.com.

Last week, an article in the Wall Street Journal talked about the Open Data Partnership, which “will allow consumers to edit the interests, demographics and other profile information collected about them. It also will allow people to choose to not be tracked at all.” The article goes on to discuss data mining and privacy issues, which are hot topics in today’s digital world, where we all wonder just how much of our personal data is out there and how it’s being used. These are valid concerns being talked about in other, more appropriate fora. I, however, would like to address my personal pet peeve about the dilution of the term open data.

The Open Knowledge Definition says it this way, “A piece of content or data is open if you are free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.” Generally, this means that the data should be released in a format that is free of royalties and other IP restrictions. The problem is that an increasing number of people are using the term open data to mean publicly available data.

In the article, the CEO of the startup directing the Open Data Partnership says the goal is to “be more transparent and give consumers more control” of the data that is collected and shared. Providing a mechanism by which consumers can decide what information is made available to advertisers is a laudable goal. However, this “open data” initiative focuses on what data is made available, when open data is really about how data is made available. This definitional shift is a problem, particularly for governments that are implementing data policies.

Simply put, all open data is publicly available. But not all publicly available data is open.

Open data does not mean that a government or other entity releases all of its data to the public. It would be unconscionable for the government to give out all of your private, personal data to anyone who asks for it. Rather, open data means that whatever data is released is released in a way that lets the public access it without paying fees or being unfairly restricted in its use.

In a previous article, I wrote about how the Massachusetts Bay Transportation Authority (MBTA) opened up its transit data to software developers. Within two months, six new trip planning applications for bus and train riders had been built at no cost to the MBTA. That’s the power of open data. The data was produced by the government and released to the public for free, in an open format (GTFS), under a license that allowed for use and redistribution.
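
To make “open format” concrete: a GTFS feed is a set of plain comma-separated text files, so the standard library of almost any language can read it without a vendor’s product. Below is a minimal sketch in Python, assuming a stops.txt file with the standard stop_id, stop_name, stop_lat, and stop_lon columns. The sample rows and the read_stops helper are invented for illustration; they are not MBTA data or anyone’s actual application code.

```python
import csv
import io

# Invented sample rows in the shape of a GTFS stops.txt file (not real MBTA data).
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Square,42.3500,-71.0600
S2,Sample Street,42.3600,-71.0700
"""


def read_stops(text):
    """Parse GTFS-style stops text into a list of dicts with numeric coordinates."""
    stops = []
    for row in csv.DictReader(io.StringIO(text)):
        stops.append({
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        })
    return stops


if __name__ == "__main__":
    # Print one line per stop; a trip planner would build on data like this.
    for stop in read_stops(SAMPLE_STOPS_TXT):
        print(stop["id"], stop["name"], stop["lat"], stop["lon"])
```

Nothing here requires a license fee or proprietary software, which is the point: the format itself places no barrier between the public and the data.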

Why does this matter? If open data is misunderstood as releasing any and all data to the public, people will come to oppose the concept because of their concerns about privacy. What we, as policy advocates, want to encourage is that the data governments do and should publish is published in a way that ensures equal access for all citizens. In other words, you shouldn’t have to buy a particular vendor’s product in order to be able to open, use, or repurpose the data. You, as a taxpayer, have already paid for the collection of the data. You shouldn’t have to pay an additional fee to open it.

We’ve all seen, from the recent news about WikiLeaks, that there are real privacy and security concerns with putting all of the government’s data out there, but that is a separate issue and shouldn’t be confused with open data. Whether data should be made publicly available is where privacy concerns come into play. Once it has been determined that government data should be made public, it should be released in an open format.

Am I being nitpicky about the term? Maybe. But we’ve seen from other tech policy battles that good definitions are crucial to framing the debate.

8 Comments

Ian Drysdale says:

December 14, 2010 at 10:05

It’s interesting to compare your definitions with those that Richard Stallman makes when comparing Open Source Software with Free Software here: http://www.gnu.org/philosophy/free-software-for-freedom.html

Tim Manning says:

December 12, 2010 at 17:40

As ever with language, it’s all about understanding the context that a given word is being used in.

The article referred to is not about data being made publicly available; it’s about (personal) data being made available to the person the data is about and placed under their direct control.

So, in this context they mean “open” to that individual. Although in fact they mean rather more than that!

I wouldn’t attempt to claim exclusivity.

will spooner says:

December 11, 2010 at 20:47

Your distinction is absolutely correct. Take the biomedical informatics field in which I work – the reference human genome is hugely valuable data generated at huge public expense, and is rightly open (this was a famous victory). Few would argue, however, that an individual’s genome, even if sequenced as part of a publicly funded project, should be open.

Lex Slaghuis says:

December 11, 2010 at 16:25

Great post! I have gotten quite annoyed with all the confusion around open data also.

I think the gov. policy focus should be on more open data not anything else!
