“yes, the government should open other people’s data”
Traditionally, the Open Knowledge Foundation has worked to open non-personal data: things like publicly-funded research papers, government spending data, and so on. Where data about individuals formed part of a shared dataset, such as a census, a great deal of thought and effort went into ensuring that individual privacy was protected and that the aggregate data released was a shared, communal asset.
But times change. Governments and corporations are collecting increasing amounts of data, vast quantities of it about individuals (whether or not they realise it is happening). The risks to privacy from data collection and sharing are probably greater than they have ever been. Data analytics, whether of “big” or “small” data, has the potential to provide unprecedented insight; however, some of that insight may come at the cost of personal privacy, as separate datasets are connected and correlated.
Both open data and big data are hot topics right now, and at such times it is tempting for organisations to get involved without thinking through all the issues. The intersection of big data and open data is somewhat worrying, as the temptation to combine the economic benefits of open data with the current growth potential of big data may lead to privacy concerns being disregarded. Privacy International are right to draw attention to this in their recent article on data for development, but other domains are affected too.
Today, we’d like to suggest some terms to help the growing discussion about open data and privacy.
Our Data is data with no personal element, and a clear sense of shared ownership. Some examples would be where the buses run in my city, what the government decides to spend my tax money on, how the national census is structured and the aggregate data resulting from it. At the Open Knowledge Foundation, our default position is that our data should be open data – it is a shared asset we can and should all benefit from.
My Data is information about me personally, where I am identified in some way, regardless of who collects it. It should not be made open or public by others without my direct permission, but it should be “open” to me: I should have access to data about me in a usable form, and the right to share it myself however I wish, if I choose to do so.
Transformed Data is information about individuals, where some effort has been made to anonymise or aggregate the data to remove individually identified elements.
We propose that there should be clear steps to follow in order to confirm whether transformed data can be published openly as our data. A set of privacy principles for open data, setting out the considerations that need to be made, would be a good start. These might include consulting key stakeholders, such as representatives of whatever group(s) the data is about and data privacy experts, on how the data is transformed. For some datasets, it may not prove possible to transform them enough to maintain a reasonable level of privacy for citizens; these datasets simply should not be opened up. For others, further work on transformation may be needed to achieve an acceptable standard of privacy before the data is fit to be released openly. Ensuring that the risks are considered and managed before release is essential. If the transformations provide sufficient privacy for the individuals concerned, and the principles have been adhered to, the data can be released as open data.
We note that some of “our data” will have personal elements. For instance, members of parliament have made a positive choice to enter the public sphere, and some information about them is therefore necessarily available to citizens. Data of this type should still be assessed against the open data privacy principles we propose before publication, although the standards applied may differ given the public interest.
This is part of a series of posts exploring open data and privacy, an issue we feel is very important. If you are interested in these matters, or would like to help develop privacy principles for open data, join the working group mailing list. We’d welcome suggestions and thoughts on the mailing list or in the comments below, or talk to us and the Open Rights Group, with whom we are working, at the Open Knowledge Conference and other events this autumn.
Laura is CEO of the Open Knowledge Foundation, and Co-Founder and Director of Makespace. She has worked extensively in technology, innovation and leadership roles including at AT&T Labs, AlertMe.com and True Knowledge. Laura holds Masters and PhD degrees from the University of Cambridge, received the Royal Academy of Engineering Leadership Award and a NESTA Crucible Fellowship, and is a Chartered Engineer.