The Open Knowledge Foundation believes in open knowledge: not just that some data is open and freely usable, but that it is useful – accessible, understandable, meaningful, and able to help someone solve a real problem.
A lot of the data which could help me improve my life is data about me – “MyData” if you like. Many of the most interesting questions and problems we have involve personal data of some kind. This data might be gathered directly by me (using my own equipment or commercial services), or it could be harvested by corporations from what I do online, or assembled by public sector services I use, or voluntarily contributed to scientific and other research studies.
Image: “Tape library, CERN, Geneva 2” by Cory Doctorow, CC-BY-SA.
This data isn’t just interesting in the context of our daily lives: it bears on many global challenges in the 21st century, such as supporting an aging population, food consumption and energy use.
Today, we rarely have access to these types of data, let alone the ability to reuse and share it, even when it’s my data, about just me. Who owns data about me, who controls it, who has access to it? Can I see data about me, can I get a copy of it in a form I could reuse or share, can I get value out of it? Would I even be allowed to publish openly some of the data about me, if I wanted to?
But how does this relate to open data? After all, a key tenet of our work at the Open Knowledge Foundation is that personal data should not be made open (for obvious privacy reasons)!
However, there are in fact obvious points where “Open Data” and “My Data” connect:
- MyData becomes Open Data (via transformation): Important datasets that are (or could be) open come from “my data” via aggregation, anonymisation and so on. Much statistical information ultimately comes from surveys of individuals, but the end results are heavily aggregated (for example, census data). This means “my data” is an important source but also that it is essential that the open data community have a good appreciation of the pitfalls and dangers here – e.g. when anonymisation or aggregation may fail to provide appropriate privacy.
- MyData becomes Open Data (by individual choice): There may be people who want to share their individual, personal data openly to benefit others. A cancer patient could be happy to share their medical information if that could assist with research into treatments and help others like them. Alternatively, perhaps I’m happy to open my household energy data and share it with my local community to enable us collectively to make sustainable energy choices. (Today, I can probably only see this data on the energy company’s website: remote, unhelpful and out of my control. I may not even be able to find out what I’m permitted to do with my data!)
- The Right to Choose: if it’s my data, just about me, I should be able to choose to access it, reuse it, share it and open it if I wish. There is an obvious translation here of key Open Data principles to MyData. Where the Open Definition states that material should be freely available for use, reuse and redistribution by anyone, we could think that my data should be freely available for use, reuse and redistribution by me.
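To make the pitfall in the first point more concrete, here is a minimal sketch of one common sanity check before publishing a dataset derived from personal data: counting how many people share each combination of seemingly harmless attributes (often called quasi-identifiers). The records, field names and threshold `k` below are all invented for illustration, and passing a check like this does not by itself make a dataset safe to open.

```python
from collections import Counter

# Hypothetical, made-up records. Names have been removed, but postcode area,
# birth year and sex remain and can act as quasi-identifiers.
records = [
    {"postcode_area": "CB1", "birth_year": 1980, "sex": "F", "condition": "asthma"},
    {"postcode_area": "CB1", "birth_year": 1980, "sex": "F", "condition": "diabetes"},
    {"postcode_area": "CB1", "birth_year": 1980, "sex": "F", "condition": "none"},
    {"postcode_area": "CB2", "birth_year": 1975, "sex": "M", "condition": "asthma"},
    {"postcode_area": "CB2", "birth_year": 1975, "sex": "M", "condition": "none"},
    {"postcode_area": "CB3", "birth_year": 1990, "sex": "F", "condition": "diabetes"},
]

# Assumed choice of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "sex")


def risky_groups(rows, quasi_identifiers, k):
    """Return each quasi-identifier combination shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # With k=2, any combination that matches exactly one person is flagged.
    print(risky_groups(records, QUASI_IDENTIFIERS, k=2))
    # {('CB3', 1990, 'F'): 1}
```

In this toy example, anyone who knows a woman born in 1990 living in the CB3 area could read her medical condition straight from the “anonymised” data. Real datasets need far more careful treatment, but the sketch shows why removing names alone is not enough.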
We think it is important to explore and develop these connections and issues. The Open Knowledge Foundation is therefore today launching an Open Data & MyData Working Group. Sign up here to participate:
This will be a place to discuss and explore how open data and personal data intersect. How can principles around openness inform approaches to personal data? What issues of privacy and anonymisation do we need to consider for datasets which may become openly published? Do we need “MyData Principles” that include the right of the individual to use, reuse and redistribute data about themselves if they so wish?
There are plenty of challenging issues and questions around this topic. Here are a few:
Are big datasets actually anonymous? Anonymisation is incredibly hard. This isn’t a new problem (Ars Technica had a great overview in 2009), but it gets more challenging as more data becomes available, openly or otherwise: the more data that can be cross-correlated, the more easily anonymisation is breached.
There’s a lot of value in personal data – Boston Consulting Group claim €1tn. But even BCG point out that this value can only be realised if the processes around personal data are more transparent. Perhaps we can aspire to more than transparency, and have some degree of personal control, too.
Governments are starting to offer some proposals here, such as “MiData” in the UK. This is a good start, but do these proposals really serve the citizen?
There’s also some proposed legislation to drive companies to give consumers the right to see their data.
But is access enough?
The consumer doesn’t own their data (even when they have “MiData”-style access to it), so can they publish it under an open licence if they wish?
Whose data is it anyway?
Computers, phones, energy monitors in my home, and so on, aren’t all personal to me. They are used by friends and family. It’s hard to know whose data is involved in many cases. I might want privacy from others in my household, not just from anonymous corporations.
This gets even more complicated when we consider the public sphere – surveillance cameras and internet of things sensors are gathering data in public places, about groups of independent people. Can the people whose images or information are being captured access or control or share this data, and how can they collaborate on this? How can consent be secured in these situations? Do we have to accept that some information simply cannot be private in a networked world?
(Some of these issues were raised at the Open Internet of Things Assembly in 2012, which led to a draft declaration. The declaration doesn’t indicate the breadth of complex issues around data creation and processing which were hotly debated at the assembly.)
We will need clear principles. Perhaps, just as the Open Definition has helped clarify and shape the open data space, we need analogous “MyData” Principles which set out how personal data should be handled. These could include, for example:
- That my data should be made available to me in machine-readable bulk form.
- That I should have the right to use that data as I wish (including use, reuse and redistribution if I so wish).
- That none of my data (where it contains personal information) should be made open without my full consent.