The debate around data in our community has been densely concentrated around the question of openness. That’s not surprising. Words like “free” and “open” have dominated the conversations in the digital commons for most of its existence, mainly because most of the digital commons has been centered on copyrightable works.
Software, text, photos, videos, music are all creative works under the law, all carrying the powerful, relatively internationalized protections of copyright, and this very power allows creators to invert that power using free / libre copyright licenses. That reality has led to a set of definitions of freedom for software, for cultural works, and for knowledge, all of which are very centered on the intellectual property regimes surrounding digital objects. We’ve also propagated the idea to hardware.
And that’s carried over into the debate around data. We ask, “is it open data?” of the world.
But I spend a lot of time around data people for whom open is an afterthought. For many people it’s Big Data, right down to the requisite O’Reilly branded events. They’re worried about whether we should leverage machine learning or domain experts, not openness. Or it’s Social Data. They’re worried about privacy policies and selling the data to as many vendors as possible. It’s Blue Button, and Green Button. They’re worried about getting data into people’s hands. It’s Quantified Self. They’re worried about getting their own data into their own hands. In Washington and other capitals large and small, it’s Government Data.
Open is almost never mentioned.
And I think that’s because we’re so focused on intellectual property, on share alike and attribution and public domain, that we lose the bigger context.
Creative works came online in a cultural and technical context that allowed us to focus on freedom and intellectual property. We have decades of history with software, photo, and video, and hundreds of years with text. We had a technical infrastructure ready to create, distribute, consume, and remix creative works: mailing lists, sharing websites, wiki software.
We don’t have that with data.
Data is entering the world at a rate that is so fast it’s almost incomprehensible to human brains. It’s like trying to comprehend geologic time. The cost of generating data is so low in so many spaces, and dropping like a stone in so many others, that the real challenge is to do interesting things with it. The gulf between those who can do something with data and those who can’t is a serious new case of digital divide, and licensing is just a tiny part of that gulf. Important, to be sure, but tiny.
There’s a people gulf – 190,000 machine learning experts and 1,500,000 managers in the US alone who don’t exist, but need to, to take advantage of data. That gap is worse in the developing world, and will only widen in coming years.
But perhaps most important is a cultural gulf – we live in a world right now that (implicitly in most cases, but increasingly explicitly) accepts the natural state of data as transactional. We trade our data, rather than our cash, for services like Facebook, Google, apps, and more. We don’t get a copy of it. We don’t know who does. We’re on the outside of the black box, but our data’s on the inside.
So my argument is that we as an “open” movement need to understand and integrate our concerns over property rights into the broader debate. We need to talk about citizens’ rights. We need to talk about the right to understand how our web searches are returned. We need to talk about how our privacy rights may be negatively impacted by more openness.
Because unlike the web, and the internet, which grew quietly in obscure corners of the world, allowing open designs to flourish, data has already drawn attention, money, and closed business models. We’re in active competition against powerful, rich opponents to create an open ecosystem at the core of data, a fight that TCP/IP and HTML didn’t have to face.
Here’s hoping we can bridge the gaps before other, closed systems can do so for us. The good news is that open systems have a lovely little history of outcompeting closed ones, given time, freedom to compete, and even a small group of committed people.
John is currently running the Consent to Research project (CtR), a massive clinical research study in which people take the data they can gather about their own health and donate it for computational analysis.
He’s worked at Harvard’s Berkman Center for Internet & Society, the World Wide Web Consortium, the US House of Representatives, and most recently Creative Commons.