

Open Data and Privacy Concerns in Biomedical Research

Sabina Leonelli - November 26, 2012 in Ideas and musings, Open Data, Open Science, WG Open Data in Science

Privacy has long been the focus of debates about how to use and disseminate data taken from human subjects during clinical research. The increasing push to share data freely and openly within biomedicine poses a challenge to the idea of private individual information, whose dissemination patients and researchers can control and monitor.

In order to address this challenge, however, it is not enough to think about (or simply re-think) the meaning of ‘informed consent’ procedures. Rather, addressing privacy concerns in biomedical research today, and the ways in which the Open Data movement might transform how we think about the privacy of patients, involves understanding the ways in which data are disseminated and used to generate new results. In other words, one needs to study how biomedical researchers confront the challenges of making data intelligible and useful for future research.

Efficient data re-use comes from what the Royal Society calls ‘intelligent openness’ – the development of standards for data dissemination which make data both intelligible and assessable. Data are intelligible when they can be used as evidence for one or more claims, thus helping scientists to advance existing knowledge. Data are assessable when scientists can evaluate their quality and reliability as evidence, usually on the basis of their format, visualisation and extra information (metadata) also available in databases.

Yet the resources and regulatory apparatus for securing proper curation of data, and so their adequate dissemination and re-use, are far from being in place. Making data intelligible and assessable requires labour, infrastructures and funding, as well as substantial changes to the institutional structures surrounding scientific research. While the funding to build reliable and stable biomedical databases and Open Data Repositories is increasing, there is no appropriate business model to support the long-term sustainability of these structures, with national funders, industry, universities and publishing houses struggling to agree on their respective responsibilities in supporting data sharing.

Several other factors are important. For instance, the free dissemination of data is not yet welcomed by the majority of researchers, who do not have the time or resources for sharing their data, are not rewarded for doing so, and often fear that premature data-sharing will damage their competitive advantage over other research groups. There are intellectual property concerns too, especially when funding for research comes from industry or specific parts of government such as defence. Further, there are few clear standards for what counts as evidence in different research contexts and across different geographical locations. And more work needs to be done on how to relate datasets collected at different times and with different technologies.

The social sciences and humanities have an important role to help scientific institutions and funders develop policies and infrastructures for the evaluation of data-sharing practices, particularly the collaborative activities that fuel data-intensive research methods. An improved understanding of how data can be made available so as to maximise their usefulness for future research can also help tackle privacy concerns relating to sensitive data about individuals.

When it comes to sharing medical records, it is now generally agreed that obtaining ‘informed consent’ from individual patients is simply not possible, as neither patients nor researchers themselves can predict how the data could be used in the future. Even the promise of anonymity is failing, as new statistical and computational methods make it possible to retrieve the identity of individuals from large, aggregated datasets, as shown by genome-wide association studies.

A more effective approach is the development of ‘safe havens’: data repositories which would give access to data only to researchers with appropriate credentials. This could potentially safeguard data from misuse, without hampering researchers’ ability to extract new knowledge from them. Whether this solution succeeds ultimately depends on the ability of researchers to work with data providers, including patients, to establish how data travel online, how they are best re-used and how data sharing is likely to affect, and hopefully improve, future medicine. This work is very important, and should be supported and rewarded by universities, research councils and other science funders as an integral part of the research process.

To learn more, read the report ‘Making Data Accessible to All’

Towards a public digital infrastructure: why do governments have a responsibility to go open?

Guillermo Moncecchi - November 1, 2012 in Featured, Ideas and musings, Open Government Data, WG Open Government Data

The most common argument in favor of open data is that it enhances transparency, and while the link may not always be causal, it is certainly true that both tend to go hand-in-hand. But there is another, more expansive perspective on open government data: that it is part of an effort to build public infrastructure.

Does making a shapefile available with all Montevideo’s traffic lights make Montevideo’s government more transparent? We don’t think so. But one of our duties as civil servants is building the city infrastructure. And we should understand that data is mainly infrastructure. People do things on it, as they do things on roads, bridges or parks. For money, for amusement, for philanthropy, there are a myriad of uses for infrastructure: we should not try to determine or even guess what those uses are. We must just provide the infrastructure and ensure it will be available. Open data should be seen as a component of an effort to build a public digital infrastructure, where people could, within the law, do whatever they want. Exactly as they do with roads.

When you see open data in this light, several decisions become easier. Should we ask people for identification to give them our data? Answer: do you ask them for identification to use the street? No, you don’t – then no, you shouldn’t. Why should we use open, non-proprietary standards for publishing data? For the same reason you do not build a street where only certain car brands can pass. What happens if there are problems with my data that cause problems for its users? Well, you will be liable, if the law decides that … but would you avoid claims for accidents caused by pavement problems by not building streets? Of course you are responsible for your data: you are paid to create it, as you are paid for building bridges. Every question about open data we can imagine has already been answered for traditional infrastructure.

But of course the infrastructure required to enable people to create an information society goes beyond data. We will give you four examples.

The most direct infrastructure component is hardware and communications. The Uruguayan government recognises this, and is planning to have every home connected with fibre by the end of 2015, with 1 Gb of free traffic for everybody with a phone line. Meanwhile, since 2007, every public school child has been given an OLPC laptop and an internet connection. This programme should be understood as being primarily about infrastructure: education encompasses much more than laptops, but infrastructure enables the development of new education paths.

Secondly, services. Sometimes it’s better to provide services than to provide data. Besides publishing cartography data, in Montevideo we provide WMS and WFS services to retrieve a map just using a URL. Services, as data, should be open: no registration, no access limit. Open services allow developers to use not only government data, but also government computation power, and, of course, government knowledge: the knowledge needed to, say, estimate the arrival time of a bus.
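To make “retrieving a map just using a URL” concrete, here is a minimal Python sketch that builds an OGC WMS 1.3.0 GetMap request. The endpoint `https://example.org/wms`, the layer name `semaforos` and the bounding box (roughly covering Montevideo) are hypothetical placeholders, not Montevideo’s actual service; a real client would read the available layer names from the server’s GetCapabilities response.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600,
                   crs="CRS:84", image_format="image/png"):
    """Build an OGC WMS 1.3.0 GetMap request URL.

    `bbox` is (min_x, min_y, max_x, max_y) in the units of `crs`.
    With CRS:84 that means (min_lon, min_lat, max_lon, max_lat);
    note that EPSG:4326 in WMS 1.3.0 uses lat/lon axis order instead.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    # No API key or registration parameter: an open service needs only the URL.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and layer; the bounding box is approximate.
    url = wms_getmap_url("https://example.org/wms", "semaforos",
                         (-56.3, -34.95, -56.0, -34.75))
    print(url)
```

Because the whole request is an ordinary URL, it can be opened in a browser or handed to any GIS tool without credentials. A WFS GetFeature request works along similar lines but returns vector features rather than a rendered image.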

Thirdly, sometimes services are not enough, and we have to develop complete software components to enable public servants to do their work. Sometimes these software components should also be part of the public digital infrastructure. The people of Brazil are very clear on this: in 2007 they developed the Portal do Software Publico Brasileiro, where applications developed by or for the government are publicly available. Of course, this is not a new concept: its general version is called open source software. We believe that within this framework of public infrastructure, the debate between open source and proprietary software makes no sense. Nobody would let a company be the owner of a street. If it is public, it should be open.

Finally, there is knowledge. We, as the government, must tell the people what we are doing, and how we are doing it. Our knowledge should be open. We have the duty to publish our knowledge and to let others use it, so that we can participate actively in communities, propose changes, and act as an innovation factor in every task we face. Because we are paid for that: for building knowledge infrastructure.

We do not think government should let others do its work: on the contrary, we want a strong government, building the blocks of infrastructure to achieve its tasks, and making this infrastructure available to people to do whatever they want, within the law.

Exactly the same thing they do with streets.

Is Open Access Open?

Peter Murray-Rust - October 26, 2012 in Featured, Ideas and musings, Open Access

This post is cross-posted from Peter’s blog

I’m going to ask questions. They are questions I don’t know the answers to – maybe I am ignorant in which case please comment with information, or maybe the “Open Access Community” doesn’t know the answers. Warning: I shall probably be criticized by some of the mainstream “OA Community”. Please try to read beyond any rhetoric.

As background, I am well versed in Openness. I have taken a leading role in creating and launching many Open efforts – SAX, Chemical MIME, Chemical Markup Language, The Blue Obelisk, Panton Principles, Open Bibliography, Open Content Mining and helped to write a significant number of large software frameworks (OSCAR, JUMBO, OPSIN, AMI2). I’m on the advisory board of the Open Knowledge Foundation and I have contributed to or worked with Wikipedia, Open Streetmap, Stackoverflow, Open Science Summit, Mat Todd (Open Source Drug Discovery) and been to many hackathons. So I am very familiar with the modern ideology and practice of “Open”. Is “Open Access” the same sort of beast?

The features of “Open” that I value are:

  • Meritocracy. That doesn’t mean that decisions are made by hand counting, but it means that people’s views are listened to, and they enter the process when it seems right to the community. That’s happened with SAX, very much with the Blue Obelisk, and the Open Knowledge Foundation.
  • Universality of participation, particularly from citizens without formal membership or qualifications. A feeling of community.
  • A willingness to listen to other views and find means of changing strategy where necessary
  • Openness of process. It is clear what is happening, even if you are not in command.
  • Openness of results. This is universally fundamental. Although there have been major differences of opinion in Free/Open Source Software (F/OSS) everyone is agreed that the final result is free to use, modify, redistribute without permission and for any purpose. Free software is a matter of liberty, not price.
  • A mechanism to change current practice. The key thing about Wikipedia is that it dramatically enhances the way we use knowledge. Many activities in the OKF (and other Open Organisations) are helping to change practice in government, development agencies, companies. It’s not about price restrictions, it’s about giving back control to the citizens of the world. Open Streetmap produces BETTER and more innovative maps that people can use to change the lives of people living right now – e.g. the Haitian earthquake.

How does Open Access measure up against these? Not very well. That doesn’t mean it isn’t valuable, but it means that it doesn’t have obvious values I can align with. I have followed OA for most of the last 10 years and tried to contribute, but without success. I have practiced it by publishing all my own single-author papers over the last 5 years in Gold CC-BY journals. But I have never had much feeling of involvement – certainly not the involvement that I get from SAX or BlueObelisk.

That’s a harsh statement and I will elaborate:

Open Access is not universal – it looks inward to Universities (and Research Institutions). In OA week the categories for membership are:


There is no space for “citizen” in OA. Indeed, some in the OA movement emphasize this. Stevan Harnad has said that the purpose of OA is for “researchers to publish to researchers” and that ordinary people won’t understand scholarly papers. I take a strong and public stance against this – the success of Galaxy Zoo has shown how citizens can become as expert as many practitioners. In my new area of phylogenetic trees I would feel confident that anyone with a University education (and many without) would have little difficulty understanding much of the literature and many could become involved in the calculations. For me, Open Access has little point unless it reaches out to the citizenry and I see very little evidence of this (please correct me).

There is, in fact, very little role for the individual. Most of the infrastructure has been built by university libraries without involving anyone outside (regrettably, since university repositories are poor compared to other tools in the Open movements). There is little sense of community. The main events are organised round library practice and funders – which doesn’t map onto other Opens. Researchers have little involvement in the process – the mainstream vision is that their university will mandate them to do certain things and they will comply or be sacked. This might be effective (although no signs yet), but it is not an “Open” attitude.

Decisions are made in the following ways:

* An oligarchy, represented in the BOAI processes and Enabling Open Scholarship (EOS). EOS is a closed society that releases briefing papers; membership costs 50 EUR per year and applicants have to be formally approved by the committee (I have represented to several members of EOS that I don’t find this inclusive and I can’t see any value in my joining – it’s primarily for university administrators and librarians).
* Library organizations (e.g. SPARC)
* Organizations of OA publishers (e.g. OASPA)

Now there are many successful and valuable organizations that operate on these principles, but they don’t use the word “Open”.

So is discussion “Open”? Unfortunately not very. There is no mailing list with both a large volume of contributions and effective freedom to present a range of views. Probably the highest volume list for citizens (as opposed to librarians) is GOAL and here differences of opinion are unwelcome. Again that’s a hard statement, but the reality is that if you post anything that does not support Green Open Access then Stevan Harnad and the Harnadites will publicly shout you down. I have been denigrated on more than one occasion by members of the OA oligarchy (look at the archive if you need proof). It’s probably fair to say that this attitude has effectively killed Open discussion in OA. Jan Velterop and I are probably the only people prepared to challenge opinions: most others walk away.

Because of this lack of discussion it isn’t clear to me what the goals and philosophy of OA are. I suspect that different practitioners have many different views, including:

  • A means to reach out to citizenry beyond academia, especially for publicly funded research. This should be the top reason IMO but there is little effective practice.
  • A means to reduce journal prices. This is (one of) Harnad’s arguments. We concentrate on making everything Green and when we have achieved this the publishers will have to reduce their prices. This seems most unlikely to me – any publisher losing revenue will fight this.
  • A way of reusing scholarly output. This is ONLY possible if the output is labelled as CC-BY. There’s about 5-10 percent of this. Again this is high on my list and the only reason Ross Mounce and I can do research into phylogenetic trees.
  • A way of changing scholarship. I see no evidence at all for this in the OA community. In fact OA is holding back innovation in new methods of scholarship as it emphasizes the conventional role of the “final manuscript” and the “publisher”. Green OA relies (in practice) on having publishers and so legitimizes them.

And finally is the product “Open”? The BOAI declaration is, in Cameron Neylon’s words, “clear, direct, and precise:” To remind you:

“By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

This is in the tradition of Stallman’s software freedoms, The Open Knowledge Definition and all the other examples I have quoted. Free to use, re-use and redistribute for any lawful purpose. For manuscripts it is cleanly achieved by adding a visible CC-BY licence. But unfortunately many people, including the mainstream OA community and many publishers use “(fully) Open Access” to mean just about anything. Very few of us challenge this. So the result is that much current “OA” is so badly defined that it adds little value. There have been attempts to formalize this, but they have all ended in messy (and to me unacceptable) compromise. In all other Open communities “libre” has a clear meaning – freedom as in speech. In OA it means almost nothing. Unfortunately anyone trying to get tighter approaches is shouted down. So, and this is probably the greatest tragedy, Open Access does not by default produce Open products.

For that reason we have set up our own Open-access list in the OKF.

If we can have a truly Open discussion we might make progress on some of these issues.

[1] Phylogenetic tree diagram by David Hillis, Derrick Zwickl and Robin Gutell.

The future of Open Access

Theodora Middleton - October 24, 2012 in Featured, Ideas and musings, Open Access

At the start of this week, which is Open Access week, we heard from Martin Weller about some of his fears for the future of Open Access. We’ve been collecting a few opinions from around the OKFN on the future of OA. Here’s a selection. What do you think?

Ross Mounce: The future of publicly-funded research is inevitably Open Access.

With increasing realisation that research is best distributed electronically – for speed, economic efficiency, and fairness – Open Access to publicly-funded academic research is inevitable.

It costs money to implement, maintain and enforce artificial paywalls to restrict access to research online. These create frustrating and time-consuming barriers to accessing research. Open Access is thus an obviously beneficial system that simply allows ALL to read, re-use and remix academic research, thereby truly maximising the potential return on investment from these works.

Peter Murray-Rust: Is Open Access Open?

Is Open Access really “Open”? The features of Open I value are:

  • Meritocracy: That doesn’t mean that decisions are made by hand counting, but it means that people’s views are listened to, and they enter the process when it seems right to the community.
  • Universality of participation, particularly from citizens without formal membership or qualifications. A feeling of community.
  • A willingness to listen to other views and find means of changing strategy where necessary
  • Openness of process. It is clear what is happening, even if you are not in command.
  • Openness of results. This is universally fundamental. Although there have been major differences of opinion in Free/Open Source Software (F/OSS) everyone is agreed that the final result is free to use, modify, redistribute without permission and for any purpose.
  • A mechanism to change current practice. The key thing about Wikipedia is that it dramatically enhances the way we use knowledge. Many activities in the OKF (and other Open Organisations) are helping to change practice in government, development agencies, companies. It’s not about price restrictions, it’s about giving back control to the citizens of the world.

How does OA match up? Not very well:

  • It’s not universal: it looks inwards to universities. There is no space for the ‘citizen’, or even the individual.
  • It has oligarchic and closed decision procedures: the Enabling Open Scholarship committee costs 50 euros per year to join, and requires recommendation by an existing member.
  • Discussion is closed: differing opinions aren’t listened to or wanted.
  • The product isn’t, necessarily, open either: whilst a CC-BY license would easily ensure manuscript openness, in fact the term “open access” is applied to almost anything, and means very little.

Only if we can have a truly Open discussion about these issues, will we make any progress.

A longer version of Peter’s thoughts will be published later this week.

Christian Heise: Open Access is the foundation for Open Science.

In February 2002 the Budapest Open Access Initiative (BOAI) launched a worldwide campaign for open access (OA). Even if it did not invent the idea, the initiative articulated the first major international statement and public definition of open access. Now, ten years later, it has made new recommendations for the next ten years (summarized by me in five points):

1. Every institution of higher education should have access to an open access repository (through a consortium or outsourcing), and every publishing scholar in every field and country, including those not affiliated with institutions of higher education, should have deposit rights.

2. Every institution of higher education should have a policy that all future scholarly articles by faculty members and all future theses and dissertations are made open access as soon as practicable, and deposited in the institution’s designated open access repository, preferably licensed CC-BY.

3. Research institutions, including funders, should support the development and maintenance of the tools, directories, and resources essential to the progress and sustainability of open access, including: tools and APIs to convert deposits made in PDF format into machine-readable formats such as XML; the means to harvest from and re-deposit to other repositories; and tools working with alternative impact metrics.

4. The use of classic journal impact factors is discouraged. The Initiative encourages the development of alternative metrics for impact and quality which are less simplistic, more reliable, and entirely open for use and reuse.

5. The open access community should act in concert more often and we should do more to make universities, publishers, editors, referees and researchers aware of standards of professional conduct for open access publishing. We also need to articulate more clearly, with more evidence, and to more stakeholder groups the advantages and potentials of open access.

These recommendations are pretty detailed on what has to be done to get a sustainable open access process in the near future. However, the far future has to be the evolution from Open Access to the holistic concept of Open Science (open access + open science data).

Tom Olijhoek: Open Interconnected Specialist Communities

In my view the future of science will ultimately depend on the formation of many interconnected scientific communities covering all possible areas. Making optimal use of the internet and social media, scientists and citizens within and between these communities will collaborate to produce more useful knowledge than ever before and to store, maintain and provide information for those who seek it. Especially for medical scientists in the developing world, these communities would provide vehicles for innovation, health improvement and development in their respective countries. Following this line of thought, the only hope of winning the battle against malaria, AIDS, neglected diseases and other tropical infections will lie in free access to and sharing of information, and in joining forces by way of social media and open science communities. MalariaWorld is our first experiment in this mode of specialist open access scientific community.

Laurent Romary: Open access is a state of mind

Open access is a state of mind for the researcher. Any means is good if it promotes the dissemination of knowledge, publications, data and expertise. One may doubt that the commercial publishing system, as we know it today, truly meets the expectations of researchers and the challenges of interconnecting knowledge. The research infrastructures of tomorrow, managed by researchers themselves, will have to include virtual research environments, where every scientist (in the natural sciences as well as in the humanities) will manage their observations, comments and results, and will choose freely, without any financial barrier, to disseminate them or have them evaluated.

The great Open Access swindle

Martin Weller - October 22, 2012 in Featured, Ideas and musings, Open Access

This week is Open Access week, and we’ll be running a few pieces mulling over where Open Access has got to, and where it’s going. Here Martin Weller discusses some reservations…

The Cunning Thief, by Chocarne-Moreau. PD

Just to be clear from the outset, I am an advocate for open access, and long ago took a stance to only publish OA and to only review for OA. I’m not suggesting here that open access is itself a swindle, but rather that the current implementation, in particular commercial publishers adopting Gold OA, is problematic.

In my digital scholarship book, I made two pleas: the first was for open access publishing, and the second was for scholars to own the process of change. On this second point, the book ends thus:

“This is a period of transition for scholarship, as significant as any other in its history, from the founding of universities to the establishment of peer review and the scientific method. It is also a period that holds tension and even some paradoxes: it is both business as usual and yet a time of considerable change; individual scholars are being highly innovative and yet the overall picture is one of reluctance; technology is creating new opportunities while simultaneously generating new concerns and problems….

…For scholars it should not be a case of you see what goes, you see what stays, you see what comes, but rather you *determine* what goes, what stays and what comes.”

The open access element has proceeded faster than even I imagined when writing this back in 2010/2011. The Finch Report can be seen as the crowning achievement of the open access movement, in setting out a structure for all UK scholarly articles to be published as open access. But in rather typical “you academics are never happy” mode I’ve become increasingly unhappy about the route Open Access is taking. And the reason is that it fails to meet the second of my exhortations, in that it is a method being determined by the publishing industry and not by academics themselves.

The favoured route is that of Gold OA, under which authors pay publishers to have open access articles published, usually through research funds. This is good in that it means these research papers will be openly available to all, but bad from a digital scholarship perspective. And here’s why:

1) Ironically, openness may lead to elitism. If you need to pay to publish, then, particularly in cash-strapped times, it becomes something of a luxury. New researchers, or those at smaller universities, won’t have these funds available. Many publishers have put waivers in place for new researchers (PLoS are excellent at this), but there’s no guarantee of these, and after all, the commercial publishers are concerned with maximising profits. If there are enough paying customers around then it’s not in their interest to give out too many freebies. And it also means richer universities can flood journals with articles. Similarly those with research grants can publish, as this is where the funding will come from, and those without can’t. This will increase competition in an already ludicrously competitive research funding regime. You’re either in the boat or out of it will be the outcome. The Scholarly Kitchen blog has a good piece on OA increasing the so-called Matthew Effect. It would indeed be a strange irony if those of us who have been calling for open access because of a belief in wider access and a more democratic knowledge society end up creating a self-perpetuating elite.

2) It will create additional cost. Once the cost is shifted to research funders, then the author doesn’t really care about the price. There is no strong incentive to keep costs down or find alternative funding mechanisms. This is great news for publishers who must be rubbing their hands with glee. It is not only a licence to carry on as they were, but they have successfully fended off the threat of free publication and dissemination that the internet offers. Music industry moguls must be looking on with envy. The cost for publication is shifted to taxpayers (who ultimately fund research) or students (if it comes out of university money). The profits and benefits stay with the publishers. It takes some strained squinting to view this as a victory.

Stevan Harnad argues again for Green OA, claiming that

“Publishers– whose primary concern is not with maximizing research usage and progress but with protecting their current revenue streams and modus operandi –are waiting for funders or institutions to pledge the money to pay Gold OA publishing fees. But research funds are scarce and institutional funds are heavily committed to journal subscriptions today. There is no extra money to pay for Gold OA fees”

3) It doesn’t promote change – in my book I also talked about how a digital, networked and open approach could change what we perceive as research, and that much of our interpretation of research was dictated by the output forms we have. So, for instance we could see smaller granularity of outputs, post review, different media formats, all beginning to change our concept of what research means. But Gold OA that reinforces the power of commercial publishers, simply maintains a status quo, and keeps the peer-reviewed 5000 word article as the primary focus of research that must be attained.

I’ve heard Stephen Downes say that as soon as any form of commercial enterprise touches education it ruins it (or words to that effect). I wouldn’t go that far; I think, for instance, that commercial companies often do a better job of software and technology than universities, but academic publishing is such an odd business that maybe it doesn’t make sense as a commercial enterprise. As David Wiley so nicely parodies in his trucker’s parable, there isn’t really another industry like it. Academics (paid by the taxpayer or students) provide free content, and then the same academics provide free services (editorship and peer-review) and then hand over rights and ownership to a commercial company, who provide a separate set of services, and then sell back the content to the same group of academics.

I know a few people who work in commercial publishing, and they are smart, good people who genuinely care about promoting knowledge and publishing as a practice. This is not a cry for such people to be out on the streets, but rather for their skills and experience to be employed by and for universities, the research communities and the taxpayer rather than for shareholders. In this business Downes’ contamination theory seems to hold: there is simply no space in the ecosystem for profit to exist, and when it does exist it corrupts the whole purpose of the enterprise, which is to share and disseminate knowledge.

Gold OA is not inherently detrimental. There are plenty of non-profit publishers who operate this model, keep costs to a minimum and have a generous fee waiver policy. They are, after all, not concerned with making a profit, but with knowledge dissemination. Other models also exist, including subsidised university presses, centralised publishing platforms, etc. The swindle is that there is no real incentive to explore these possibilities, because the standard model has been reinforced by the manner in which OA has been implemented. As Tim O’Reilly comments, “If we’re going to get science policy right, it’s really important for us to study the economic benefit of open access and not just accept the arguments of incumbents”.

[An earlier version of this post was originally posted on Martin Weller's blog]

Video: Julia Kloiber on Open Data

Rufus Pollock - October 3, 2012 in Ideas and musings, Interviews, OKF Germany, OKFest, Our Work

Here’s Julia Kloiber from OKFN-DE’s Stadt-Land-Code project, talking at the OKFest about the need for more citizen apps in Germany, the need for greater openness, and how to persuade companies to open up.

Managing Expectations II: Open Data, Technology and Government 2.0 – What Should We, And Should We Not Expect

Rufus Pollock - September 13, 2012 in Featured, Ideas and musings, Open Data, Policy

This is the second of two pieces about “managing expectations” (the first is here). Open data has come a long way in the last few years and so have expectations. There’s a growing risk that open data will be seen as a panacea that will magically solve climate change or eliminate corruption. This is dangerous because it will inevitably fail to do so, and hope and enthusiasm will be replaced by disappointment and disengagement.

This would be a tragedy as open data is valuable to us socially, commercially and culturally. However, we do need to think hard about how to make effective use of open data. Open data is usually only one part of a solution and we need to identify and work on the other key factors, such as institutions and tools, needed to bring about real change.

Some steps in a theory of change – see below for discussion. Note that the third step is by far both the most important and the most complex.

Government 2.0, Open Data and IT

More than two years ago a UK Government civil servant came to visit me. A new government had taken power in the UK for the first time in a decade and she wanted to ask me about “Government 2.0”, open data and transparency.

One thing was immediately apparent from our discussion: while she was already excited by these ideas it wasn’t entirely clear to her what they involved or exactly what problems they would help with — something that has remained a common feature of conversations I’ve had since.

In my view (a view expressed in that conversation two years ago), there are (at least) two distinct, albeit related, ideas for what “Government 2.0” means:

  • Improving (Government) services by utilizing current information technology and open data — open data being especially interesting (and novel) as it could turn Government from the direct supplier of services to the supplier of the data (and infrastructure) needed to run those services (‘Government as a Platform’)
  • More interactive, participatory governance (and therefore more “democratic”) via the use, again, of open information and technology (though the connection was somewhat vaguer).

Put like this, it’s clear why “Government 2.0” can appear so exciting: after all, it appears to promise a radical improvement, even transformation, of government.

But it should also make us concerned. Unrealistic expectations can be dangerous: something that is generally beneficial can get confused with a miracle cure and then blamed when it fails to deliver.

Moreover, there’s the risk that we start fixating on this wondrous new possibility (open data and technology) and ignore other key (but less exciting) elements in solving our problems, with the consequence that we greatly reduce the actual benefit we get from these new innovations in policy and technology.

This second point seemed especially important, as it could encourage the dangerous assumption that open information + IT would magically turn into better (and more participatory) governance, without much examination of exactly how this would come about or of the changes to the form and structure of governance that would be needed.

The danger here is of confusing necessary with sufficient conditions: open data may be a necessary part of better and more participatory governance, but it is likely not sufficient without, say, substantial other changes in the structure and machinery of government (e.g. who gets to vote, when and where).[1] These latter changes are normally costly and much more difficult than adopting new IT or opening up data. Thus, whilst new IT and open data are important, they are likely only one (possibly small) part of a solution.

When is Open Data (Part of) the Solution?

To help think about and clarify this question — of the role of open data and IT vis-a-vis other factors in a solution — I drew the first version of this diagram (a diagram I have drawn again and again over the last few years).

The purpose of the diagram is to provide a rough-and-ready way to think about the role of open data (and IT) in solving a specific problem compared to other factors such as institutions.

Some specific problems are listed for illustrative purposes. For example, Climate Change is situated at the top left, implying that Open Data + IT likely play a relatively limited role compared to other factors such as institutional and governance change (roughly: the real problem here is reaching international agreement on a solution, not obtaining more (open) information).

Conversely, finding a better way to get to work is likely a problem where Open Data + IT can have a very large impact irrespective of any other factors. Meanwhile, for an issue like Corruption there would be debate as to where to situate it: on the one hand, Transparency and Open Data can have a big impact with relatively little institutional or governance change; on the other hand, one could argue that without reasonably significant governance and institutional change, open data and transparency would have little effect.

Note that the diagram and examples given are for illustration purposes and don’t necessarily reflect my views (you could argue, for example, that Climate Change should be situated somewhere quite different!).

A Theory of Change

What this line of thought suggests is that we need to delve deeper into the exact “theory of change” for a given area. Around open data I think there is a general chain of logic, which runs, roughly, as follows:

  1. Open (digital) data + IT dramatically lowers the cost of access to information
  2. This includes information about what the government is doing (be that in terms of laws or filling in pot-holes)
  3. Armed with this better information, citizens (or other groups) will

    1. Be able to hold government accountable and/or drive change
    2. Have a better sense of how their polity operates (improving trust etc)

In essence it presumes some theory of change like this (a diagram I also drew that day for the civil servant):

The key question is around step 3: “Action (& Change)”. It highlights the often missing (but implicit) assumption in much of this discussion that once information is available, action and change will follow. But action, even in highly developed democracies, can be hard for several reasons:

  1. Understanding and action require attention: analyzing and acting on information requires time and attention, and these seem to be (increasingly) scarce. Crudely put: do I go out to the cinema with my friends or do I read up on the latest draft law?

    The key cost of becoming politically active is not the direct cost of acquiring information (be that what is happening or the email address of the representative to contact) but the attention and time cost of analyzing, understanding and acting on that information. If so, open data and IT may only have a limited impact on reducing the cost of taking “political action”.

  2. By reducing simple transmission costs, digital technology has substantially increased the amount of “information” competing for attention (information should be interpreted here in the broad sense, including entertainment and anything else that could be shipped as digital “bits”).

  3. The problems of coordination and collective action: crudely, why should I bother to act if it requires a million of us to act for something to happen? Coordination problems are as old as humanity itself. While one can argue that modern communication technologies can assist us in coordination (cf. the debate on the “social media revolutions”), they would seem at best to offer a mild improvement on a fundamentally difficult problem (see the appendix below for more on this).


I should emphasize that I am far from arguing that open data (and IT) are not important. However, we need to temper our enthusiasm with an appreciation that they are only one part of the solution. As next steps I think we need to:

  • Think hard about what problems to tackle — if technology and open information are the tools to hand we want to focus on problems where they are especially effective. Using the first of the diagrams above can be a useful exercise in clarifying where a particular problem is located.
  • Be clear that other changes or improvements in, say, institutions, will be needed. We should work out what these are and then endeavour to make them. We should be aware that often these changes will be both more important and much harder than those that we can achieve with technology and open information alone.
  • Appreciate that open data and technology are attractive tools because they are (relatively) very cheap and straightforward to use. This is worth bearing in mind: even if open data and (information) technology are only 10% of the solution to a problem, they are an incredibly cheap 10%.
  • Acknowledge that open information and technology will often be complements to institutional change, not substitutes. If so, we cannot just do more open information and less governance reform: that would be like giving you a second hammer to compensate for having no nails.

Appendix: Principal Agent Theory and Government

Imagine I hire a real-estate agent to sell my house for me. In legal/economic parlance I am the principal (I own the house) and the real-estate agent is the “agent”: the person acting for the principal.

The normal “problem” of such relationships is that the interests and goals of the two parties are not aligned.

Take the real-estate case: suppose the two parties negotiate a 6% commission plus an up-front fee — then for every additional $1 in the sale price the agent manages to get from a buyer for the house the agent will receive only 6 cents in commission. This implies a strong divergence, at least in pure monetary terms, between the incentives of the principal (the owner of the house) and the agent (the real-estate agent).

To make this even more concrete, consider a situation where, by working a weekend and doing extra showings, the house will sell for an extra $10,000. Suppose that the agent crudely values this weekend time at $1,000 a day (imagine their daughter has a birthday party!). For them the “payoff” is: $600 (6% of $10k) – $2,000 (their time cost) = -$1,400.[2] So they have a strong incentive not to bother. Meanwhile the principal would clearly make the effort: even assuming a higher cost of their time of $5k, their “payoff” is $10k – $5k = $5k.
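The weekend arithmetic can be checked with a short Python sketch. The function names and parameters are illustrative and not from the original post. The optional commission argument on the owner’s side is an added assumption the post does not make: it asks what happens if the owner still pays the 6% on the higher price.

```python
def agent_payoff(extra_price, commission_pct, days, value_per_day):
    """Agent's net gain from extra effort: their commission on the extra
    sale price minus what they value their own time at."""
    commission = extra_price * commission_pct / 100
    return commission - days * value_per_day


def principal_payoff(extra_price, time_cost, commission_pct=0):
    """Owner's net gain from the same extra sale price.
    commission_pct=0 reproduces the simplified figure in the text;
    a non-zero value (an assumption) deducts the agent's cut first."""
    return extra_price * (100 - commission_pct) / 100 - time_cost


# The weekend scenario: $10k extra, 6% commission, two days valued at $1,000 each.
print(agent_payoff(10_000, 6, 2, 1_000))      # -1400.0
print(principal_payoff(10_000, 5_000))        # 5000.0
print(principal_payoff(10_000, 5_000, 6))     # 4400.0
```

Either way the owner’s payoff stays clearly positive while the agent’s is negative, which is the divergence of incentives the example is meant to illustrate.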

So how does this relate to government? Well, Lincoln may have said that Government is by, for and of the people, but in truth only the middle of these holds: Government is a classic principal-agent setup in which the principal (the people) appoints agents (their elected representatives) to govern the polity.

Thus, Government faces exactly the kind of principal-agent problems described above: the incentives of elected officials (or bureaucrats) may differ markedly from those of the citizens as a whole. For example, elected officials may primarily care about remaining in office (just as the real-estate agent cares about commission) rather than ensuring the best outcome for citizens. This trade-off will be especially acute when narrow but powerful groups, able to provide monetary or other support for, say, re-election, have interests that conflict with the general welfare of society.

The focus of most principal-agent analysis is on how to better align the interests of the agent with those of the principal. Normally this involves some form of monitoring (so the principal has a better sense of what the agent is doing) combined with some form of rewards and sanctions based on outcomes and whatever information the principal has managed to glean about the agent’s actions.

In a perfect world, the principal would know exactly what the agent was doing and, with the right set of rewards and sanctions, could then ensure they did exactly what was wanted. However, in this situation the principal would essentially be the agent (how else does one know exactly what they are doing?), and so the real question is how well one can do with imperfect information and imperfect rewards and sanctions. We need not go into detail here, but the key (and obvious) point is that the more imperfect the principal’s information and the more imperfect their rewards and sanctions, the poorer the alignment will be.

Unfortunately, on this basis there are several reasons to think the governance principal-agent problem is especially bad (with the “people” as principals and “government” as agents):

  • Government is complex – this makes it hard for the principal (the “people”) to know what the agent(s) are doing. Remember, it’s more about knowing the agent’s actions than outcomes, since outcomes, due to uncertainty, only partially reflect an agent’s effort. Very strong incentives based on uncertain outcomes can be counter-productive: if I could work very hard and still see it all come to nothing because things go badly for random reasons, then maybe I should not bother and just see what random chance brings.
  • The incentives that can be offered to agents (the governors) are relatively crude — being voted out at the next election (or being overthrown in an uprising!)
  • Governance in fact has multiple levels of principal-agent relationships: the “people” may elect representatives who in turn appoint or utilize a managerial bureaucracy to run government. In this case the elected officials are the principals and the bureaucracy is the agent.
  • The very large number of individual citizens makes the coordination problem of acting to sanction or reward an agent especially difficult. The simplest form of this sanctioning (rewarding) is elections, yet the incentive for any given citizen to make the effort to participate is very low: why should I bother to vote when I am only one among millions? (We note that turnout in most countries has been consistently dropping.)

  1. Of course, it is true that technology and the open flow of information can enable certain forms of governance that are otherwise very difficult — for example, modern IT makes it possible literally to hold daily votes of all citizens, something that would otherwise be impossible except in the very smallest of polities. 

  2. NB: I have made no allowance for risk aversion here, given that the expected gain is $10k. However, the basic point would still stand. 

Managing Expectations

Rufus Pollock - July 24, 2012 in Ideas and musings, Open Content, Open Data, Our Work

We’re big on promoting open information: be that sonnets, statistics, genes or geodata. We’re big on it because we think it has the potential to improve the welfare of peoples around the world in a variety of ways, from making governments more accountable to improving research on cancer.

At the same time I think it is important that we, and others, are realistic about what will be achieved and on what time-scale. This can be a difficult thing to say. Often, to get people to travel with you, you have to sell them a grand vision of how whatever you are doing will revolutionize things overnight. But most changes, especially big ones, are more gradual.

Think of the celebrated invention of the printing press, today often compared to the invention of the computer. The printing press did, ultimately, produce a “revolution” and wrought plenty of change. But it didn’t happen overnight. What is more, the effect of the printing press on, say, the balance of political power or even more prosaic matters like literacy was by no means immediate. Change occurred over a period of decades or centuries and was often dependent on the evolution of a complex set of complementary institutions and technologies.


Similar patterns can be seen for another fundamental technological development: electricity. Legend has it that when Faraday first demonstrated an electric effect at the Royal Society in the 1830s, Gladstone questioned whether it was not just a scientific curiosity given its lack of obvious applications — to which Faraday famously replied: “What good is a baby?”. It took over a century for electricity to reach anything like its full potential.

Michael Faraday delivering a lecture in 1856

Today we find ourselves in a similar situation. Whilst we live in a much accelerated age compared to 15th century Germany or 19th century England, we probably still need to think on a time-scale of decades if we are going to see the full effects of the new open approaches to the creation and sharing of knowledge — approaches that we have only just begun to explore.

This is the first of two posts on this topic by Rufus Pollock, a Founder and Director of the Open Knowledge Foundation, and a Shuttleworth Foundation Fellow.

Science, data and the public

Jonathan Gray - July 21, 2012 in Featured, Ideas and musings, Open Data, Open Science

Omnitruncated 120/600 Cell by Jonathan Gray (jwyg on Flickr)

Earlier this week the European Commission released a package of documents related to their nascent policies on access to scientific information. What will these mean for science and for public engagement with science?

New open access policies have been in the headlines quite a bit recently, as politicians and policy makers respond to the wave of public support precipitated by the so-called academic spring earlier this year.

On Monday the UK government announced that all its publicly funded research will be open access within two years (though not everyone is convinced about plans for how this will be achieved). Open access has received a more modest bump up the political agenda in the US, with a meeting between Obama’s science advisor and prominent access advocates, and a flurry of support for a petition requesting an open access mandate for publicly funded research.

The European Commission has been broadly supportive of open access policies for some time. It is already piloting open access for 20% of the research it funds under the €50 billion Seventh Framework Programme (FP7), citing what it calls the “fifth freedom”, “the free circulation of researchers and scientific knowledge”. So what’s new?

For a start, it is notable that the EC explicitly highlights open access to scientific research data as well as to scientific research publications. It draws parallels between opening up publicly funded research data and opening up public sector data. And, interestingly, it mentions not only scientists and research institutions but also citizens as potential users of scientific data.

This is new. Our volunteer-led Open Science Working Group at the Open Knowledge Foundation has been working with key stakeholders to promote open scientific data for a number of years – from policy initiatives like the Panton Principles and the Panton Fellowships, to the recently launched open source PyBossa crowdsourcing platform, developed in association with the Citizen Cyberscience Centre. As far as we know there has not been a comparable public policy development which offers such strong or explicit support for opening up scientific data.

The European Commission’s basic message is that – with limited exceptions such as privacy and third party rights – maximising reusability is the best way to maximise scientific innovation and return on investment. And the wording is reasonably strong. One document (PDF) says “information already paid for by the public purse should not be paid for again each time it is accessed or used”. Another (PDF) says “policies on open access to scientific research results should apply to all research that receives public funds”.

Concrete measures that will be taken to address this include working with member states to implement and strengthen open access policies for publicly funded publications and data, strengthening their own commitment to open access with research that they fund (including open access to all publications), and investing in infrastructure to support the reuse of scientific data.

There are clearly reasonably strong overlaps between the EC’s thinking on publicly funded scientific data and their thinking about public sector data. They want to create policies that will increase innovation by allowing more people to derive and create value from data – rather than letting it moulder on institutional hard drives, or sit behind paywalls. The logic in both cases is similar: unlock data, facilitate reuse, maximise impact and value to society.

While the benefits of open scientific data for scientists and research institutions are reasonably well documented – the Human Genome Project is probably the best known exemplar – one wonders what innovations we might see from non-experts and non-scientists, and what more open policies might mean for the public understanding of science.

You don’t have to accept that anyone can be a scientist without prior training to see the value of citizen science projects like Galaxy Zoo and EyeWire, which harness input from users to complete simple tasks. The Clear Climate Code project is an amazing example of scientific innovation by non-scientists as a direct result of open scientific data – resulting in NASA preferring algorithms written by a couple of dedicated volunteers to their own code.

Imagine we could leverage input from bright and committed members of the public to increase the pace of scientific development in areas of pressing concern – from climate change to cancer. The new measures proposed by the EC are a small step towards enabling this to happen. Let’s hope that EU member states and other countries will follow suit, and recognise that a world in which scientific data is open by default is better than one in which it is closed.

NB: This article initially appeared on the Guardian’s Data Blog

Why Open Data isn’t enough.

John Wilbanks - May 28, 2012 in Featured, Ideas and musings, Open Data

The debate around data in our community has been densely concentrated around the question of openness. That’s not surprising. Words like “free” and “open” have dominated the conversations in the digital commons for most of its existence, mainly because most of the digital commons has been centered on copyrightable works.

Software, text, photos, videos, music are all creative works under the law, all carrying the powerful, relatively internationalized protections of copyright, and this very power allows creators to invert that power using free / libre copyright licenses. That reality has led to a set of definitions of freedom for software, for cultural works, and for knowledge, all of which are very centered on the intellectual property regimes surrounding digital objects. We’ve also propagated the idea to hardware.

And that’s carried over into the debate around data. We ask, “is it open data?” of the world.

But I spend a lot of time around data people for whom open is an afterthought. For many people it’s Big Data, right down to the requisite O’Reilly branded events. They’re worried about whether we should leverage machine learning or domain experts, not openness. Or it’s Social Data. They’re worried about privacy policies and selling the data to as many vendors as possible. It’s Blue Button, and Green Button. They’re worried about getting data into people’s hands. It’s Quantified Self. They’re worried about getting their own data into their own hands. In Washington and other capitals large and small, it’s Government Data.

Open is almost never mentioned.

And I think that’s because we’re so focused on intellectual property, on share alike and attribution and public domain, that we lose the bigger context.

Creative works came online in a cultural and technical context that allowed us to focus on freedom and intellectual property. We have decades of history with software, photo, and video, and centuries with text. We had a technical infrastructure ready to create, distribute, consume, and remix creative works: mailing lists, sharing websites, wiki software.

We don’t have that with data.

Data is entering the world at a rate that is so fast it’s almost incomprehensible to human brains. It’s like trying to comprehend geologic time. The cost of generating data is so low in so many spaces, and dropping like a stone in so many others, that the real challenge is to do interesting things with it. The gulf between those who can do something with data and those who can’t is a serious new case of digital divide, and licensing is just a tiny part of that gulf. Important, to be sure, but tiny.

There’s a people gulf – 190,000 machine learning experts and 1,500,000 managers in the US alone who don’t exist, but need to, to take advantage of data. That gap is worse in the developing world, and will only widen in coming years.

But perhaps most important is a cultural gulf – we live in a world right now that (implicitly in most cases, but increasingly explicitly) accepts the natural state of data as transactional. We trade our data, rather than our cash, for services like Facebook, Google, apps, and more. We don’t get a copy of it. We don’t know who does. We’re on the outside of the black box, but our data’s on the inside.

So my argument is that we as an “open” movement need to understand and integrate our concerns over property rights into the broader debate. We need to talk about citizens’ rights. We need to talk about the right to understand how our web searches are returned. We need to talk about how our privacy rights may be negatively impacted by more openness.

Because unlike the web and the internet, which grew quietly in obscure corners of the world, allowing open designs to flourish, data has already drawn attention, money, and closed business models. We’re in active competition against powerful, rich opponents to create an open ecosystem at the core of data, a fight that TCP/IP and HTML never had to wage.

Here’s hoping we can bridge the gaps before other, closed systems can do so for us. The good news is that open systems have a lovely little history of outcompeting closed ones, given time, freedom to compete, and even a small group of committed people.
