The future of Open Access

October 24, 2012 in Featured, Ideas and musings, Open Access

At the start of this week, which is Open Access week, we heard from Martin Weller about some of his fears for the future of Open Access. We’ve been collecting a few opinions from around the OKFN on the future of OA. Here’s a selection. What do you think?

Ross Mounce: The future of publicly-funded research is inevitably Open Access.

With increasing realisation that research is best distributed electronically – for speed, economic efficiency, and fairness – Open Access to publicly-funded academic research is inevitable.

It costs money to implement, maintain and enforce artificial paywalls to restrict access to research online. These create frustrating and time-consuming barriers to accessing research. Open Access is thus an obviously beneficial system that simply allows ALL to read, re-use and remix academic research, thereby truly maximising the potential return on investment from these works.

Peter Murray-Rust: Is Open Access Open?

Is Open Access really “Open”? The features of Open I value are:

  • Meritocracy: That doesn’t mean that decisions are made by hand counting, but it means that people’s views are listened to, and they enter the process when it seems right to the community.
  • Universality of participation, particularly from citizens without formal membership or qualifications. A feeling of community.
  • A willingness to listen to other views and find means of changing strategy where necessary
  • Openness of process. It is clear what is happening, even if you are not in command.
  • Openness of results. This is universally fundamental. Although there have been major differences of opinion in Free/Open Source Software (F/OSS) everyone is agreed that the final result is free to use, modify, redistribute without permission and for any purpose.
  • A mechanism to change current practice. The key thing about Wikipedia is that it dramatically enhances the way we use knowledge. Many activities in the OKF (and other Open Organisations) are helping to change practice in government, development agencies, companies. It’s not about price restrictions, it’s about giving back control to the citizens of the world.

How does OA match up? Not very well:

  • It’s not universal: it looks inwards to universities. There is no space for the ‘citizen’, or even the individual.
  • It has oligarchic and closed decision procedures: the Enabling Open Scholarship committee costs 50 euros per year to join, and requires recommendation by an existing member.
  • Discussion is closed: differing opinions aren’t listened to or wanted.
  • The product isn’t, necessarily, open either: whilst a CC-BY license would easily ensure manuscript openness, in fact the term “open access” is applied to almost anything, and means very little.

Only if we can have a truly Open discussion about these issues, will we make any progress.

A longer version of Peter’s thoughts will be published later this week.

Christian Heise: Open Access is the fundament for Open Science.

In February 2002 the Budapest Open Access Initiative (BOAI) launched a worldwide campaign for open access (OA). Even if it did not invent the idea, the initiative articulated the first major international statement and public definition of open access. Now, ten years later, it has made new recommendations for the next ten years (which I summarize here in five points):

1. Every institution of higher education should have access to an open access repository (through a consortium or outsourcing), and every publishing scholar in every field and country, including those not affiliated with institutions of higher education, should have deposit rights.

2. Every institution of higher education should have a policy that all future scholarly articles by faculty members and all future theses and dissertations are made open access as soon as practicable, and deposited in the institution’s designated open access repository, preferably licensed CC-BY.

3. Research institutions, including funders, should support the development and maintenance of the tools, directories, and resources essential to the progress and sustainability of open access, including: tools and APIs to convert deposits made in PDF format into machine-readable formats such as XML; the means to harvest from and re-deposit to other repositories; and tools working with alternative impact metrics.

4. The use of classic journal impact factors is discouraged. The Initiative encourages the development of alternative metrics for impact and quality which are less simplistic, more reliable, and entirely open for use and reuse.

5. The open access community should act in concert more often and we should do more to make universities, publishers, editors, referees and researchers aware of standards of professional conduct for open access publishing. We also need to articulate more clearly, with more evidence, and to more stakeholder groups the advantages and potential of open access.

These recommendations are pretty detailed on what has to be done to get a sustainable open access process in the near future. However, the far future has to be the evolution from Open Access to the holistic concept of Open Science (open access + open science data).

Tom Olijhoek: Open Interconnected Specialist Communities

In my view the future of science will ultimately depend on the formation of many interconnected scientific communities covering all possible areas. Making optimal use of the internet and social media, scientists and citizens within and between these communities will collaborate to produce more useful knowledge than ever before and to store, maintain and provide information for those who seek it. Especially for medical scientists in the developing world, these communities would provide vehicles for innovation, health improvement and development in their respective countries. Following this line of thought, the only hope of winning the battle against malaria, AIDS, neglected diseases and other tropical infections will lie in free access to and sharing of information, and in joining forces by way of social media and open science communities. MalariaWorld is our first experiment in this mode of specialist open access scientific community.

Laurent Romary: Open access is a state of mind

Open access is a state of mind for the researcher. Any means is good if it promotes the dissemination of knowledge, publications, data and expertise. One may doubt whether the commercial publishing system, as we currently know it, truly meets the expectations of researchers and the challenges of interconnecting knowledge. The research infrastructures of tomorrow, managed by researchers themselves, will have to include virtual research environments in which every scholar (in the hard sciences as well as in the humanities) manages their observables, comments and results, and chooses freely, and without financial barriers, whether to disseminate them or have them evaluated.

The great Open Access swindle

October 22, 2012 in Featured, Ideas and musings, Open Access

This week is Open Access week, and we’ll be running a few pieces mulling over where Open Access has got to, and where it’s going. Here Martin Weller discusses some reservations…

The Cunning Thief, by Chocarne-Moreau. PD

Just to be clear from the outset, I am an advocate for open access, and long ago took a stance to only publish OA and to only review for OA. I’m not suggesting here that open access is itself a swindle, but rather that the current implementation, in particular commercial publishers adopting Gold OA, is problematic.

In my digital scholarship book, I made two pleas, the first was for open access publishing, and the second was for scholars to own the process of change. On this second point, the book ends thus:

“This is a period of transition for scholarship, as significant as any other in its history, from the founding of universities to the establishment of peer review and the scientific method. It is also a period that holds tension and even some paradoxes: it is both business as usual and yet a time of considerable change; individual scholars are being highly innovative and yet the overall picture is one of reluctance; technology is creating new opportunities while simultaneously generating new concerns and problems…. …For scholars it should not be a case of you see what goes, you see what stays, you see what comes, but rather you *determine* what goes, what stays and what comes.”

The open access element has proceeded faster than even I imagined when writing this back in 2010/2011. The Finch Report can be seen as the crowning achievement of the open access movement, in setting out a structure for all UK scholarly articles to be published as open access. But in rather typical “you academics are never happy” mode I’ve become increasingly unhappy about the route Open Access is taking. And the reason is that it fails to meet the second of my exhortations, in that it is a method being determined by the publishing industry and not by academics themselves.

The favoured route is that of Gold OA, under which authors pay publishers to have open access articles published, usually through research funds. This is good in that it means these research papers will be openly available to all, but bad from a digital scholarship perspective. And here’s why:

1) Ironically, openness may lead to elitism. If you need to pay to publish, then, particularly in cash-strapped times, it becomes something of a luxury. New researchers, or smaller universities won’t have these funds available. Many publishers have put in waivers for new researchers (PLoS are excellent at this), but there’s no guarantee of these, and after all, the commercial publishers are concerned with maximising profits. If there are enough paying customers around then it’s not in their interest to give out too many freebies. And it also means richer universities can flood journals with articles. Similarly those with research grants can publish, as this is where the funding will come from, and those without can’t. This will increase competition in an already ludicrously competitive research funding regime. You’re either in the boat or out of it will be the outcome. The Scholarly Kitchen blog has a good piece on OA increasing the so-called Matthew Effect. It would indeed be a strange irony if those of us who have been calling for open access because of a belief in wider access and a more democratic knowledge society end up creating a self-perpetuating elite.

2) It will create additional cost. Once the cost is shifted to research funders, then the author doesn’t really care about the price. There is no strong incentive to keep costs down or find alternative funding mechanisms. This is great news for publishers who must be rubbing their hands with glee. It is not only a licence to carry on as they were, but they have successfully fended off the threat of free publication and dissemination that the internet offers. Music industry moguls must be looking on with envy. The cost for publication is shifted to taxpayers (who ultimately fund research) or students (if it comes out of university money). The profits and benefits stay with the publishers. It takes some strained squinting to view this as a victory.

Stevan Harnad argues again for Green OA, claiming that

“Publishers– whose primary concern is not with maximizing research usage and progress but with protecting their current revenue streams and modus operandi –are waiting for funders or institutions to pledge the money to pay Gold OA publishing fees. But research funds are scarce and institutional funds are heavily committed to journal subscriptions today. There is no extra money to pay for Gold OA fees”

3) It doesn’t promote change – in my book I also talked about how a digital, networked and open approach could change what we perceive as research, and that much of our interpretation of research was dictated by the output forms we have. So, for instance, we could see smaller granularity of outputs, post review, different media formats, all beginning to change our concept of what research means. But Gold OA that reinforces the power of commercial publishers simply maintains the status quo, and keeps the peer-reviewed 5000 word article as the primary focus of research that must be attained.

I’ve heard Stephen Downes say that as soon as any form of commercial enterprise touches education it ruins it (or words to that effect). I wouldn’t go that far: I think, for instance, that commercial companies often make a better job of software and technology than universities. But academic publishing is such an odd business that maybe it doesn’t make sense as a commercial enterprise. As David Wiley so nicely parodies in his trucker’s parable, there isn’t really another industry like it. Academics (paid by the taxpayer or students) provide free content, and then the same academics provide free services (editorship and peer-review) and then hand over rights and ownership to a commercial company, who provide a separate set of services, and then sell back the content to the same group of academics.

I know a few people who work in commercial publishing, and they are smart, good people who genuinely care about promoting knowledge and publishing as a practice. This is not a cry for such people to be out on the streets, but rather for their skills and experience to be employed by and for universities, the research communities and the taxpayer rather than for shareholders. In this business Downes’ contamination theory seems to hold: there is simply no space in the ecosystem for profit to exist, and when it does it corrupts the whole purpose of the enterprise, which is to share and disseminate knowledge.

Gold OA is not inherently detrimental. There are plenty of non-profit publishers who operate this model and they keep costs down to a minimum and have a generous fee waiver policy. They are, after all, not concerned with making a profit, and are concerned with knowledge dissemination. Other models exist also, including subsidised university presses, centralised publishing platforms, etc. The swindle is that there is no real incentive to explore these possibilities because the standard model has been reinforced through the manner in which OA has been implemented. As Tim O’Reilly comments “If we’re going to get science policy right, it’s really important for us to study the economic benefit of open access and not just accept the arguments of incumbents”.

[An earlier version of this post was originally posted on Martin Weller's blog]

by admin

Video: Julia Kloiber on Open Data

October 3, 2012 in Ideas and musings, Interviews, OKF Germany, OKFest, Our Work

Here’s Julia Kloiber from OKFN-DE’s Stadt-Land-Code project, talking at the OKFest about the need for more citizen apps in Germany, the need for greater openness, and how to persuade companies to open up.

Managing Expectations II: Open Data, Technology and Government 2.0 – What Should We, And Should We Not Expect

September 13, 2012 in Featured, Ideas and musings, Open Data, Policy

This is the second of two pieces about “managing expectations” (the first is here). Open data has come a long way in the last few years and so have expectations. There’s a danger if open data is seen as a panacea that will magically solve climate change or eliminate corruption: it will inevitably fail to do so, and hope and enthusiasm will be replaced by disappointment and disengagement.

This would be a tragedy as open data is valuable to us socially, commercially and culturally. However, we do need to think hard about how to make effective use of open data. Open data is usually only one part of a solution and we need to identify and work on the other key factors, such as institutions and tools, needed to bring about real change.

Some steps in a theory of change – see below for discussion. Note that the 3rd step is both by far the most important and most complex.

Government 2.0, Open Data and IT

More than two years ago a UK Government civil servant came to visit me. A new government had taken power in the UK for the first time in a decade and she wanted to ask me about “Government 2.0”, open data and transparency.

One thing was immediately apparent from our discussion: while she was already excited by these ideas, it wasn’t entirely clear to her what they involved or exactly what problems they would help with, something that has remained a common feature of conversations I’ve had since.

In my view (a view expressed in that conversation two years ago), there are (at least) two distinct, albeit related, ideas for what “Government 2.0” means:

  • Improving (Government) services by utilizing current information technology and open data — open data being especially interesting (and novel) as it could turn Government from the direct supplier of services to the supplier of the data (and infrastructure) needed to run those services (‘Government as a Platform’)
  • More interactive, participatory governance (and therefore more “democratic”) via the use, again, of open information and technology (though the connection was somewhat vaguer).

Put like this it’s clear why “Government 2.0” can appear so exciting: after all, it appears to promise a radical improvement, even transformation, of government.

But it also should make us concerned. Unrealistic expectations can be dangerous — something that is generally beneficial can get confused with a miracle cure and then blamed when it fails to deliver.

Moreover, there’s the risk that we start fixating on this wondrous new possibility (open data and technology) and ignore other key (but less exciting) elements in solving our problems, with the consequence that we greatly reduce the actual benefit we get from these innovations in policy and technology.

This second point seemed especially important as it could lead to the dangerous assumption developing that open information + IT would magically turn into better (and more participatory) governance without much examination of how this exactly would come about and any changes to the form and structure of governance that would be needed.

The danger here is of confusing necessary with sufficient conditions: open data may be a necessary part of better and more participatory governance, but it is likely not sufficient without, say, substantial other changes in the structure and machinery of government (e.g. who gets to vote, when and where) [1]. These latter changes are normally costly and much more difficult than adopting new IT or opening up data. Thus, whilst new IT and open data are important, they are likely only one (possibly small) part of a solution.

When is Open Data (Part of) the Solution?

To help think about and clarify this question — of the role of open data and IT vis-a-vis other factors in a solution — I drew the first version of this diagram (a diagram I have drawn again and again over the last few years).

The purpose of the diagram is to provide a rough-and-ready way to think about the role of open data (and IT) in solving a specific problem compared to other factors such as institutions.

Some specific problems are listed for illustrative purposes. For example, Climate Change is situated up at the top-left implying that Open Data + IT likely play a relatively limited role compared to other factors such as institutions and governance change (roughly: the real problem here is reaching international agreement on a solution not more (open) information).

Conversely, finding a better way to get to work is likely a problem where Open Data + IT can have a very large impact irrespective of any other factors. Meanwhile, for an issue like Corruption there would be debate as to where to situate it. On the one hand, Transparency and Open Data can have a big impact with relatively little institutional or governance change. On the other hand, one could argue that without reasonably significant governance and institutional change, open data and transparency would have little effect.

A Theory of Change

What this line of thought suggests is that we need to delve deeper into the exact “theory of change” for a given area. Around open data I think there is a general chain of logic, which runs, roughly, as follows:

  1. Open (digital) data + IT dramatically lowers the cost of access to information
  2. This includes information about what the government is doing (be that in terms of laws or filling in pot-holes)
  3. Armed with this better information citizens (or other groups) will

    1. Be able to hold government accountable and/or drive change
    2. Have a better sense of how their polity operates (improving trust etc)

In essence it presumes some theory of change like this (a diagram I also drew that day for the civil servant):

The key question is around step 3: “Action (& Change)”. It highlights the often missing (but implicit) assumption in much of this discussion that once information is available, action and change will follow. But action, even in highly developed democracies, can be hard for several reasons:

  1. Understanding and action requires attention: analyzing and acting on information requires time and attention and these seem to be (increasingly) scarce. Crudely put: do I go out to the cinema with my friends or do I read up on the latest draft law?

    The key cost of becoming politically active is not the direct cost of acquiring information (be that what is happening or the email address of the representative to contact) but the attention and time cost in analyzing, understanding and acting on that information. If so, open data and IT may only have a limited impact on reducing the cost of taking “political action”.

  2. Digital technology, by reducing simple transmission costs, has substantially increased the amount of “information” competing for attention (information should also be interpreted here in the broad sense and include entertainment and anything else that could be shipped as digital “bits”).

  3. The problems of coordination and collective action: crudely, why should I bother to act if it requires a million of us to act for something to happen? Coordination problems are as old as humanity itself. While one can argue that modern communication technologies can assist us in coordination (cf. the debate on the “social media revolutions”), they would seem at best to offer a mild improvement on a fundamentally difficult problem (see the appendix below for more on this).

Conclusion

I should emphasize that I am far from arguing that open data (and IT) are not important. However, we need to temper our enthusiasm with an appreciation that they are only one part of the solution. As next steps I think we need to:

  • Think hard about what problems to tackle — if technology and open information are the tools to hand we want to focus on problems where they are especially effective. Using the first of the diagrams above can be a useful exercise in clarifying where a particular problem is located.
  • Be clear that other changes or improvements in, say, institutions, will be needed. We should work out what these are and then endeavour to make them. We should be aware that often these changes will be both more important and much harder than those that we can achieve with technology and open information alone.
  • Appreciate that open data and technology are attractive tools because they are (relatively) very cheap and straightforward to use. This is worth bearing in mind: even if open data and (information) technology are 10% of solving a problem they are an incredibly cheap 10% to do.
  • Acknowledge that open information and technology will often be complements to institutional change, not substitutes. If so, we cannot just do more open information and less governance reform: that would be like giving you a second hammer to compensate for having no nails.

Appendix: Principal Agent Theory and Government

Imagine I hire a real-estate agent to sell my house for me. In legal/economic parlance I am the principal (I own the house) and the real-estate agent is the “agent”, the person acting for the principal.

The normal “problem” of such relationships is that the interests and goals of the two parties are not aligned.

Take the real-estate case: suppose the two parties negotiate a 6% commission plus an up-front fee — then for every additional $1 in the sale price the agent manages to get from a buyer for the house the agent will receive only 6 cents in commission. This implies a strong divergence, at least in pure monetary terms, between the incentives of the principal (the owner of the house) and the agent (the real-estate agent).

To make this even more concrete, consider a situation where, by working a weekend and doing extra showings, the agent could sell the house for an extra $10,000. Suppose that the agent crudely values this weekend time at $1000 a day (imagine their daughter has a birthday party!). For them the “payoff” is: $600 (6% of $10k) – $2000 (their time cost) = -$1400 [2]. So they have a strong incentive not to bother. Meanwhile the principal would clearly make the effort: even assuming a higher cost of their time of $5k, their “payoff” is $10k – $5k = $5k.
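The weekend example can be checked with a few lines of Python. This is only an illustrative sketch: the function names `agent_payoff` and `owner_payoff` are hypothetical, and, as in the text, the owner's payoff ignores the commission paid to the agent and makes no allowance for risk aversion.

```python
def agent_payoff(extra_sale_price, commission_percent, days_worked, time_value_per_day):
    """Agent's net gain from the extra effort: commission earned minus the value of their time."""
    commission = extra_sale_price * commission_percent / 100
    time_cost = days_worked * time_value_per_day
    return commission - time_cost


def owner_payoff(extra_sale_price, time_cost):
    """Owner's net gain if they made the same effort themselves (commission ignored, as in the text)."""
    return extra_sale_price - time_cost


# Figures from the example: $10,000 extra sale price, 6% commission,
# a two-day weekend valued at $1000 a day for the agent, $5000 for the owner.
agent = agent_payoff(10_000, 6, 2, 1_000)
owner = owner_payoff(10_000, 5_000)

print(f"agent payoff: {agent:+.0f}")
print(f"owner payoff: {owner:+.0f}")
```

The sign of each result is what matters: a negative payoff means the party has no monetary incentive to make the effort, a positive one means they do.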

So how does this relate to government? Well, Lincoln may have said that Government is by, for and of the people, but in truth only the middle of these is true: Government is a classic principal-agent setup in which the principal (the people) appoints agents (their elected representatives) to govern the polity.

Thus, Government faces exactly the kind of principal-agent problems above: the incentives of elected officials (or bureaucrats) may differ markedly from those of the citizens as a whole. For example, elected officials may primarily care about remaining in office (just as the real-estate agent cares about commission) rather than ensuring the best outcome for citizens. This trade-off will be especially acute when narrow but powerful groups who are able to provide monetary or other support for, say, re-election, have interests that conflict with the general welfare of society.

The focus of most principal-agent analysis is on how to better align the interests of the agent with those of the principal. Normally this involves some form of monitoring (so the principal knows what the agent is doing) combined with some form of reward and sanctions based on the agent’s actions.

In a perfect world, the principal would know exactly what the agent was doing and with the right set of rewards and sanctions could then ensure they did exactly what was wanted. However, in this situation the principal would essentially be the agent (how else does one know exactly what they are doing?) and so the real question is how well one can do with imperfect information and imperfect rewards and sanctions. We need not go into detail here but the key (and obvious) point is that the more imperfect the principal’s information and the more imperfect their rewards and sanctions, the poorer the alignment will be.

Unfortunately on this basis there are several reasons to think the government / people (electorate) principal-agent problem is especially bad:

  • Government is complex, which makes it hard for the principal (the “people”) to know what the agent(s) are doing. (Remember it’s more about knowing the agent’s actions than outcomes since outcomes, due to uncertainty, only partially reflect an agent’s effort. Very strong incentives based on uncertain outcomes can be counter-productive: if I could work very hard and it will all be for nothing because things go badly for random reasons, then maybe I should not bother and just see what random chance brings.)
  • The incentives that can be offered to agents (the governors) are relatively crude — being voted out at the next election (or being overthrown in an uprising!)
  • Governance in fact has multiple levels of principal-agent relationships: the “people” may elect representatives who in turn appoint or utilize a managerial bureaucracy to run government. In this case elected officials are principals and the bureaucracy is the agent.
  • The very large number of individual citizens makes the coordination problem of acting to sanction or reward an agent especially difficult. The simplest form of this sanctioning (rewarding) is elections, yet the incentive for any given citizen to make the effort to participate is very low: why should I bother to vote when I am only one among millions? (We note that turnout in most countries has been consistently dropping.)

  1. Of course, it is true that technology and the open flow of information can enable certain forms of governance that are otherwise very difficult — for example, modern IT makes it possible literally to hold daily votes of all citizens, something that would otherwise be impossible except in the very smallest of polities. 

  2. N.B. I have made no allowance for risk aversion here, given that the expected gain is $10k. However, the basic point would still stand.

Managing Expectations

July 24, 2012 in Ideas and musings, Open Content, Open Data, Our Work

We’re big on promoting open information: be that sonnets, statistics, genes or geodata. We’re big on it because we think it has the potential to improve the welfare of peoples around the world in a variety of ways, from making governments more accountable to improving research on cancer.

At the same time I think it is important that we, and others, are realistic about what will be achieved and on what time-scale. This can be a difficult thing to say. Often, to get people to travel with you, you have to sell them a grand vision of how whatever you are doing will revolutionize things overnight. But most changes, especially big ones, are more gradual.

Think of the celebrated invention of the printing press, today often compared to the invention of the computer. The printing press did, ultimately, produce a “revolution” and wrought plenty of change. But it didn’t happen overnight. What is more, the effect of the printing press on, say, the balance of political power or even more prosaic matters like literacy was by no means immediate. Change occurred over a period of decades or centuries and was often dependent on the evolution of a complex set of complementary institutions and technologies.

Printer in 1568

Similar patterns can be seen for another fundamental technological development: electricity. Legend has it that when Faraday first demonstrated an electric effect at the Royal Society in the 1830s, Gladstone questioned whether it was not just a scientific curiosity given its lack of obvious applications — to which Faraday famously replied: “What good is a baby?”. It took over a century for electricity to reach anything like its full potential.

Michael Faraday delivering a lecture in 1856

Today we find ourselves in a similar situation. Whilst we live in a much accelerated age compared to 15th century Germany or 19th century England, we probably still need to think on a time-scale of decades if we are going to see the full effects of the new open approaches to the creation and sharing of knowledge — approaches that we have only just begun to explore.

This is the first of two posts on this topic by Rufus Pollock, a Founder and Director of the Open Knowledge Foundation, and a Shuttleworth Foundation Fellow.

Science, data and the public

July 21, 2012 in Featured, Ideas and musings, Open Data, Open Science

Omnitruncated 120/600 Cell by Jonathan Gray (jwyg on Flickr)

Earlier this week the European Commission released a package of documents related to their nascent policies on access to scientific information. What will these mean for science and for public engagement with science?

New open access policies have been in the headlines quite a bit recently, as politicians and policy makers respond to the wave of public support precipitated by the so-called academic spring earlier this year.

On Monday the UK government announced that all its publicly funded research will be open access within two years (though not everyone is convinced about plans for how this will be achieved). Open access has received a more modest bump up the political agenda in the US, with a meeting between Obama’s science advisor and prominent access advocates, and a flurry of support for a petition requesting an open access mandate for publicly funded research.

The European Commission has been broadly supportive of open access policies for some time. It is already piloting open access for 20% of the research it funds under the €50 billion Seventh Framework Programme (FP7), citing what it calls the “fifth freedom”, “the free circulation of researchers and scientific knowledge”. So what’s new?

For a start it is notable that the EC explicitly highlights open access to scientific research data as well as to scientific research publications. It explicitly highlights parallels between opening up publicly funded research data and opening up public sector data. And – interestingly – it explicitly mentions not only scientists and research institutions but also citizens as potential users of scientific data.

This is new. Our volunteer-led Open Science Working Group at the Open Knowledge Foundation has been working with key stakeholders to promote open scientific data for a number of years – from policy initiatives like the Panton Principles and the Panton Fellowships, to the recently launched open source PyBossa crowdsourcing platform, developed in association with the Citizen Cyberscience Centre. As far as we know there has not been a comparable public policy development which offers such strong or explicit support for opening up scientific data.

The European Commission’s basic message is that – with limited exceptions such as privacy and third party rights – maximising reusability is the best way to maximise scientific innovation and return on investment. And the wording is reasonably strong. One document (PDF) says “information already paid for by the public purse should not be paid for again each time it is accessed or used”. Another (PDF) says “policies on open access to scientific research results should apply to all research that receives public funds”.

Concrete measures that will be taken to address this include working with member states to implement and strengthen open access policies for publicly funded publications and data, strengthening their own commitment to open access with research that they fund (including open access to all publications), and investing in infrastructure to support the reuse of scientific data.

There are clearly reasonably strong overlaps between the EC’s thinking on publicly funded scientific data and their thinking about public sector data. They want to create policies that will increase innovation by allowing more people to derive and create value from data – rather than letting it moulder on institutional hard drives, or sit behind paywalls. The logic in both cases is similar: unlock data, facilitate reuse, maximise impact and value to society.

While the benefits of open scientific data for scientists and research institutions are reasonably well documented – the Human Genome Project is probably the best known exemplar – one wonders what innovations we might see from non-experts and non-scientists, and what more open policies might mean for the public understanding of science.

You don’t have to accept that anyone can be a scientist without prior training to see the value of citizen science projects like Galaxy Zoo and EyeWire, which harness input from users to complete simple tasks. The Clearer Climate Code project is an amazing example of scientific innovation by non-scientists as a direct result of open scientific data – resulting in NASA preferring algorithms written by a couple of dedicated volunteers to their own code.

Imagine we could leverage input from bright and committed members of the public to increase the pace of scientific development in areas of pressing concern – from climate change to cancer. The new measures proposed by the EC are a small step towards enabling this to happen. Let’s hope that EU member states and other countries will follow suit, and recognise that a world in which scientific data is open by default is better than one in which it is closed.

NB: This article initially appeared on the Guardian’s Data Blog

Why Open Data isn’t enough.

May 28, 2012 in Featured, Ideas and musings, Open Data

The debate around data in our community has been densely concentrated around the question of openness. That’s not surprising. Words like “free” and “open” have dominated the conversations in the digital commons for most of its existence, mainly because most of the digital commons has been centered on copyrightable works.

Software, text, photos, videos, music are all creative works under the law, all carrying the powerful, relatively internationalized protections of copyright, and this very power allows creators to invert that power using free / libre copyright licenses. That reality has led to a set of definitions of freedom for software, for cultural works, and for knowledge, all of which are very centered on the intellectual property regimes surrounding digital objects. We’ve also propagated the idea to hardware.

And that’s carried over into the debate around data. We ask, “is it open data?” of the world.

But I spend a lot of time around data people for whom open is an afterthought. For many people it’s Big Data, right down to the requisite O’Reilly branded events. They’re worried about whether we should leverage machine learning or domain experts, not openness. Or it’s Social Data. They’re worried about privacy policies and selling the data to as many vendors as possible. It’s Blue Button, and Green Button. They’re worried about getting data into people’s hands. It’s Quantified Self. They’re worried about getting their own data into their own hands. In Washington and other capitals large and small it’s Government Data.

Open is almost never mentioned.

And I think that’s because we’re so focused on intellectual property, on share alike and attribution and public domain, that we lose the bigger context.

Creative works came online in a cultural and technical context that allowed us to focus on freedom, and intellectual property. We have decades of history with software, photo, and video, and hundreds with text. We had a technical infrastructure ready to create, distribute, consume, and remix creative works: mailing lists, sharing websites, wiki software.

We don’t have that with data.

Data is entering the world at a rate that is so fast it’s almost incomprehensible to human brains. It’s like trying to comprehend geologic time. The cost of generating data is so low in so many spaces, and dropping like a stone in so many others, that the real challenge is to do interesting things with it. The gulf between those who can do something with data and those who can’t is a serious new case of digital divide, and licensing is just a tiny part of that gulf. Important, to be sure, but tiny.

There’s a people gulf – 190,000 machine learning experts and 1,500,000 managers in the US alone that don’t exist, but need to, to take advantage of data. That gap is worse in the developing world, and will only accelerate in coming years.

But perhaps most important is a cultural gulf – we live in a world right now that (implicitly in most cases, but increasingly explicitly) accepts the natural state of data as transactional. We trade our data, rather than our cash, for services like Facebook, Google, apps, and more. We don’t get a copy of it. We don’t know who does. We’re on the outside of the black box, but our data’s on the inside.

So my argument is that we as an “open” movement need to understand and integrate our concerns over property rights into the broader debate. We need to talk about citizens’ rights. We need to talk about the right to understand how our web searches are returned. We need to talk about how our privacy rights may be negatively impacted by more openness.

Because unlike the web, and the internet, which grew quietly in obscure corners of the world, allowing open designs to flourish, data has already drawn attention, money, and closed business models. We’re in active competition against powerful, rich opponents to create an open ecosystem at the core of data, one that TCP/IP and HTML didn’t have to fight.

Here’s hoping we can bridge the gaps before other, closed systems can do so for us. The good news is that open systems have a lovely little history of outcompeting closed ones, given time, freedom to compete, and even a small group of committed people.

Talk at LIFT 2012: Open Data – How We Got Here, and Where We’re Going

April 2, 2012 in Featured, Ideas and musings, Open Data, Our Work, Talks

I’m pleased to announce that the video of my talk, Open Data: How We Got Here, and Where We’re Going, that I gave a few weeks ago at the LIFT 2012 conference has now been published:

Over the past few years, there has been explosive growth in open data, with significant uptake in government, research and elsewhere. Open data has the potential to transform society, government and the economy, from how we travel to work to how we decide to vote. But we have only just begun down this road, and the going, even so far, has not always been easy.

My talk introduced the idea of open data, explaining how, and why, we are where we are today, and, finally, took a look to the future of the rapidly evolving open data ecosystem.

Slides from the talk – Link to full version

From CMS to DMS: C is for Content, D is for Data

March 9, 2012 in Featured, Ideas and musings, Open Standards

This is a joint blog post by Francis Irving, CEO of ScraperWiki, and Rufus Pollock, Founder of the Open Knowledge Foundation. It’s being cross-posted to both blogs.

Content Management Systems, remember those?

Tim Berners-Lee in thought

It’s 1994. You haven’t heard of the World Wide Web yet.

Your brother goes to a top university. He once overheard some geeks in the computer room making a ‘web site’ consisting of a photo tour of their shared house. He thought it was stupid, Usenet is so much better.

The question – in 1994 did you understand what a Content Management System (CMS) was?

In the intervening years, CMS’s have gone through ups and downs.

Building massive businesses, crashing in the .com collapse. Then a glut, web design agencies all building their own CMS in the early noughties. Ending up with the situation now.

A mature market, commoditised by open source WordPress. Anyone can get a page on the web using Facebook. There’s still room for expensive, proprietary players, newspapers custom make their own, and businesses have fancy intranets.

Data Management Systems, time to meet them!

DMSs are also called "data hubs". Hopefully less patented than this wheel!

It’s 2012. You’ve just about heard of Open Data.

Your nephew researches the Internet at a top university. He says there’s no future in Open Data, no communities have formed round it. Companies aren’t publishing much data yet, and Governments the wrong data reluctantly.

The question – what is a Data Management System (DMS)?

There isn’t a very good one yet. We’re at round about where CMS’s were in the mid 1990s. Most people get by fine without them.

Just as then we wrote HTML in text files by hand and uploaded it by FTP, now we analyse data on our laptops using Excel, and share it with friends by emailing CSV files.

But it reaches the point where using the filesystem and Outlook as your DMS stretches to breaking point. You’ll need a proper one.

Nobody really knows what a proper one will look like yet. We’re all working on it. But we do know what it will enable.

What must a DMS do?

All the things people expect a DMS to do!

A mature DMS will let people do all the following things. Whether as a proprietary monolith, or by slick integration across the web:

  • Load and update data from any source (ETL)
  • Store datasets and index them for querying
  • View, analyse and update data in a tabular interface (spreadsheet)
  • Visualise data, for example with charts or maps
  • Analyse data, for example with statistics and machine learning
  • Organise many people to enter or correct data (crowd-sourcing)
  • Measure and ensure the quality of data, and its provenance
  • Permissions; data can be open, private or shared
  • Find datasets, and organise them to help others find them
  • Sell data, sharing processing costs between users

If it sounds like a fat list for a product, that’s because it is. But sometimes the need, the market, pulls you – something simple just won’t do. It has to do or enable, best it can, everything above. (Compare it to the same list for CMSs)

In short, it’s what the elite data wrangling teams inside places like Wolfram Alpha and Google’s Metaweb teams do. But made easier and more visible using standardised tools and protocols.

Who’s making a DMS?

More people than I realise. From the largest IT company to the tiniest startup. Here are some I know about, mention more in the comments:

  • Windows / OSX (+ Excel / LibreOffice / …) – the desktop serves as a (good enough so far) DMS
  • CKAN software – started as a data catalog, but has grown into more and powers the DataHub, a community data hub and market. Created by the Open Knowledge Foundation
  • ScraperWiki – coming from the viewpoint of a programmer, good at ETL
  • Infochimps/DataMarket – approaching it as a data marketplace
  • BuzzData – specialising in the social aspects
  • Tableau Public – specialising in visualisation
  • Google Spreadsheets – coming from the web spreadsheet direction
  • Microsoft Data Hub – corporate information management
  • PANDA – making a DMS for newsrooms

They’re all DMS’s because they all naturally grow bad versions of each other’s features. Two examples.

ScraperWiki is particularly good at complex ETL (loading data into a system), yet every DMS has to have a data ingestion interface of at least choosing CSV columns.
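To make "a data ingestion interface of at least choosing CSV columns" concrete, here is a minimal sketch using only the Python standard library. It is not how ScraperWiki or any other DMS named here actually works; the function name `load_csv_columns`, the table name `cities` and the sample data are all made up for illustration.

```python
import csv
import io
import sqlite3


def load_csv_columns(csv_text, columns, table, conn):
    """Load only the chosen columns of a CSV document into a SQLite table.

    Hypothetical helper for illustration; returns the number of rows loaded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"columns not found in CSV header: {missing}")
    # Quote identifiers so column names containing spaces still work.
    quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
    rows = [tuple(row[c] for c in columns) for row in reader]
    conn.executemany(
        f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})', rows
    )
    conn.commit()
    return len(rows)


# Made-up sample data: four columns, of which we keep two.
SAMPLE_CSV = (
    "city,country,population,notes\n"
    "Leeds,UK,750000,x\n"
    "Lyon,FR,500000,y\n"
)

conn = sqlite3.connect(":memory:")
loaded = load_csv_columns(SAMPLE_CSV, ["city", "population"], "cities", conn)
result = conn.execute(
    "SELECT city, population FROM cities ORDER BY city"
).fetchall()
print(loaded, result)
```

Values are stored as text exactly as they appear in the file; type detection, encodings and updates to existing rows are the parts that make real ETL hard, and they are deliberately left out.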

CKAN has particularly good metadata, usage and provenance, yet every DMS has to have a way for people to find the data stored in it.

So will they be giant monolithic bits of software?

We standardised the shipping container, can we standardise data interoperation?

We hope not! That didn’t turn out great for CMSs, although there are some businesses providing that.

CMS’s only really came of age when in the mid-noughties everyone realised that WordPress (open source blogging software!) was a better CMS than most CMS’s.

It’s in everyone’s interest that users aren’t locked into one DMS. One of them might have a whizzy content analysis tool that somebody who has data in another DMS wants to use. They should be able to, and easily.

OKFN is about to launch a standards initiative to bring together such things. It’s called Data Protocols.

So far the clearest needs are twofold and mirror each other – pulling and pushing data:

a) a data query protocol/format to allow realtime querying, for example for exploring data. Imagine a Google Refine instance live querying a large dataset on OKFN’s Data Hub.

b) a data sync protocol/format that is akin to CouchDB’s protocol. It would let datasets get updated in real time across the web. Imagine a set of scrapers on ScraperWiki automatically updating a visualisation on Many Eyes as the data changed.
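As a rough sketch of what these two needs could look like, the toy Python below models a dataset that answers JSON queries (pull) and exposes a sequence-numbered change log that a replica can catch up from (push/sync). This is not the Data Protocols design, which had not been published at the time of writing. The `Dataset` class, the `{"filters": ..., "limit": ...}` request shape and the `sync` helper are all invented for illustration; the change log is only loosely inspired by the idea behind CouchDB's changes feed, where a client asks for everything since a given sequence number.

```python
import json


class Dataset:
    """A toy in-memory dataset that logs every change with a sequence number."""

    def __init__(self):
        self.rows = {}      # row id -> row dict
        self.changes = []   # append-only log of (seq, row id, row dict)

    def upsert(self, row_id, row):
        """Insert or replace a row and record the change; return its sequence."""
        self.rows[row_id] = row
        seq = len(self.changes) + 1
        self.changes.append((seq, row_id, dict(row)))
        return seq

    def query(self, request_json):
        """Pull: answer a JSON query of the (assumed) form
        {"filters": {column: value}, "limit": n}."""
        request = json.loads(request_json)
        filters = request.get("filters", {})
        limit = request.get("limit", 100)
        matches = [
            row for row in self.rows.values()
            if all(row.get(k) == v for k, v in filters.items())
        ]
        return json.dumps({"total": len(matches), "rows": matches[:limit]})

    def changes_since(self, seq):
        """Sync: return every change recorded after sequence number `seq`."""
        return [change for change in self.changes if change[0] > seq]


def sync(source, replica, last_seq):
    """Apply the source's new changes to the replica; return the new checkpoint."""
    for seq, row_id, row in source.changes_since(last_seq):
        replica.rows[row_id] = row
        last_seq = seq
    return last_seq


# Pull: a client explores the data with a filter query.
hub = Dataset()
hub.upsert("a", {"city": "Leeds", "country": "UK"})
hub.upsert("b", {"city": "Lyon", "country": "FR"})
answer = json.loads(
    hub.query(json.dumps({"filters": {"country": "UK"}, "limit": 10}))
)

# Sync: a mirror catches up, then picks up only the later change.
mirror = Dataset()
checkpoint = sync(hub, mirror, 0)
hub.upsert("b", {"city": "Lyon", "country": "France"})
checkpoint = sync(hub, mirror, checkpoint)
print(answer, checkpoint, mirror.rows == hub.rows)
```

The checkpoint is what keeps the sync cheap: a replica only has to remember one number, and each round trip transfers just the changes made since then. A real protocol would also need to handle deletions, conflicts and authentication, none of which this sketch attempts.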

Later even more imaginative things… I reckon Google’s Web Intents can be used to make the whole experience of the user slick when using multiple DMS’s at once. And hopefully somebody, somewhere is making a simplified version of SPARQL/RDF just as XML simplified SGML and then really took off.

Enough of me! What do you think?

Join in. Make standards. Write code.

Leave a comment below, and join the data protocols list.
