Jordan Hatcher talk on Open Data Licensing at iSemantics
September 6th, 2010
Last week the Foundation’s legal expert, Jordan Hatcher, was at the iSemantics conference in Graz to give a session on open data licensing (especially for linked data). Here are the slides:
Data Journalism Meetup, Berlin, 1st September 2010
August 20th, 2010
We’re delighted to announce a meetup on Data Journalism in Berlin in September organised by the Open Knowledge Foundation and Georgi Kobilarov at Uberblic Labs. Details are as follows:
- When? 1st September 2010
- Where? Fjord Office, Friedrichstrasse 210, Berlin
- Register? You can register here!
Speakers will include:
- Martin Belam, The Guardian
- Jonathan Gray, The Open Knowledge Foundation
- Christian Heise, ZEIT Online
- Gerd Kamp, Deutsche Presse Agentur
- Georgi Kobilarov, Uberblic Labs
- John O’Donovan, BBC News
- Tom Scott, BBC Earth
- Ole Wintermann, Bertelsmann Foundation
From the blurb:
Data Journalism and the new and exciting possibilities that the Web of Data opens up for creators and consumers of news and media online will be the topic of this first meetup.
We have a brilliant lineup of speakers from organisations like the BBC, The Guardian, the Deutsche Presse Agentur and the Bertelsmann Foundation coming to Berlin to talk about data journalism and the latest developments and projects in this field, and our friends from ZEIT Online will join the discussion.
The event takes place at the office of our friends at Fjord in the heart of Berlin. Starting at 2pm, you’ll hear talks followed by a panel discussion and an open space for working groups, and when the official programme ends at 7pm we’ll of course have drinks with all of you.
All talks at the event will be in English, but don’t be surprised to hear a bit of German here and there in conversations.
Open Government Data Camp 2010, 18-19th November 2010
August 13th, 2010
The Open Knowledge Foundation is organising an international workshop on open government data, which will take place in London this autumn:
You can register at:
From the announcement:

What is it?
Basic details are as follows:
- What? A two day workshop for people interested in open government data.
- When? 18-19th November 2010
- Where? University of London Union, London, UK
- How much? Tickets cost £10 to help cover costs. You can sign up here!
- Hashtag? #ogdcamp2010
Tell me more…
It’s been a big year for open government data. Around the world governments and public bodies have been opening up official datasets for the public to reuse. There has been an explosion of new applications, competitions, hackdays and other initiatives from local authorities, central government departments, international bodies and others. This event will bring together movers and shakers from the world of open government data, including government representatives, policymakers, lawyers, technologists, academics, advocates, citizens, journalists and reusers.
What will happen?
There will be two days of discussions, drafting, planning and hacking. Crucially we hope to:
- Build consensus around key legal, technical and policy issues related to opening up government information.
- Strengthen the community of people working on different aspects of opening up official data around the world — from both inside and outside government. (Many people working on this area will not have met in person!)
- Encourage the exchange of experiences, expertise and ideas between those involved in leading open government data initiatives in different countries.
- Make things! We hope there will be plenty of space for developers to hack on things — from refining core bits and pieces of technology to rapid prototyping of new ideas.
What will the format be?
Presentations will be kept to a minimum. Each day will begin with a sprinkling of short talks followed by plenty of time to talk, plan and work on things.
Can I submit a presentation?
We are going to put out a call for short presentations (around thirty 10-minute slots) shortly. Details and links will be posted on the open-government discussion list.
Can I propose a session?
Yes please! Again, we’re going to brainstorm, plan and schedule sessions on the open-government discussion list — so head there if you have any cunning ideas!
What kinds of topics will be covered?
Possible sessions include:
- How can we encourage other countries to open up official information?
- Open government data in law and policy: obstacles and opportunities
- Promoting reuse: competitions, community engagement, the role of the media
- Finding open government data: catalogues, registries and metadata
- Raw Data Now! Technical aspects of opening up government data
- The role and value of linked data
- Open government data and data journalism
What kinds of outputs will there be?
Projected outputs include things like:
- First draft of an international ‘open data manual’ (organised as a ‘Book Sprint’)
- A set of key open government data principles
- A timeline of key developments for open government data around the world
- A fairly comprehensive list of official initiatives — including data catalogues and competitions
- A list of key examples of the reuse of open government data
- Launch of RawDataNow.com, aimed at those who publish official information and illustrating what we mean by ‘raw data’
- Brainstorming about projects which would make it easier for citizens to find, analyse and visually represent the data they are looking for
Who’s behind the event?
Open Government Data Camp was conceived and is being primarily organised by the Open Knowledge Foundation. The event is also supported by:
- Cabinet Office, UK
- EU LAPSI project, Turin, Italy
- EU LOD2 project, Leipzig, Germany
- Guardian, UK
- Sunlight Foundation, USA
Who is coming?
You can find a list of participants at:
If you add your name to the list, please don’t forget to register! (And vice versa: if you’ve registered, please also add your name to the pad page above…)
Can I sponsor the event?
Yes please! We are still actively seeking sponsorship for lunches, coffee, travel and accommodation for international participants and so on. If you think you might be interested, please contact jonathan dot gray at okfn dot org.
What countries will be represented?
We are currently expecting representation from:
- Argentina
- Australia
- Austria
- Belgium
- Brazil
- Canada
- Denmark
- Finland
- France
- Germany
- Hungary
- Iceland
- India
- Ireland
- Italy
- Luxembourg
- Netherlands
- New Zealand
- Norway
- Russia
- Spain
- Sweden
- Taiwan
- United Kingdom
- United States
Why do I have to pay?
The £10 ticket price is to help cover costs. If the ticket price is a problem, don’t hesitate to let us know. We won’t turn anyone away because they can’t afford to come!
A few weeks back we blogged about Russ Nelson’s proposals for the Open Source Initiative (OSI) to adopt the Open Knowledge Definition, our standard for openness in relation to content and data.
Russ has written back to us with some notes and questions from a session on this at OSCON:
Okay, so, as promised, here is my report on the “Open Data Definition” BOF held on Wednesday, July 21, at 7PM. There were about ten people present, which is a reasonable attendance, particularly when set against the Google Android Hands-on session at which they gave out free Nexus One phones.
Didn’t seem wise to me to start from scratch, especially given the good work done by the Open Knowledge Foundation on their Open Knowledge Definition: http://www.opendefinition.org/okd/. So we read through it section by section, by way of review. Here are the questions we arrived at (thanks to Skud aka Kirrily Robert for taking notes):
- What happens with data that’s not copyrightable?
  1a. What about data that consists of facts about the world and thus even a collection of it cannot be copyrighted, but the exact file format can be copyrighted? Many sub-federal-level governments in the US have to publish facts on demand but claim a copyright on the formatting.
- What about data that’s not accessible as a whole, but only through an API?
- We’re thinking that OKD #9 should read “execution of an additional agreement” rather than “additional license”.
- Does OKD #4 apply to works distributed in a particular file format? Is a movie not open data if it’s encoded in a patent-encumbered codec? Does it become open data if it’s re-encoded?
- What constitutes onerous attribution in OKD #5? If you get open data from somebody, and they have an attribution page, is it sufficient for you to comply with the attribution requirement if you point to the attribution page?
This serves as an invitation to discuss these issues on the new list open-data@opensource.org. Send subscription requests to open-data-subscribe@opensource.org. Unsubscribe by sending a request to open-data-unsubscribe@opensource.org.
If these issues are successfully resolved, then this committee will recommend to the OSI board that the OKD should be adopted as OSI approved. If they can’t be resolved by, say, the end of 2010, then we will give up on trying. Either way, the intent is to lay down the list by the end of this year unless the participants desire otherwise.
So if you’d like to join the conversation, please join the list! We’ve also created an Etherpad to gather responses to some of these issues:
Introducing the Panton Papers
July 26th, 2010
Peter Murray-Rust — Cambridge University chemist, Open Knowledge Foundation Advisory Board member and tireless advocate for open data in chemistry — has recently started a series of blog posts about open data, focusing on issues related to the Panton Principles for open data in science.
The first is called Open Data: why I need the Open Knowledge Foundation, and in it he introduces some of the issues he wishes to discuss and gives his vision for the role he hopes the OKF community will play in relation to open data. He writes:
After a period of silence on this blog (but not on the Open Knowledge Foundation lists) I hope to publish a flurry of ideas on Open Data. There is no doubt that “Open Data” has arrived and there is enormous interest. (By contrast when I started to investigate it 5 years ago there was nothing). It’s desperately important, more complex than I ever imagined, and it’s critical to address it immediately, responsibly, dispassionately and inclusively. If we manage to set out the concerns now, we may manage to avoid the worst problems that were encountered by the Open Source and later Open Access movements. [They have made enormous progress and without their footsteps Open Data would fall into many of the same pitfalls. But Open Data is Difficult – a phrase I shall repeat frequently.]
I am putting my faith and energy into the Open Knowledge Foundation – its people and its infrastructure. This is because it’s an organisation which is wide-ranging (it deals with open content of all sorts, open metadata, services, etc.). It has great expertise in legal problems and solutions (where these are necessary) and also how to find alternative approaches. It’s neutral (apart from urging Openness and developing the infrastructure). It’s very professional, and realises that ideas without implementation have less weight. So there is an impressive range of software and information skills. I am reminded of my favourite motto (from the IETF) – “rough consensus and running code”, one of the greatest productive mantras of our time.
The enthusiasm is palpable. [Today I had a breakfast Skype session with Jonathan Gray (coordinator of OKF) and it's all about how we can make things happen fast and responsibly.] The OKF works through Working Groups and discussion lists, and so when I had a concern about Open Data I brought it to the OKF and – after a great deal of work – we emerged with the Panton Principles which have now been translated into several languages by OKF members.
Simply, the OKF amplifies the visions of individuals from the almost-impossible to the attainable.
So I am putting some ideas into the OKF melting pot to see what emerges.
In the next post, titled Open Data: The concept of Panton Papers, he lays out his ideas for the Panton Papers:
The current theme is “Panton Papers”. The idea is that part of the value of the Panton Principles is that the whole document is short and the key points are simply made. But the “Principles” can therefore only address the motivation and the procedures for Open data in a general manner, and many of the problems are in the details. I believe that many of the problems in Open Access (which is simpler than Open Data) arose because not enough communal effort was given to the practice of Open Access and I want to avoid as many OD problems as possible before they occur.
Over the last 2 years (when Open Data has started to become important and discussed) I have seen several potentially difficult areas. I’ll simply list the ones I have thought of here and then outline the idea of the Panton Papers. This discussion is mirrored in part by the OKF open-science discussion list and you may wish to subscribe. There’s also a regular working group on open-science. (Almost everything in OKF is Open, but it may take a little while to find out where you want to be!). The issues that I currently have are:
- What is data? Images? Graphs? Tables? Equations? Accounts of experiments? This is a major problem and almost completely unexplored. Without solving this we are held back 10 years or more in our ability to re-use the primary scientific literature (e.g. by closed-access publishers who claim that factual graphs belong to them).
- Why should data be open? (and when should it not be?). I’ve put forward ideas here and here. They range from moral, to legal/quasi-legal to utilitarian.
- Who owns data? This is one of the trickiest areas – there is legal and contractual ownership and there is moral ownership. Generally there is far far too much “ownership” of data.
- When should data be released? This is a key question (see here for an example). Some communities have solved it – most haven’t addressed it and will have to go through the rigour of working out release protocols.
- How and where should data be exposed? I am strongly of the opinion that we need domain-specific repositories (which could be national or international) and that Institutional Repositories are almost never the best place to expose data (I expect and welcome alternative opinions). The “how” depends on understanding what the data and metadata are and is increasingly dependent on specialist software and information standards. “Archival” is often the wrong word to use.
- Datamining and textmining. Most authors, publishers, repository owners are unaware of the enormous power of automated analysis of the literature. Some closed access publishers expressly forbid these activities. We have to liberate the right of the scientific community to do this enthusiastically and efficiently.
- Reproducibility. Science is based on reproducibility – we expect to be able to replicate the “materials and methods” of an experiment and to try to falsify its claims. Physical materials are beyond the immediate discussion (though this may change) but much science is now based on computing. It should be possible to replicate simulations, data cleaning, data analysis, model fitting etc. This is a tricky area. It is difficult (though with virtualization and the cloud is becoming easier) to reproduce the computing environment. Large or complex data sets are a major problem but must be addressed. This is not without monetary cost.
I may add more.
The idea is that each of these is a “Panton Paper”. It may or may not be crafted in Pantonia (the hectare of the Chemistry Department, the OKF headquarters, and the Panton Arms in Cambridge UK). Everything I now write is mutable.
Each paper will have a top level document of similar form to the Panton Principles, i.e. 3-8 ideas, with short explanatory paragraph(s). This document will be crafted by the OKF in public view on a wiki or Ether/Piratepad. Anyone can take part. We shall welcome contributions from a wide range of disciplines (in fact this is essential). At some stage version 1.0 of the paper will be frozen and will be formally published. We have an offer from a major publisher to do this and I am hoping we can announce this at Open Science Summit.
The Paper should carry a wider range of links to other essays in Open Data and should carry examples from different disciplines. For example there is a well tried and accepted process in many areas of bioscience and astronomy as to what, when and how data get published.
Peter has started drafting ideas for the first two of these at:
If you’d like to get stuck in, please head on over to the open-science list and say hi!
Russ Nelson, License Approval Chair at the Open Source Initiative (OSI), recently proposed a session at OSCON about OSI adopting a definition for open data:
I’m running a BOF at OSCON on Wednesday night July 21st at 7PM, with the declared purpose of adopting an Open Source Definition for Open Data. Safe enough to say that the OSD has been quite successful in laying out a set of criteria for what is, and what is not, Open Source. We should adopt a definition of Open Data, even if it means merely endorsing an existing one. Will you join me there?
Subsequently a bunch of people wrote to Russell letting him know about the Open Knowledge Definition that we created a few years ago:
The Open Knowledge Definition (OKD) sets out principles to define ‘openness’ in knowledge – that’s any kind of content or data ‘from sonnets to statistics, genes to geodata’. The definition can be summed up in the statement that “A piece of knowledge is open if you are free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”
Russell suggested there was scope for the OSI to adopt the OKD, and emailed us a further blurb for the event:
Should the Open Source Initiative write its own definition of Open Data? Or is the Open Knowledge Foundation’s definition up to snuff? Come help us decide at OSCON next week. We have a BOF scheduled at 19:00 on 21 July 2010. We’ll present the results of our decision to the OSI for adoption at its next board meeting.
We’re excited at the prospect that the OKD might get adopted as an official open data definition by OSI, and would love to hear from folks who plan to attend the session!
Why Share-Alike Licenses are Open but Non-Commercial Ones Aren’t
June 24th, 2010
It is sometimes suggested that there isn’t a real difference in terms of “openness” between share-alike (SA) and non-commercial (NC) clauses — both being some restriction on what the user of that material can do, and, as such, a step away from openness.
This is not true. A meaningful distinction can be drawn between share-alike and non-commercial clauses (or any other clause that discriminates against a particular type of person or field of endeavour), with the former being “open” and the latter being not “open”.
This distinction is important. It is relevant, for example, to why Open Data Commons should not provide NC licenses but will provide a share-alike one. It is also relevant to Creative Commons, whose set of licenses includes both share-alike and non-commercial options. As a result, not all CC licenses are open, and CC licenses are not all mutually compatible. This is something of an irony, as it means that Creative Commons provides a set of licenses that don’t, in fact, result in a commons.
What’s the Problem? Why Does This Matter?
What’s the problem with NC licenses? Aren’t “SA” licenses a step away from open too? And if we debate this, don’t we just end up having a pointless license holy war?
The distinction between NC and SA licenses isn’t about “holy war” but something very practical: license compatibility and the integrity of the “open” commons. The core of a “commons” of data (or code) is that one piece of “open” material contained therein can be freely intermixed with other “open” material.
This interoperability is absolutely key to realizing the main practical benefit of “openness”: ease of use and reuse, which in turn means more and better stuff getting created and used.
The Open Knowledge/Data Definition functions as a “standard” to ensure interoperability in just the same way as ordinary technical standards do (but in this case for licenses rather than for a piece of hardware or software). The aim is to ensure that any license which complies with the definition will be interoperable with any other such license, meaning that data or content under one license can be combined with data or content under the other.
Share-alike or attribution requirements are allowed within the definition precisely because they do not break this interoperability (and may even help promote the commons by ensuring material is “shared back”). Non-commercial provisions are not permitted because they fundamentally break the commons, not only through being incompatible with other licenses but because they overtly discriminate against particular types of users. (I should emphasize here that the definition is directly following the line set out in the original open source definition …)
Thus, there is a meaningful distinction between attribution and share-alike requirements on the one hand and other requirements such as non-commercial (NC) on the other, and it is a distinction that merits describing share-alike licenses as open but non-commercial licenses as not open.
Isn’t It Just About Degree?
Yes, NC and especially ND are more restrictive, but stating that NC licenses aren’t open is wrong - they’re just not as open.
This is incorrect.
To reiterate: it is a mistake to view the set of licenses as some continuous spectrum of ‘openness’ with PD at one end and full rights reserved at the other — with the implication that all licenses in between are more or less open.
There are significant discontinuities, and in particular we can meaningfully partition the set of licenses into open and not-open based on (a) their interoperability and (b) the freedom they provide to all persons (and companies) to use, reuse and redistribute.
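To make the open/not-open partition concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not a legal test: each license is reduced to a hypothetical set of `Condition` labels, and a license counts as open only if every condition it imposes is attribution or share-alike, which is the line drawn in this post. Whether two particular licenses can actually be combined (for example two different share-alike licenses) still depends on their real terms.

```python
from enum import Enum, auto


class Condition(Enum):
    """Hypothetical labels for the kinds of clause a license can impose."""
    ATTRIBUTION = auto()
    SHARE_ALIKE = auto()
    NON_COMMERCIAL = auto()
    NO_DERIVATIVES = auto()


# The only conditions this post treats as compatible with openness.
PERMITTED = frozenset({Condition.ATTRIBUTION, Condition.SHARE_ALIKE})


def is_open(conditions):
    """Binary test: open only if every imposed condition is permitted.

    There is no 'degree' here; a single extra clause such as NC moves a
    license from the open side of the partition to the not-open side.
    """
    return set(conditions) <= PERMITTED


def permits_user(conditions, commercial_user):
    """NC shuts out a whole class of users; attribution and SA shut out no one."""
    return not (commercial_user and Condition.NON_COMMERCIAL in conditions)


# Hypothetical licenses, described only by the conditions they impose.
EXAMPLES = {
    "public-domain dedication": set(),
    "attribution only": {Condition.ATTRIBUTION},
    "attribution + share-alike": {Condition.ATTRIBUTION, Condition.SHARE_ALIKE},
    "attribution + non-commercial": {Condition.ATTRIBUTION, Condition.NON_COMMERCIAL},
    "attribution + no-derivatives": {Condition.ATTRIBUTION, Condition.NO_DERIVATIVES},
}

if __name__ == "__main__":
    for name, conds in EXAMPLES.items():
        label = "open" if is_open(conds) else "not open"
        print(f"{name}: {label}")
```

Note that `is_open` returns a yes/no answer rather than a score, mirroring the argument that licenses fall into two classes rather than along a continuous spectrum of openness.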
But You Can’t Trademark Openness …
it’s annoying that someone claims to be releasing data openly, but it turns out to be NC and no-compete and a bunch of other stuff. It would be nice to say to them - “you can’t claim to be open because you don’t meet this definition”. But unfortunately it would probably be difficult to get the trademark on the word “open”
It’s quite right that you can’t trademark openness — and no-one should want to! However, we can make an effort as a community to have a clear shared meaning for “open” in relation to data and content along the lines of http://opendefinition.org/ — just as the open source definition has done for code. By insisting on this meaning we are doing something valuable: creating a standard and maintaining interoperability.
Emergency Budget, Deficit and Cuts: Visualized
June 22nd, 2010
Today in the UK the Conservative/Liberal Democrat coalition presented its Emergency Budget.
Collaborating with David McCandless, Where Does My Money Go? have created a simple visualization to help you understand and contextualise the budget, and answer some basic questions such as: How much impact will the emergency budget have on the £156bn budget deficit? And what will those mind-boggling billion pound amounts actually mean?
Embed
Want to use this graphic on your own site or in the news? We’re happy for you to do so as long as you explicitly credit us and link back to this URL. Here’s an HTML code snippet to do this:
Want a higher-res version, e.g. for print? You can get it here: http://static.wheredoesmymoneygo.org/i/deficit_budget_print.pdf
Credits
A Where Does My Money Go? visualization by David McCandless / InformationIsBeautiful, research by Lisa Evans and Tim Hubbard using information from the Institute for Fiscal Studies and HM Treasury.
UK Government commits to open up new spending data!
June 2nd, 2010
It’s exciting times right now for people in the UK interested in how public funds are being used. The new government has proposed to publish unprecedented amounts of spending data in unprecedented detail. In the new Coalition Programme for Government (PDF), the PM has committed to the following, which is very similar to the Conservative pre-election promises but with more detail and — crucially — a schedule!
Central government spending transparency
- Historic COINS spending data to be published online in June 2010.
- All new central government ICT (information and communication technologies) contracts to be published online from July 2010.
- All new central government tender documents for contracts over £10,000 to be published on a single website from September 2010, with this information to be made available to the public free of charge.
- New items of central government spending over £25,000 to be published online from November 2010.
- All new central government contracts to be published in full from January 2011.
- Full information on all DFID international development projects over £500 to be published online from January 2011, including financial information and project documentation.
Local government spending transparency
- New items of local government spending over £500 to be published on a council-by-council basis from January 2011.
- New local government contracts and tender documents for expenditure over £500 to be published in full from January 2011.
Other key government datasets
- Crime data to be published at a level that allows the public to see what is happening on their streets from January 2011.
- Names, grades, job titles and annual pay rates for most Senior Civil Servants with salaries above £150,000 to be published in June 2010.
- Names, grades, job titles and annual pay rates for most Senior Civil Servants and NDPB officials with salaries higher than the lowest permissible in Pay Band 1 of the Senior Civil Service pay scale to be published from September 2010.
- Organograms for central government departments and agencies that include all staff positions to be published in a common format from October 2010.
This is all great news for the Open Knowledge Foundation’s Where Does My Money Go? project. In particular, we have been researching the COINS database as a rich source of data to visualise. It is also worth noting that the current standard for reporting central government spending (PDF) covers items above £20m in any year, by region, so the £25,000 threshold looks like a big improvement. Hopefully this spending will also be broken down by region.
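A quick check of how much lower the proposed threshold is, as a small Python sketch. The two thresholds come from the paragraph above; the £1m contract is a hypothetical amount chosen only for illustration, and "reported" here simply assumes that an item is published when it exceeds the threshold.

```python
# Reporting thresholds mentioned above, in pounds.
CURRENT_THRESHOLD = 20_000_000   # items above £20m in any year
PROPOSED_THRESHOLD = 25_000      # new central government items over £25,000


def threshold_ratio(old, new):
    """How many times lower the new reporting threshold is than the old one."""
    return old / new


def is_reported(amount, threshold):
    """Assume an item is published only if it exceeds the threshold."""
    return amount > threshold


if __name__ == "__main__":
    print(threshold_ratio(CURRENT_THRESHOLD, PROPOSED_THRESHOLD))  # 800.0
    contract = 1_000_000  # hypothetical £1m item
    print(is_reported(contract, CURRENT_THRESHOLD))   # False
    print(is_reported(contract, PROPOSED_THRESHOLD))  # True
```

In other words, the new threshold is 800 times lower, so an item such as a £1m contract, invisible under the £20m rule, would be published.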
With the UK election over, reductions in public spending are currently at the top of the agenda. Whichever way you cut it, taxpayers and public service users look set to face big changes. The ‘Where Does My Money Go?’ dashboard - a free, interactive online tool from the Open Knowledge Foundation - will help to make sense of the £6 billion of spending cuts to be announced on Monday.
The project allows the public to explore data on UK public spending over the past 6 years, in an intuitive way using maps, timelines and graphs. The latest release includes:
- A new mini-app called ‘Where are the cuts?‘ which will capture and visualise spending cuts as they happen.
- A new dashboard for visualising and exploring spending by region, type or over time - breaking down the jargon to make it easier to understand official spending categories.
- A new Where Does My Money Go? data store. This houses all the cleaned-up, nicely formatted data, sourced from many different government departments, and makes it available both via the web and via an API, enabling others to reuse, investigate and re-present the data.

In addition to new information about the spending cuts, the Where Does My Money Go? project plans to represent detailed information from the COINS database, the ‘holy grail’ of spending data, which George Osborne committed to publishing shortly after this election.
Dr Rufus Pollock, Economist from the University of Cambridge and Director of the Open Knowledge Foundation, comments:
It is crucial that the public are able to understand how they will be affected by the cuts to be announced on Monday - which depends on having a ‘bigger picture’ of where spending currently goes. We will be working hard to show the implications of spending cuts as they are announced and to track speculation about where cuts will be made in the future. Our project aims to close the loop between public information on spending and the public.

