Open Data Commons - Attribution License released
June 25th, 2010
Open Data Commons has released a new Open Data Commons attribution license (ODC-By). Jordan Hatcher, Chair of the Open Data Commons Advisory Council, writes:
Thanks to everyone for their feedback on the licenses and their help with the project. We can now announce a new license to the Open Data Commons family, the ODC Attribution License (ODC-BY) license. This is a database specific license requiring attribution for databases. This makes ODC-BY similar to the Creative Commons Attribution license, but is built specifically for databases. As a legal tool that only requires attribution, it complies with the Open Knowledge Definition, the Open Knowledge Foundation’s standard around defining the rights behind what something means to be “open”.
ODC-BY homepage at:
http://www.opendatacommons.org/licenses/by/
Plain language summary of the ODC-BY is up at:
http://www.opendatacommons.org/licenses/by/summary/
Final license text at:
http://www.opendatacommons.org/licenses/by/1.0/
For those preferring plain text:
http://www.opendatacommons.org/wp-content/uploads/2010/01/odc_by_1.0_public_text.txt
Thanks for everyone’s help, particularly Rufus and the ODC advisory board.
Comments on the Panton Principles and Data Licensing
March 25th, 2010
These comments were originally written a few weeks ago as part of an interesting thread on John Dupuis’ blog post about the Panton Principles.
What’s “Open” and Why Do the Panton Principles Recommend PD-only
The Open Knowledge Foundation’s general position is one of supporting open data where “open” data includes data made available under licenses with attribution and share-alike clauses, though non-commercial restrictions are definitely not permitted (see http://www.opendefinition.org/ for precise details). The reason for excluding non-commercial is simple: share-alike is compatible with a commons open to everyone but non-commercial is not.
Panton Principles 1-3 are, in essence, saying make data “open” in the sense of http://www.opendefinition.org/. Principle 4 goes beyond this to specifically recommend public-domain only for data related to published science, especially where the work is publicly funded.
The rationale for this “stronger” position, at least for me, was that a) science has existing (very) strong norms for attribution (and, to a lesser extent, share-alike) b) science has strong up-front funding support from society which reduces some of the risks that share-alike addresses.
That said, I should emphasize that, in my view at least, the key feature is that the data be made open — public domain dedication/licensing is “strongly recommended” but if you end up with an attribution or even share-alike type license that is still far, far better than not making the data available at all, or licensing it under non-commercial or other conditions.
Attribution Stacking and Copyleft (Share-Alike)
I remain completely unconvinced by the attribution stacking argument against attribution requirements in licenses, and I find its logic in the area of science and the PP rather incoherent: we expect attribution to happen even with PD since it’s part of the community norms in science. As such attribution stacking happens with or without a license — unless attribution actually won’t be happening which is a serious issue ….
I’m also unclear why copyleft does not work for DBs. Using a CC share-alike license for DBs isn’t a good idea, but there are other licenses such as the Open Database License (ODbL).
For more detail see earlier posts such as: http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/ and http://blog.okfn.org/2009/02/09/comments-on-the-science-commons-protocol-for-implementing-open-access-data/
Contract and the ODbL
Cameron Neylon in his comments wrote:
GPL/CC licences do not work for data across jurisdictions. They rely on copyright. Data in most places cannot be copyrighted. Where it can is inconsistent. Whatever else you do don’t use copyright licences on data because they will scare off the good guys and the bad guys will simply ignore them because they are un-enforceable. You can in principle use contract law to create similar restrictions (and the ODbL does this) but you need to ask yourself whether you want to bring contract law into this space. The consequences might not be what you want.
I think this is a bit of a misconception on several levels. In particular, the contract point about the ODbL is, in my view, very minor and is turning into a bit of FUD so I should correct it.
In my view, the main “enforcement” mechanism of the SA conditions in the ODbL remains existing IP rights, whether copyright or sui-generis DB rights. Even in the US, where copyright in data(bases) is “weak”, some copyright likely exists in most situations (though of course not in phone directories!). I’d also point out that CC licenses also operate as contracts, at least in common-law jurisdictions such as the US and the UK, so it’s not as if the ODbL is being particularly unusual (though the ODbL is more explicit about this than CC licenses …).
The main reason you don’t want to use the GPL or a CC share-alike license is that they a) don’t deal with all relevant rights and b) are not designed for data(bases), so they don’t deal with all the issues “nicely” (just as CC licenses were created for content despite the existence of “open” licenses for code, because of the need for customization to the content situation). For more on this see the relevant section of the Open Data Commons FAQ.
Sharealike and Commercial Use
Lastly, I think it important to emphasize that I don’t see Share-Alike as non-commercial or anti-commercial. In the free/open-source software world there is lots of commercial activity around codebases that are GPL’d.
Of course, it definitely makes it harder for some commercial users to use the information if they want to use it proprietarily or directly combine it with proprietary data, and it can also cause problems when intermixing with other sets of data with openness restrictions (such as those caused by privacy restrictions). However, at the same time, I would point out that it can also encourage commercial use, since commercial participants know their contributions won’t be “free-ridden” upon.
A free software model for open knowledge
March 17th, 2010
Notes describing the talk on the work of the Open Knowledge Foundation given last week at Jornadas SIG Libre.

I was happily surprised to be asked to give this open knowledge talk at an open source software conference. But it makes sense - the free software movement has created the conditions in which an open data movement is possible. There is lots to learn from open source process, in both a technical and organisational sense.
In English we have one word, “free”, where Spanish, like most languages, has two, gratis and libre, signifying separately “free of cost” and “freedom to”. The Open Source Initiative coined Open Source as a branding or marketing exercise to avoid the primary meaning “free of cost”. So whenever I say “open” I want you to hear the word “libre”. [Later I was told that libre can have at least 15 different meanings.]
The best way to talk about the work of the Open Knowledge Foundation is to look at its projects, which form an open knowledge stack similar to the OSGeo software stack.
Open Definition
The Open Knowledge Definition is based on the OSI Open Source Software Definition (which OSGeo uses as a reference for acceptable software licenses). No restrictions on field of endeavour - non-commercial-use licenses are not open as in the OKD. An open data license will pass the cake test.
Open Data Commons
Open Data Commons is run by Jordan Hatcher, who started work on the Open Database License with support from Talis, followed later by extensive negotiation with the OpenStreetMap community. ODbL is a ShareAlike license for data that obviates the problems of inapplicability of copyright to facts, and of greediness of the ShareAlike clause when it comes to use of maps in PDFs, etc.
PDDL is a license that implements the Science Commons protocol for open access data, explicitly placing it in the public domain.
The Panton Principles are four precepts for publishers of scientific research data who wish that data to be freely reusable. Being openly able to inspect, critique and re-analyse data is critical to the effectiveness of scientific research.
Open Data Grid
The Open Data Grid is a project in early incubation, based on the Tahoe distributed filesystem. It’s in need of development effort on Tahoe to really get going. The aim is to provide secure storage for open datasets around the edges of infrastructure that people are already running.

People are handwaving about the Cloud, but storage and backup are not problems that it is really meant to solve. People make different claims about the Cloud - cheaper, greener, more efficient, more flexible. Can we get these things in other ways?
There is a saying, “never underestimate the bandwidth of a truck full of DAT tapes”
Comprehensive Knowledge Archive Network (CKAN)
CKAN is inspired by free software package repositories: Perl’s CPAN, R’s CRAN, Python’s PyPI. It provides a wiki-like interface to create minimal metadata for packages, with a versioned domain model and an HTTP API.
CKAN supports groups, which can curate a package namespace - e.g. climate data - and assess priorities for turning into fully installable packages.
CKAN’s open source code is being used in the data package catalogue for the data.gov.uk project, part of the Making Public Data Public effort in the UK.
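To make the idea of wiki-like, versioned package metadata curated by groups a little more concrete, here is a minimal sketch in Python. It is an illustration only and is not CKAN's actual domain model or API: the class names (`Package`, `Revision`, `Group`), the metadata keys and the example package are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One snapshot of a package's metadata (hypothetical structure)."""
    number: int
    metadata: dict


@dataclass
class Package:
    """A catalogue entry whose metadata history is kept, wiki-style."""
    name: str
    revisions: list = field(default_factory=list)

    def edit(self, **changes):
        # Copy the latest metadata (or start empty), apply the changes,
        # and store the result as a new revision. Earlier revisions are
        # left untouched so the history can still be inspected.
        current = dict(self.revisions[-1].metadata) if self.revisions else {}
        current.update(changes)
        revision = Revision(number=len(self.revisions) + 1, metadata=current)
        self.revisions.append(revision)
        return revision

    @property
    def latest(self):
        return self.revisions[-1].metadata if self.revisions else {}


@dataclass
class Group:
    """A curated set of package names, e.g. a climate data group."""
    name: str
    packages: set = field(default_factory=set)

    def add(self, package):
        self.packages.add(package.name)


# Hypothetical example data, not a real CKAN package.
pkg = Package("example-temperature-series")
pkg.edit(title="Example temperature series", license="odc-pddl")
pkg.edit(url="http://example.org/data.csv")

climate = Group("climate-data")
climate.add(pkg)
```

Each call to `edit` adds a revision rather than overwriting the record, which is the property that makes a wiki-like interface safe to open up to many editors.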
datapkg
The Debian of Data - datapkg takes Debian’s apt tool as inspiration for fully automatable install of data packages, with dependencies between them. This is currently in usable alpha stage with a Python implementation.
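As a rough sketch of what an apt-style install with dependencies involves, the snippet below works out an install order in which every data package comes after the packages it depends on. This is not datapkg's own code or interface: the `INDEX` contents, the package names and the `install_order` function are hypothetical, and it assumes Python 3.9 or later for the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical index: package name -> names of packages it depends on.
INDEX = {
    "uk-spending-by-region": {"uk-regions", "uk-spending"},
    "uk-spending": {"currency-codes"},
    "uk-regions": set(),
    "currency-codes": set(),
}


def install_order(target, index):
    """Return the target and its transitive dependencies, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in index:
            raise KeyError(f"unknown package: {name}")
        needed[name] = index[name]
        stack.extend(index[name])
    # static_order() yields each package only after all of its dependencies;
    # it raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(needed).static_order())


order = install_order("uk-spending", INDEX)
```

Only the packages reachable from the requested target are collected, so asking for `uk-spending` pulls in `currency-codes` but leaves unrelated packages alone.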
Where Does My Money Go?
The next challenge really is to bring the concerns and the solutions to a mainstream public. Agustín Lobo spoke of “a personal consciousness but not an institutional consciousness” when it comes to open source and open data. Media coverage, exemplary government implementations, help to create this kind of consciousness.
Pressure for increased open access is coming from academia - for the research data underlying papers, for the right to data mine and correlate different sources, for library data open for re-use. Pressure is also coming from within museums, libraries and archives - memory institutions who want to increase exposure to their collections with new technology, and recognise that open data, linked to a network of resources, will work for sustainability and not against it.
The next generation of researchers, who are kids in school now, will grow up with an expectation that code and data are naturally open. It will be interesting to see what they make!
Meanwhile OpenStreetMap is feeding several startups, and more commercial presence in the open data space will be of benefit. It is illustrative that one does not have to be proprietary to be commercial.
Now higher-profile government projects opening data are helping to mainstream. To what extent is open a fashionable position, to what extent is open reflected throughout the way of working?
Open process; early release, public sharing of bugs, public discussion of plans - everything in Nat Torkington’s post on Truly Open Data. The opportunity to fail in public, to learn from others’ problems, and self-interestedly collaborate.
I had a great time at SIG Libre 10. Oscar Fonts’ talk on OpenSearch Geospatial interfaces to popular services has me itching to add an OpenSearch +Geo interface to CKAN, as well as to work on getting the apparent version skew in the Geo extensions resolved amicably.
Genís Roca spoke thought-provokingly on Retorno y rentabilidad (there isn’t really an equivalent English word - “rentability” - less exploitative or focused than profitability). Rentability, especially for online services, can come in ways that sustain an organisation predictably, and don’t involve fishing in the pockets of ultimate end-users.
Ivan Sanchez showed areas of OpenStreetMap Spain with a stunning level of detail: trees and fences, MasterMap-quality coverage. I’m inspired to pick up JOSM and Merkaartor to add building-level detail from out-of-copyright 1:500 Edinburgh town plans at the National Library of Scotland’s map services.
Agustin Lobo talked about the distributed work and cross-institutional support and benefit of the R project, and the impact of open source on open access to data in science. He mentioned a Nature open peer review experiment that was discarded - am thinking it wasn’t curated enough. The talk helped me to connect the OKF’s work to the rest of the Jornadas.
The shiny slides, which many people asked for details of, are on prezi.com - they should show embedded in the page, I hope. I stupidly forgot to put URLs on the slides, which is partly why I have written this blog post.
Draft of an Open Data Commons Attribution License
January 12th, 2010
Yesterday Open Data Commons released a draft of a new attribution license specifically aimed at data and databases. We would warmly welcome feedback on the new draft, and help circulating it to relevant parties (including legal experts, prospective users and so on)!
From the announcement:
Open Data Commons are happy to announce the first draft of an attribution license for data/databases:
A commentable version of the text is available here:
Feedback is actively sought and we would be grateful for any assistance in circulating this announcement to relevant communities and networks.
The license is heavily based on the Open Database License (ODbL), though obviously without the share-alike provisions! With its simpler nature and its solid base from the ODbL, we don’t anticipate as much work as with the ODbL to get this to a 1.0.
The present plan is to start out with this first comments round, ending around the start of February. Based on the feedback received we will then assess how many further rounds of revision and consultation will be needed.
Some particular questions that it would be good to have feedback on:
- Is there any irrelevant matter that can be cut from the license (shorter is better!)
- Is attribution wanted for produced works (at the moment it is)
- What flexibility in attribution format/requirements should be supported
Background
The drafting of this license has been prompted by a clear need in several communities for an open license for data/databases that provides for attribution but does not impose share-alike requirements. Following discussion last Autumn on the public discuss list work was started on this draft attribution license.
New open data from London Datastore
January 11th, 2010
As you may well have seen, last Thursday the Greater London Authority announced the new London Datastore:
From the press release:
The Mayor of London will unveil plans for the capital’s first open data project which will see large amounts of previously unavailable information from City Hall released online.
Similar to the hugely successful ‘Apps for the Democracy’ project in the United States the Mayor will be joined by President Barack Obama’s Chief Technology Officer Aneesh Chopra and Linda Cureton, Chief Information Officer, NASA during a rare live web link up with the world’s largest electronics show, CES, in Las Vegas.
The Mayor of London, Boris Johnson, also announced that there will be £200k from Channel Four’s 4IP to encourage people to make new useful services based on the data - which is excellent news!
Picking up from our international round up of open data on cities from last autumn, we’ve updated the package page on CKAN:
Is the data open? Though they don’t use a license or legal tool to make the data open, their Terms and Conditions appear to make the data open as in the Open Knowledge Definition. Nevertheless it would be good if they made this more explicit by using a legal tool such as the PDDL, ODbL, or CC0!
What data have they released? Speaking with Chris Taggart of Openly Local last week, we expected a fair few datasets to be sliced from existing sources such as the Office for National Statistics. But as Chris notes on his blog, it looks like there are plans to open up a lot more data, including new data from Transport for London!
For more see:
- London unveils digital datastore, BBC News
- Boris Johnson to launch London ‘Datastore’ with hundreds of sets of data, Guardian
- London opens up data with online scheme, Financial Times
- Las Vegas industry event – or London data store launch?, Brian Hoadley
- Tinkering With Timetric – London Datastore Borough Population Data from Tony Hirst, Open University
- The GLA and open data: did he really say that?, Chris Taggart, Openly Local
Interview with Jordan Hatcher on legal tools for open data
December 15th, 2009
The Open Knowledge Foundation’s Jordan Hatcher was recently interviewed by the Semantic Web Company about Why we can’t use the same open licensing approach for databases as we do for content and software:
Legal certainty is crucial when it comes to build business around new technologies. The Open Knowledge Foundation has started to tackle this problem with respect to Linked Data. Tassilo Pellegrini spoke to the Open Content Lawyer Jordan S. Hatcher about licensing issues in Open Data and got some practical advice to get started on a complex but crucial topic.
After the Open Data and Semantic Web Workshop
November 20th, 2009

Last week we had a workshop on Open Data and the Semantic Web in London. There were some excellent talks, demos and discussions - and documentation is now online!
As a result of discussions we had at the workshop, we now have two new volunteer positions at the Open Knowledge Foundation. If you’re interested in either of these positions, please get in touch.
- Editor for Linking Open Data Group on CKAN. As we announced a few weeks ago, we now have a Linking Open Data Group on CKAN, our open source registry of open data. We are looking for someone to help keep the collection of datasets up to date with the latest offerings from the LOD/semantic web community!
- LOD/ODC Community Liaison. Open Data Commons are looking for a member of the LOD/Semantic Web community to join their Open Data Commons Advisory Council, with their role being to exchange information between the two communities (e.g., explaining open data licensing to the LOD community and taking licensing questions from the community back to Open Data Commons).
A big thank you to Talis for sponsoring the event, to the London Knowledge Lab for donating the venue, and, of course to everyone who came and participated!
For photos, video and slides you can see:
- Agenda page on OKF wiki - including links to slides and videos
- Presentations and photos on the Internet Archive
- Photos from Paul Downey
Slides from Open Data Session at ISWC 2009
November 5th, 2009
The Open Knowledge Foundation’s Jordan Hatcher recently co-led a workshop on Legal and Social Frameworks for Sharing Data on the Web at the 8th International Semantic Web Conference. He was joined by Leigh Dodds and Tom Heath of Talis, and Kaitlin Thaney of Science Commons.
You can now see:
- Jordan’s slides - Open Data and the Law (PDF)
- Leigh’s slides - Rights Statements on the Web of Data (PDF)
- Tom’s slides - Licenses and Waivers in Practice (PDF)
- Kaitlin’s slides - Data Sharing: Social and Normative (Slideshare)
- The wiki page for the tutorial
If you’re in the UK and you’d like to catch up with Jordan, Leigh and Tom about open data and the semantic web, you can also come to the Open Data and Semantic Web Workshop in London on the 13th November 2009!

Jordan Hatcher at ISWC 2009 by Tom Heath
OpenFlights data released under Open Database License (ODbL)
October 14th, 2009

OpenFlights is a site for “flight logging, mapping, stats and sharing”.
We’re very pleased to hear they’ve just released their data under the Open Database License (ODbL):
One of OpenFlights’ most popular features is our dynamic airport and airline route mapping, and today, we’re proud to release the underlying data in an easy-to-use form, up to date for October 2009. Behold 56749 routes between 3310 airports on 669 airlines spanning the globe.
The data can be downloaded from our Data page and is free to use under the Open Database License.
See also the OpenFlights package on CKAN:
ODC Open Database License (ODbL) Release Candidate 2 is Out
June 15th, 2009
Open Data Commons, a project we help host and run, has put out its second and final “Release Candidate” of the Open Database License (ODbL).
As it states in the announcement:
The Open Database License (ODbL) v1.0 “Release Candidate 2” is now available at:
http://www.opendatacommons.org/licenses/odbl/
As expected there haven’t been many changes from the first Release Candidate. The two main alterations are:
- Removal of section 4.7 related to reverse engineering. This may be reintroduced in later versions but has been left out here in order to remove any possible concerns about license compatibility on Produced Works.
- Explicit statement that derivative databases used in the creation of Publicly Available Produced Works are also subject to share-alike.
With the completion of this second round of comments we believe this text is now in final “1.0” form. In order to allow interested individuals and communities time to review the latest set of changes, as well as to provide an opportunity to catch any last minute “bugs”, we are going to provide one final, brief comment period closing on Friday 19th of June at 1200GMT. Full details on how to comment can be found on the ODbL home page.
In preparation for the 1.0 release we have also continued to improve the FAQs as well as providing a new open data guide. Any feedback on these is also very welcome.
