
You are browsing the archive for Open Data Commons.

Open Street Map has officially switched to ODbL – and celebrates with a picnic

Jonathan Gray - September 12, 2012 in Exemplars, External, Featured, Open Data, Open Data Commons, WG Open Licensing

Open Street Map is probably the best example of a successful, community-driven open data project.

The project was started by Steve Coast in 2004 in response to his frustration with the Ordnance Survey’s restrictive licensing conditions.

Steve presented on some of his early ‘mapping parties’ – where a small handful of friends would walk or cycle around with GPS devices and then rendezvous in the pub for a drink – at some of the Open Knowledge Foundation’s first events in London.

In the past 8 years it has grown from a project run by a handful of hobbyists on a shoestring to one of the world’s biggest open data projects, with hundreds of thousands of registered users and increasingly comprehensive coverage all over the world.

In short, Open Street Map is the Wikipedia of the open data world – and countless projects strive to replicate its success.

Hence we are delighted that – after a lengthy consultation process – today Open Street Map has officially switched to using the OKFN’s Open Data Commons ODbL license.

Michael Collinson, who is on the License Working Group at the OpenStreetMap Foundation, reports:

It is my great pleasure to pass on to you that as of 07:00 UTC this morning, 12th September 2012, OpenStreetMap began publishing its geodata under Open Data Commons’ ODbL 1.0. That is several terabytes of data created by a contributor community of over three-quarters of a million, and growing every day.

The Open Street Map blog reports that OSM community members will be celebrating with a picnic:

At long last we are at the end of the license change process. After four years of consultation, debate, revision, improvement, revision, debate, improvement, implementation, coding and mapping, mapping, mapping, it comes down to this final step. And this final step is an easy one, because we have all pitched in to do the hard work in advance. The last step is so easy, it will be a picnic.

If you use data from Open Street Map, you can read about how the switch will affect you here.

A big well done to all involved for coming to the end of such a lengthy process – and we hope you enjoyed the sandwiches!

CC license version 4.0: Helping meet the needs of open data publishers and users

Timothy Vollmer - August 15, 2012 in External, Legal, Open Data Commons, Open Standards, Open/Closed, WG Open Licensing

Over the last few months, Creative Commons has been working on the next version of its license suite, version 4.0. The goals of version 4.0 are wide-ranging, but the overall objective is clear: update the licenses so they are considerably more robust, yet easy to understand and use, for both existing communities and new types of users.

A key community that version 4.0 aims to serve better is public sector agencies releasing data. Public sector information can be of great value, but the public needs to know what they can do with it. At the same time, public sector agencies need to be reassured that they can offer data in a way that gives them credit, maintains their reputation, and ensures some level of data integrity. Version 4.0 offers several updates in support of both open data publishers and users. A few of these are discussed below.

Sui generis database rights

One area of particular interest to European data publishers and users will be the shift in how CC licenses handle sui generis database rights. These rights are similar to copyright, but instead of granting particular exclusive rights to authors for creating an original work, database rights reward the author for the “sweat of the brow” in compiling a database. In 3.0, CC licenses do not require compliance with the license conditions where the use of a CC licensed database triggers only sui generis database rights but not copyright. At the same time, CC 3.0 does not grant permission to engage in activities protected by the database right. In 4.0, we propose to license sui generis database rights on par with copyright. Since sui generis database rights are similar to copyright (in 4.0 draft 2 it is called a “copyright-like” right), this will align with expectations of users.

Here’s an example. Let’s use as a baseline a CC BY licensed database of public transport data published by the city of Berlin.

In 3.0 (International), a user extracts some public transport data from the database in a way that doesn’t implicate copyright. For example, they might extract the names of underground stations and train times and plot them on a map. They don’t have to attribute the creator of the database as required by the CC license, because such an extraction of factual data would not implicate copyright. However, the user might still be liable for infringing the sui generis database rights under German law (enacted in line with the EU Database Directive). And CC 3.0 doesn’t license those rights. The user has to figure it out for herself.

In 4.0, the goal is to make it so that even if the user extracts data from the CC BY licensed public transport database in a way that doesn’t implicate copyright (but does implicate the sui generis database right), the CC license grants those permissions (and imposes restrictions) in the same way as would be required under normal CC licensing circumstances. So, for example, the user extracts the names of underground stations and train times to plot them on a map. Even though this action still doesn’t implicate copyright, it does trigger sui generis database rights. Under CC BY 4.0, the database rights are granted, and the user must provide attribution to the creator of the database. Of course, if this change is adopted in 4.0, the licensing of sui generis database rights will only be in effect in jurisdictions that recognize these rights. So, for those jurisdictions where sui generis database rights do not exist, nothing would change.
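To make the mechanics of this example concrete, here is a minimal Python sketch of the reuser’s side. The file name, column names and attribution wording are hypothetical assumptions, not part of the CC drafts; the point is simply that the extraction step, which pulls facts rather than creative expression, is where the proposed 4.0 attribution condition would attach.

```python
import csv

# Hypothetical attribution notice; the publisher and wording are assumptions.
ATTRIBUTION = ("Contains data from the Berlin public transport database, "
               "made available under CC BY 4.0.")

def load_stations(path):
    """Extract station names and departure times: factual data that may not
    implicate copyright but can trigger sui generis database rights in the EU."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["station"], row["departure"]) for row in csv.DictReader(f)]

def map_caption(stations):
    """Whatever the reuser publishes (a map, a table, an app), under the
    proposed 4.0 terms the output should carry the attribution notice."""
    return f"{len(stations)} stations plotted. {ATTRIBUTION}"

if __name__ == "__main__":
    stations = load_stations("berlin_transport_extract.csv")  # hypothetical file
    print(map_caption(stations))
```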

Strengthen reputation and integrity

Another change queued up for 4.0 is the strengthening of particular provisions so that the CC licenses can be more easily used by institutions such as public sector bodies releasing open data. For example, 4.0 communicates more prominently that licensees may not imply or assert that their use of the licensed work is connected to or endorsed by the licensor. In addition to this “no endorsement” clause, 4.0 makes it possible for public sector bodies to add additional notices, warranties, or disclaimers of liability. The 4.0 draft also makes it clear – without making it a specific condition of the license itself – that users of licensed works are responsible for complying with laws outside of copyright that may apply to the use of the work, for instance data protection laws and laws guarding against fraud or misrepresentation. These mechanisms are important for official government bodies and data publishers: such institutions are sometimes apprehensive about releasing data sets if they think that downstream users will remix the data in ways that appear to show that the institution has sponsored or endorsed the use.

Updated attribution

CC 4.0 also attempts to clarify and simplify the attribution requirements. Licensees must still identify the author, provide the URL where the work can be accessed and the URL to the CC license, and retain any disclaimer notices. Draft 4.0 streamlines the attribution process in a few ways — for example, it removes the requirement to include the title of the work. However, in version 4.0 licensees can satisfy these requirements in any reasonable manner based on the medium, means, and context in which the work is used. Flexibility is important considering the wide range of potential uses for CC licensed content, especially data. One way this might play out is for a licensee to provide, alongside the work, a simple URL to a web page that contains the information required to meet the attribution terms. You can imagine how that would be useful in addressing the problem of attribution stacking — users of databases would not have to list every single contributor alongside their adaptation. Instead, they could point to a separate web page listing the contributors, which makes more sense in certain applications. These updated attribution methods help licensees give credit to authors in the manner the authors wish to be attributed.
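As a rough illustration of that approach (all names, URLs and file names below are hypothetical), a reuser could publish one credits page and ship a short notice pointing to it, rather than listing every contributor alongside the adaptation:

```python
# Hypothetical contributor list and credits URL for an adapted database.
contributors = ["A. Mapper", "B. Surveyor", "C. Cartographer"]  # ...thousands more
CREDITS_URL = "https://example.org/my-map/credits"

def write_credits_page(path="credits.html"):
    """Publish the full contributor list once, at a stable URL."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<h1>Contributors</h1>\n<ul>\n")
        f.writelines(f"  <li>{name}</li>\n" for name in contributors)
        f.write("</ul>\n")

def short_notice():
    """The notice shipped alongside the adaptation itself stays short."""
    return f"Map data (c) contributors, CC BY 4.0. Full credits: {CREDITS_URL}"

if __name__ == "__main__":
    write_credits_page()
    print(short_notice())
```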

All these issues (and more) continue to be discussed in consultation with the Creative Commons community. If all goes well, CC 4.0 will be published before the new year. We welcome feedback on the license-discuss email list.

Image: Construction Cranes by Evo, CC-BY 2.0

What do you think about Norway’s new open data license?

Guest - April 12, 2011 in Legal, OKF Projects, Open Data, Open Data Commons, Open Definition, Open Government Data, WG EU Open Data, WG Open Government Data, Working Groups

The following guest post is from Sverre Andreas Lunde-Danbolt, who works for the Department for ICT and Renewal in the Norwegian Ministry of Government Administration, Reform and Church Affairs, and who is a member of the OKF’s Working Group on Open Government Data.

The Norwegian Ministry of Government Administration and Reform has just sent a draft version of a new Norwegian Licence for Open Data (NLOD) out for a formal hearing here in Norway (see the hearing documents (in Norwegian) and a blog post about the licence and the hearing (also in Norwegian)). After the hearing, we intend to recommend that all government agencies in Norway use this licence when they publish data.

Government agencies publishing data are not always very good at specifying the terms under which the information can be reused. In Norway, at least, the introduction of a new sui generis licence for each new data set has become a predictable exercise. This is confusing to the reuser, adds an unnecessary layer of uncertainty and, in some cases, even impedes legitimate reuse.

The Ministry has therefore decided to establish one common licence. This will reduce the number of open data licences in Norway (one licence to rule them all). The licence is a rather straightforward attribution licence under Norwegian law. Its main purpose is to enable reuse in Norway, but to make sure data under NLOD can be combined with other data as well as reused internationally, the licence states clearly that it is compatible with the Open Government Licence (v1.0), the Creative Commons Attribution Licence (generic v1.0, v2.0, v2.5 and unported v3.0), and the Open Data Commons Attribution Licence (v1.0).

The most important details in the licence are the following:

  • Personal data is not covered by the licence. This is the same as in the Open Government Licence.
  • The reuser cannot distort the information or use the information to mislead. The NLOD definition of this seems to be less restrictive than the definition used in the Open Government Licence.
  • NLOD specifies that the licensor can provide more information on the quality or delivery of data, but that this kind of information is outside the scope of the licence. NLOD only covers the rights to use the information.
  • Information licensed under NLOD will also be licensed under future versions of the licence, provided that the licensor has not explicitly licensed the information under v1.0. This gives the Norwegian Government more sway over public sector information, and reduces the chances of data ending up as a kind of orphan work in the future.

What do you think?

Keeping Open Government Data Open?

Jonathan Gray - March 1, 2011 in Ideas and musings, Open Data, Open Data Commons, Open Government Data, Open Standards, Open/Closed, WG EU Open Data, WG Open Government Data, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

An unprecedented amount of freely reusable government information is currently being released by public bodies around the globe. This is being consumed and reused by numerous stakeholders – including civic developers, data literate citizens, data journalists, NGOs, researchers, and companies. There is a tremendous opportunity to create a thriving ecosystem of open data, whereby numerous actors add value to a shared ‘commons’ of open data.

There is also a possibility that (whether through oversight or design) reusers consume government information without sharing back, leading to the creation of new data silos or restrictive licensing models.

For example, a university researcher might republish a dataset under a standard set of terms and conditions which prohibits republication and limits use of the material to personal, research purposes. A company might use its standard licensing agreement, which means that others can’t build on the republished datasets. A developer might put a collection of re-formatted datasets online without an explicit license or legal notice – meaning others won’t know whether they are allowed to reuse them or not.

How can we try to ensure that open government data stays open? Perhaps we can try to promote ‘norms’ for reusers of open government data, to encourage them to contribute to a shared commons that everyone can benefit from. Here are a few suggestions:

  • Open in, open out – If you pull open data into your website or system, then others should be able to pull it out as open data as well.
  • If in doubt, let people know – If you are republishing open data, and you would like to keep it open, make sure to use an appropriate license or legal tool.
  • Keep track of provenance and permissions – If you are mixing datasets from different sources, and want to make sure you don’t accidentally republish proprietary data under an open license, keep track of which bits are open and which bits are not. Rather than going for the lowest common denominator (e.g. ‘because not all of this is open, we had better use a restrictive license’), keep track of where the different datasets come from and let users know which they can and can’t reuse – a minimal sketch of such a manifest follows this list.
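A minimal sketch of such a provenance manifest, with hypothetical dataset names, sources and licenses:

```python
# Hypothetical provenance manifest: record the source and licence of every
# dataset you mix, so you can tell downstream users which parts are open.
DATASETS = {
    "bus-stops": {
        "source": "https://data.example.gov/bus-stops",
        "license": "ODC-BY-1.0",
        "open": True,
    },
    "footfall-survey": {
        "source": "Acme Analytics Ltd (commercial feed)",
        "license": "proprietary",
        "open": False,
    },
}

def open_parts():
    """Datasets that downstream users may reuse as open data."""
    return [name for name, meta in DATASETS.items() if meta["open"]]

print("Open for reuse:", ", ".join(open_parts()))
```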

Can you think of any more? If so let us know in a comment below or on our open-government list.

Richard Poynder interviews Jordan Hatcher

Guest - October 19, 2010 in Interviews, Legal, Open Data, Open Data Commons, Open Definition, Open Government Data, Open Knowledge Definition, Open Knowledge Foundation, Public Domain, WG Open Licensing

Open Access journalist extraordinaire Richard Poynder recently interviewed the Open Knowledge Foundation’s Jordan Hatcher about data licensing, the public domain, and lots more. An excerpt is reproduced below. The full version is available on Richard’s website.

Over the past twenty years or so we have seen a rising tide of alternative copyright licences emerge — for software, music and most types of content. These include the Berkeley Software Distribution (BSD) licence, the General Public Licence (GPL), and the range of licences devised by Creative Commons (CC). More recently a number of open licences and “dedications” have also been developed to help people make data more freely available.

The various new licences have given rise to terms like “copyleft” and “libre” licensing, and to a growing social and political movement whose ultimate end-point remains to be established.

Why have these licences been developed? How do they differ from traditional copyright licences? And can we expect them to help or hinder reform of the traditional copyright system — which many now believe has got out of control? I discussed these and other questions in a recent email interview with Jordan Hatcher.

A UK-based Texas lawyer specialising in IT and intellectual property law, Jordan Hatcher is co-founder of OpenDataCommons.org, a board member of the Open Knowledge Foundation (OKF), and blogs under the name opencontentlawyer.


Jordan Hatcher

Big question

RP: Can you begin by saying something about yourself and your experience in the IP/copyright field?

JH: I’m a Texas lawyer living in the UK and focusing on IP and IT law. I concentrate on practical solutions and legal issues centred on the intersection of law and technology. While I like the entire field of IP, international IP and copyright are my favourite areas.

As to more formal qualifications, I have a BA in Radio/TV/Film, a JD in Law, and an LLM in Innovation, Technology and the Law. I’ve been on the team that helped bring Creative Commons licences to Scotland and have led, or been a team member on, a number of studies looking at open content licences and their use within universities and the cultural heritage sector.

I was formerly a researcher at the University of Edinburgh in IP/IT, and for the past 2.5 years have been providing IP strategy and IP due diligence services with a leading IP strategy consultancy in London.

I’m also the co-founder and principal legal drafter behind Open Data Commons, a project to provide legal tools for open data, and the Chair of the Advisory Council for the Open Definition. I sit on the board for the Open Knowledge Foundation.

More detail than you can ask for is available on my web site here, and on my LinkedIn page here.

RP: It might also help if you reminded us what role copyright is supposed to play in society, how that role has changed over time (assuming that you feel it has) and whether you think it plays the role that society assigned to it successfully today.

JH: Wow that’s a big question and one that has changed quite a bit since the origin of copyright. As with most law, I take a utilitarian / legal realist view that the law is there to encourage a set of behaviours.

Copyright law is often described as being created to encourage more production and dissemination of works, and, like any law, it’s imperfect in its execution.

I think what’s most interesting about copyright history is the technology side (without trying to sound like a technological determinist!). As new and potentially disruptive technologies have come along and changed the balance — from the printing press all the way to digital technology — the way we have reacted has been fairly consistent: some try to hang on to the old model as others eagerly adopt the new model.

For those interested in learning more about copyright’s history, I highly recommend the work of Ronan Deazley, and suggest people look at the first sections in Patry on Copyright. They could also usefully read Patry’s Moral Panics and the Copyright Wars. Additionally, there are many historical materials on copyright available at the homepage for a specific research project on the topic here.

Three tranches

RP: In the past twenty years or so we have seen a number of alternative approaches to licensing content develop — most notably through the General Public Licence and the set of licences developed by the Creative Commons. Why do you think these licences have emerged, and what are the implications of their emergence in your view?

JH: I see free and open licence development as happening within three tranches, all related to a specific area of use.

1. FOSS for software. Alongside the GPL, there have been a number of licences developed since the birth of the movement (and continuing to today), all aimed at software. These licences work best for software and tend to fall over when applied to other areas.

2. Open licences and Public licences for content. These are aimed at content, such as video, images, music, and so on. Creative Commons is certainly the most popular, but definitely not the first. The birth of CC does however represent a watershed moment in thinking about open licensing for content.

I distinguish open licences from public licences here, mostly because Creative Commons is so popular. Open has so many meanings to people (as does “free”) that it is critical to define from a legal perspective what is meant when one says “open”. The Open Knowledge Definition does this, and states that “open” means users have the right to use, reuse, and redistribute the content with very few restrictions — only attribution and share-alike are allowed restrictions, and commercial use must specifically be allowed.

The Open Definition means that only two out of the main six CC licences are open content licences — CC-BY and CC-BY-SA. The other four involve either the No Derivatives (ND) restriction (thus prohibiting reuse) or a Non Commercial (NC) restriction. Those four are what I refer to as “public licences”; in other words, they are licences provided for use by the general public.

Of course CC’s public domain tools, such as CC0, all meet the Open Definition as well because they have no restrictions on use, reuse, and redistribution.

I wrote about this in a bit more detail recently on my blog.

3. Open Data Licences. Databases are different from content and software — they are a little like both in what users want to do with them and how licensors want to protect them, but are different from software and content in both the legal rights that apply and how database creators want to use open data licences.

As a result, there’s a need for specific open data licences, which is why we founded Open Data Commons. Today we have three tools available. It’s a new area of open licensing and we’re all still trying to work out all the questions and implications.

Open data

RP: As you say, data needs to be treated differently from other types of content, and for this reason a number of specific licences have been developed — including the Public Domain Dedication and Licence (PDDL), the Public Domain Dedication and Certification (PDDC) and Creative Commons Zero. Can you explain how these licences approach the issue of licensing data in an open way?

JH: The three you’ve mentioned are all aimed at placing work into the public domain. The public domain has a very specific meaning in a legal context: It means that there are no copyright or other IP rights over the work. This is the most open/free approach as the aim is to eliminate any restrictions from an IP perspective.

There are some rights that can be hard to eliminate, and so of course patents may still be an issue depending on the context (but perhaps that’s a conversation for another time).

In addition to these tools, we’ve created two additional specific tools for openly licensing databases — the ODbL and the ODC-Attribution licences.

RP: Can you say something about these tools, and what they bring to the party?

JH: All three are tools to help increase the public domain and make it better known and more accessible.

There’s some really exciting stuff going on with the public domain right now, including with PD calculators — tools to automatically determine whether a work is in the public domain. The great thing about work in the public domain is that it is completely legally interoperable, as it eliminates copyright restrictions.

See the rest of the interview on Open and Shut

Open Licenses vs Public Licenses

Guest - October 15, 2010 in Legal, OKF Projects, Open Data, Open Data Commons, Open Definition, Open Knowledge Definition, Open Knowledge Foundation, Open Standards, Open/Closed

The following post is from Jordan Hatcher, a Director at the Open Knowledge Foundation and founder of the Open Data Commons project. It was originally posted on his blog.

Let’s face it, we often have a definition problem.

It’s critical to distinguish “open licenses” from “public licenses” when discussing IP licensing, especially online — mostly because Creative Commons is so popular and as a result has muddied the waters a bit.

Open has so many meanings to people (same of course as with “free software” or free cultural works) that it is critical to define from a legal perspective what is meant when one says “open”. The Open Knowledge Definition does this, and states that “open” means users have the right to use, reuse, and redistribute the content with very few restrictions — only attribution and share-alike restrictions are ok, and commercial use must specifically be allowed.

Which CC licenses are Open?

The Open Definition means that only two out of the main six CC licenses are open content licenses — CC-BY and CC-BY-SA. The other four involve one of the two non-open license elements: the No Derivatives (ND) restriction (thus prohibiting reuse) or the Non Commercial (NC) restriction. Those four are “public licenses”; in other words, they are licenses provided for use by the general public.

Of course CC’s public domain tools, such as CC0, all meet the Open Definition as well because they have no restrictions on use, reuse, and redistribution.

The Open Data Commons legal tools, including the PDDL, the ODbL and the ODC Attribution License, all comply with the Open Definition, and so are all open public licenses.

I haven’t done a full survey, but the majority of open licenses (in terms of popularity) probably also fit the definition of public licenses, as open license authors tend to draft licenses for public consumption (and these tend to be the most used ones, naturally). Many open licenses aren’t public licenses though — mainly those drafted for specific use by a specific licensor, such as a government or business. So the UK government’s new Open Government License isn’t a public license, because it’s not meant to be used without alteration by other governments; but, provided it meets the Open Definition, it would be an open license.

A simple Venn diagram would show open licences and public licences as two overlapping sets, with open public licences in the intersection.
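Expressed as a small Python sketch (membership follows the examples given above and is illustrative rather than exhaustive; the OGL is counted as open on the assumption, noted above, that it meets the Open Definition):

```python
# Illustrative only: set membership follows the examples discussed in this post.
open_licenses = {"CC-BY", "CC-BY-SA", "CC0", "PDDL", "ODbL", "ODC-BY", "UK-OGL"}
public_licenses = {"CC-BY", "CC-BY-SA", "CC0", "PDDL", "ODbL", "ODC-BY",
                   "CC-BY-NC", "CC-BY-ND", "CC-BY-NC-SA", "CC-BY-NC-ND"}

print("Open public licences:", sorted(open_licenses & public_licenses))
print("Open but not public: ", sorted(open_licenses - public_licenses))  # bespoke, e.g. the OGL
print("Public but not open: ", sorted(public_licenses - open_licenses))  # NC / ND variants
```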

Open Data Commons – Attribution License released

Rufus Pollock - June 25, 2010 in Legal, Open Data, Open Data Commons

Open Data Commons has released a new attribution license for data, the ODC Attribution License (ODC-BY). Jordan Hatcher, Chair of the Open Data Commons Advisory Council, writes:

Thanks to everyone for their feedback on the licenses and their help with the project. We can now announce a new addition to the Open Data Commons family: the ODC Attribution License (ODC-BY). This is a database-specific license requiring attribution. That makes ODC-BY similar to the Creative Commons Attribution license, but it is built specifically for databases. As a legal tool that only requires attribution, it complies with the Open Knowledge Definition, the Open Knowledge Foundation‘s standard for what it means for something to be “open”.

ODC-BY homepage at:

http://www.opendatacommons.org/licenses/by/

Plain language summary of the ODC-BY is up at:

http://www.opendatacommons.org/licenses/by/summary/

Final license text at:

http://www.opendatacommons.org/licenses/by/1.0/

For those preferring plain text:

http://www.opendatacommons.org/wp-content/uploads/2010/01/odc_by_1.0_public_text.txt

Thanks for everyone’s help, particularly Rufus and the ODC advisory board.

Comments on the Panton Principles and Data Licensing

Rufus Pollock - March 25, 2010 in Legal, Open Data, Open Data Commons

These comments were originally written a few weeks ago as part of an interesting thread on John Dupuis’ blog post about the Panton Principles.

What’s “Open” and Why Do the Panton Principles Recommend PD-only

The Open Knowledge Foundation’s general position is one of supporting open data where “open” data includes data made available under licenses with attribution and share-alike clauses, though non-commercial restrictions are definitely not permitted (see http://www.opendefinition.org/ for precise details). The reason for excluding non-commercial is simple: share-alike is compatible with a commons open to everyone but non-commercial is not.

Panton Principles 1-3 are, in essence, saying make data “open” in the sense of http://www.opendefinition.org/. Principle 4 goes beyond this to specifically recommend public-domain only for data related to published science, especially where the work is publicly funded.

The rationale for this “stronger” position, at least for me, was that a) science has existing (very) strong norms for attribution (and, to a lesser extent, share-alike) b) science has strong up-front funding support from society which reduces some of the risks that share-alike addresses.

That said, I should emphasize that, in my view at least, the key feature is that the data be made open — public domain dedication/licensing is “strongly recommended” but if you end up with an attribution or even share-alike type license that is still far, far better than not making the data available at all, or licensing it under non-commercial or other conditions.

Attribution Stacking and Copyleft (Share-Alike)

I remain completely unconvinced by the attribution stacking argument against attribution requirements in licenses, and I find its logic in the area of science and the Panton Principles rather incoherent: we expect attribution to happen even with PD, since it’s part of the community norms in science. As such, attribution stacking happens with or without a license — unless attribution actually won’t be happening, which is a serious issue in its own right.

I’m also unclear why copyleft does not work for DBs. Using a CC share-alike license for DBs isn’t a good idea, but there are other licenses, such as the Open Database License (ODbL).

For more detail see earlier posts such as http://blog.okfn.org/2009/02/02/open-data-openness-and-licensing/ and http://blog.okfn.org/2009/02/09/comments-on-the-science-commons-protocol-for-implementing-open-access-data/

Contract and the ODbL

Cameron Neylon in his comments wrote:

GPL/CC licences do not work for data across jurisdictions. They rely on copyright. Data in most places cannot be copyrighted. Where it can is inconsistent. Whatever else you do don’t use copyright licences on data because they will scare off the good guys and the bad guys will simply ignore them because they are un-enforceable. You can in principle use contract law to create similar restrictions (and the ODbL does this) but you need to ask yourself whether you want to bring contract law into this space. The consequences might not be what you want.

I think this is a bit of a misconception on several levels. In particular, the contract point about the ODbL is, in my view, very minor and is turning into a bit of FUD so I should correct it.

In my view, the main “enforcement” mechanism of the SA conditions in the ODbL remains existing IP rights, whether copyright or sui generis DB rights. Even in the US, where copyright in data(bases) is “weak”, some copyright likely exists in most situations — though of course not in phone directories! I’d also point out that CC licenses also operate as contracts, at least in common-law jurisdictions such as the US and the UK, so it’s not as if the ODbL is being particularly unusual (though the ODbL is more explicit about this than CC licenses are).

The main reason you don’t want to use the GPL or a CC share-alike license is that they a) don’t deal with all relevant rights and b) are not designed for data(bases), so they don’t deal with all the issues “nicely” (just as CC licenses were created for content, despite the existence of existing “open” licenses for code, because of the need for customization to the content situation). For more on this see the relevant section of the Open Data Commons FAQ.

Sharealike and Commercial Use

Lastly, I think it important to emphasize that I don’t see Share-Alike as non-commercial or anti-commercial. In the free/open-source software world there is lots of commercial activity around codebases that are GPL’d.

Of course, share-alike definitely makes it harder for some commercial users to use the information if they want to use it proprietarily or combine it directly with proprietary data, and it can also cause problems when intermixing with other sets of data that carry restrictions on openness (such as those imposed by privacy requirements). However, at the same time, I would point out that it can also encourage commercial use, since commercial participants know their contributions won’t be “free-ridden” upon.

A free software model for open knowledge

jwalsh - March 17, 2010 in CKAN, datapkg, Events, OKF Projects, Open Data Commons, Open Knowledge Definition, Open Knowledge Foundation, Talks

Notes from the talk on the work of the Open Knowledge Foundation given last week at Jornadas SIG Libre.

OKF activity graph

I was happily surprised to be asked to give this open knowledge talk at an open source software conference. But it makes sense – the free software movement has created the conditions in which an open data movement is possible. There is a lot to learn from open source process, in both a technical and an organisational sense.

In English we have one word, “free”, where Spanish, like most languages, has two, gratis and libre, signifying separately “free of cost” and “freedom to”. The Open Source Initiative coined “Open Source” as a branding or marketing exercise to avoid the primary meaning “free of cost”. So whenever I say “open” I want you to hear the word “libre”. [Later I was told that libre can be understood in at least 15 different ways.]

The best way to talk about the work of the Open Knowledge Foundation is to look at its projects, which form an open knowledge stack similar to the OSGeo software stack.

Open Definition

The Open Knowledge Definition is based on the OSI’s Open Source Definition (which OSGeo uses as a reference for acceptable software licenses). There are no restrictions on field of endeavour – non-commercial-use licenses are not open under the OKD. An open data license will pass the cake test.

Open Data Commons

Open Data Commons is run by Jordan Hatcher, who started work on the Open Database License with support from Talis, and later through extensive negotiation with the OpenStreetMap community. ODbL is a ShareAlike license for data that addresses both the inapplicability of copyright to facts and the greediness of the ShareAlike clause when it comes to the use of maps in PDFs, etc.

PDDL is a license that implements the Science Commons protocol for open access data, explicitly placing the data in the public domain.

The Panton Principles are four precepts for publishers of scientific research data who wish that data to be freely reusable. Being openly able to inspect, critique and re-analyse data is critical to the effectiveness of scientific research.

Open Data Grid

The Open Data Grid is a project in early incubation, based on the Tahoe distributed filesystem. It needs development effort on Tahoe to really get going. The aim is to provide secure storage for open datasets around the edges of infrastructure that people are already running.

People are handwaving about the Cloud, but storage and backup are not problems that it is really meant to solve. People make different claims about the Cloud – cheaper, greener, more efficient, more flexible. Can we get these things in other ways?

There is a saying, “never underestimate the bandwidth of a truck full of DAT tapes”

Comprehensive Knowledge Archive Network (CKAN)

CKAN is inspired by free software package repositories: Perl’s CPAN, R’s CRAN, Python’s PyPI. It provides a wiki-like interface for creating minimal metadata for packages, with a versioned domain model and an HTTP API.
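As an illustrative sketch of reading package metadata over that HTTP API (hedged: the endpoint layout below follows the current CKAN Action API rather than the 2010-era interface described here, and the instance URL and package name are assumptions):

```python
import json
from urllib.request import urlopen

BASE = "https://demo.ckan.org/api/3/action"  # assumed demo instance

def package_show(name):
    """Fetch the metadata record for a single data package."""
    with urlopen(f"{BASE}/package_show?id={name}") as resp:
        return json.load(resp)["result"]

if __name__ == "__main__":
    pkg = package_show("sample-dataset")  # hypothetical package name
    print(pkg["title"], "-", pkg.get("license_id"))
```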

CKAN supports groups, which can curate a package namespace – e.g. climate data – and assess priorities for turning into fully installable packages.

CKAN’s open source code is being used in the data package catalogue for the data.gov.uk project, part of the Making Public Data Public effort in the UK.

datapkg

The Debian of Data – datapkg takes Debian’s apt tool as inspiration for fully automatable installation of data packages, with dependencies between them. It is currently at a usable alpha stage, with a Python implementation. The apt analogy is sketched below.
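The apt analogy in a few lines: given declared dependencies between data packages, compute an install order so that each package’s dependencies are installed first. The package names and dependencies are invented for illustration; this is not datapkg’s actual code.

```python
# Toy dependency resolution for data packages, apt-style.
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical packages: each maps to the set of packages it depends on.
DEPENDENCIES = {
    "uk-spending-analysis": {"uk-spending-raw", "uk-postcode-lookup"},
    "uk-spending-raw": set(),
    "uk-postcode-lookup": set(),
}

# static_order() yields each package only after all of its dependencies.
install_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print("Install order:", " -> ".join(install_order))
```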

Where Does My Money Go?

The next challenge really is to bring the concerns and the solutions to a mainstream public. Agustín Lobo spoke of “a personal consciousness but not an institutional consciousness” when it comes to open source and open data. Media coverage and exemplary government implementations help to create this kind of consciousness.

Pressure for increased open access is coming from academia – for the research data underlying papers, for the right to data mine and correlate different sources, for library data open for re-use. Pressure is also coming from within museums, libraries and archives – memory institutions who want to increase exposure to their collections with new technology, and recognise that open data, linked to a network of resources, will work for sustainability and not against it.

The next generation of researchers, who are kids in school now, will grow up with an expectation that code and data are naturally open. It will be interesting to see what they make!

Meanwhile OpenStreetMap is feeding several startups, and more commercial presence in the open data space will be of benefit. It is illustrative that one does not have to be proprietary to be commercial.

Now higher-profile government projects opening up data are helping to bring open data into the mainstream. To what extent is openness a fashionable position, and to what extent is it reflected throughout the way of working?

Open process: early release, public sharing of bugs, public discussion of plans – everything in Nat Torkington’s post on Truly Open Data. The opportunity to fail in public, to learn from others’ problems, and to collaborate out of self-interest.


I had a great time at SIG Libre 10. Oscar Fonts’ talk on OpenSearch Geospatial interfaces to popular services has me itching to add an OpenSearch +Geo interface to CKAN, as well as to work on getting the apparent version skew in the Geo extensions resolved amicably.

Genís Roca spoke thought-provokingly on Retorno y rentabilidad (“return and profitability” – though there isn’t really an equivalent English word for rentabilidad; “rentability” is less exploitative or profit-focused than profitability). Rentability, especially for online services, can come in ways that sustain an organisation predictably, and don’t involve fishing in the pockets of ultimate end-users.

Ivan Sanchez showed areas of OpenStreetMap Spain with a stunning level of detail: trees and fences, MasterMap-quality coverage. I’m inspired to pick up JOSM and Merkaartor to add building-level detail from out-of-copyright 1:500 Edinburgh town plans at the National Library of Scotland’s map services.

Agustin Lobo talked about the distributed work and cross-institutional support and benefit of the R project, and the impact of open source on open access to data in science. He mentioned a Nature open peer review experiment that was discarded – I suspect it wasn’t curated enough. The talk helped me to connect the OKF’s work to the rest of the Jornadas.

The shiny slides are on prezi.com, which many people asked for details of – they should show embedded in the page, I hope. I stupidly forgot to put URLs on the slides, which is partly why I have written this blog post.
