
#opendata: New Film about Open Government Data

April 13, 2011 in Events, Interviews, OGDCamp, OKF, OKF Projects, Open Data, Open Government Data, Releases, Talks, WG EU Open Data, WG Open Government Data, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

The Open Knowledge Foundation is pleased to announce the release of #opendata, a new short film clip about open government data. The film includes interview footage with numerous open government data gurus and advocates, which we shot at last year’s Open Government Data Camp. You can find the film at opengovernmentdata.org/film.


If you’re interested in finding out more about the Open Knowledge Foundation’s work in this area you can visit opengovernmentdata.org, a website about open government data around the world for and by the broader open government data community.

If you’re interested in meeting others interested in open government data around the world, please come and say hello on our ‘open-government’ mailing list.

We are currently in the process of subtitling the film in several other languages. If you’d like to help translate the film into your language (or review or improve a translation) please fill in this form and we’ll get in touch with you with more details as soon as we can!

Europe’s Energy: a new mini-app to put the European energy targets into context

February 4, 2011 in Ideas and musings, OKF Projects, Open Data, Open Government Data, Releases, Sprint / Hackday, Visualization, WG EU Open Data, WG Visualisation, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

If you hang around any of the Open Knowledge Foundation’s many mailing lists, or if you follow us (or any of our people) on Twitter, you may have noticed that we’ve been quietly working very hard on something recently. That ‘something’ is a new mini-project called Europe’s Energy, which you can now explore.

It is being launched to coincide with a big European Council meeting today, which has energy policy as one of its core topics. The application aims to help to put European energy policy (including the 2020 energy targets) into context, building on the work we did at the Eurostat Hackday in London just before Christmas.

You can use it to:

  • Compare different EU countries in terms of their carbon emissions, renewable energy share, energy dependency, net imports, and progress towards their respective renewables targets
  • Find out how much energy different EU countries consume, how they consume it, and how this has changed in recent years
  • Find out how much energy different EU countries produce, what the energy mix is like in different countries and how this has changed in recent years

The data is mainly from Eurostat, with a few other additional bits and pieces from elsewhere. This is just the beginning of our work in this area, and we’re very interested in looking at more fine-grained data, and new kinds of data. As part of publicdata.eu, we’ll be aggregating and providing a single point of access to all kinds of energy-related open data from local, regional and national public bodies from across Europe. So if you’re interested in energy data, watch this space! :-)

If you want to follow our work in this area, you can join our new europes-energy announcement list. If you’d like to contribute to discussion, or you’d like to talk to us more about our work in this area, please do come and say hello on our open-energy discussion list!

Opening up linguistic data at the American National Corpus

January 15, 2011 in External, Featured Project, OKF, Open Data, Open Knowledge Definition, Open/Closed, Releases, WG Linguistics, Working Groups

The following guest post is from Nancy Ide, Professor of Computer Science at Vassar College, Technical Director of the American National Corpus project and member of the Open Knowledge Foundation’s Working Group on Open Linguistic Data.

The American National Corpus (ANC) project is creating a collection of texts produced by native speakers of American English since 1990. Its goal is to provide at least 100 million words of contemporary language data covering a broad and representative range of genres, including but not limited to fiction, non-fiction, technical writing, newspaper, spoken transcripts of various verbal communications, as well as new genres (blogs, tweets, etc.). The project, which began in 1998, was originally motivated by three major groups: linguists, who use corpus data to study language use and change; dictionary publishers, who use large corpora to identify new vocabulary and provide examples; and computational linguists, who need very large corpora to develop robust language models—that is, to extract statistics concerning patterns of lexical, syntactic, and semantic usage—that drive natural language understanding applications such as machine translation and information search and retrieval (à la Google).

Corpora for computational linguistics and corpus linguistics research are typically annotated for linguistic features, so that, for example, every word is tagged with its part of speech, every sentence is annotated for syntactic structure, etc. To be of use to the research and development community, it should be possible to re-distribute the corpus with its annotations so that others can reuse and/or enhance it, if only to replicate results as is the norm for most scientific research. The redistribution requirement has proved to be a major roadblock to creating large linguistically annotated corpora, since most language data, even on the web, is not freely redistributable. As a result, the large corpora most often used for computational linguistics research on English are the Wall Street Journal corpus, consisting of material from that publication produced in the early ’90s, and the British National Corpus (BNC), which contains varied-genre British English produced prior to 1994, when it was first released. Neither corpus is ideal, the first because of the limited genre, and the second because it includes strictly British English and is annotated for part of speech only. In addition, neither reflects current usage (for example, words like “browser” and “google” do not appear).

The ANC was established to remedy the lack of large, contemporary, richly annotated American English corpora representing a wide range of genres. In the original plan, the project would follow the BNC development model: a consortium of dictionary publishers would provide both the initial funding and the data to include in the corpus, which would be distributed by the Linguistic Data Consortium (LDC) under a set of licenses reflecting the restrictions (or lack thereof) imposed by these publisher-donors. These publishers would get the corpus and its linguistic annotations for free and could use it as they wished to develop their products; commercial users who had not contributed either money or data would have to pay a whopping $40,000 to the LDC for the privilege of using the ANC for commercial purposes. The corpus would be available for research use only for a nominal fee.

The first and second releases (a total of 22 million words) of the ANC were distributed through LDC from 2003 onward under the conditions described above. However, shortly after the second ANC release in 2005, we determined that the license for 15 of the 22 million words in the ANC did not restrict its use in any way—it could be redistributed and used for any purpose, including commercial. We had already begun to distribute additional annotations (which are separate from and indexed into the corpus itself) on our web site, and it occurred to us that we could freely distribute this unrestricted 15 million words as well. This gave birth to the Open ANC (OANC), which was immediately embraced by the computational linguistics community. As a result, we decided that from this point on, additions to the ANC would include only data that is free of restrictions concerning redistribution and commercial use. Our overall distribution model is to enable anyone to download our data and annotations for research or commercial development, asking (but not requiring) that they give back any additional annotations or derived data they produce that might be useful for others, which we will in turn make openly available.

Unfortunately, the ANC has not been funded since 2005, and only a few of the consortium publishers provided us with texts for the ANC. However, we have continued to gather millions of words of data from the web that we hope to be able to add to the OANC in the near future. We search for current American English language data that is either clearly identified as public domain or licensed with a Creative Commons “attribution” license. We stay away from “share-alike” licenses because of the potential restriction for commercial use: a commercial enterprise would not be able to release a product incorporating share-alike data or resources derived from it under the same conditions. It is here that our definition of “open” differs from the Open Knowledge Definition—until we can be sure that we are wrong, we regard the viral nature of the share-alike restriction as prohibitive for some uses, and therefore data with this restriction are not completely “open” for our purposes.

Unfortunately, because we don’t use “share-alike” data, the web texts we can put in the OANC are severely limited. A post on this blog by Jordan Hatcher a little while ago mentioned that the popularity of Creative Commons licenses has muddied the waters, and we at the ANC project agree, although for different reasons. We notice that many people, particularly producers of the kinds of data we most want to get our hands on, such as fiction and other creative writing, tend to automatically slap at least a “share-alike” and often also a “non-commercial” CC license on their web-distributed texts. At the same time, we have some evidence that when asked, many of these authors have no objection to our including their texts in the OANC, despite the lack of similar restrictions.

It is not entirely clear how the SA and NC categories became an effective default standard license, but my guess is that many people feel that SA and NC are the “right” and “responsible” things to do for the public good. This, in turn, may result from the fact that the first widely-used licenses, such as the GNU General Public License, were intended for use with software. In this context, share-alike and non-commercial make some sense: sharing seems clearly to be the civic-minded thing to do, and no one wants to provide software for free that others could subsequently exploit for a profit. But for web texts, these criteria may make less sense. The market value of a text that one puts on the web for free use (e.g., blogs, vs. works published via traditional means and/or available through electronic libraries such as Amazon) is potentially very small, compared to that of a software product that provides some functionality that a large number of people would be willing to pay for. Because of this fact, use of web texts in a corpus like the ANC might qualify as Fair Use, but so far, we have not had the courage to test that theory.

We would really like to see something like Open Data Commons Attribution License (ODC-BY) become the license that authors automatically reach for when they publish language data on the web, in the way the CC-BY-SA-NC license is now. ODC-BY was developed primarily for databases, but it would not take much to apply it to language data, if it has not been done already (see, e.g., the Definition of Free Cultural Works). Either that, or we determine if in fact, because of the lack of monetary value, Fair Use could apply to whole texts (see for example, Bill Graham Archives v. Dorling Kindersley Ltd., 448 F. 3d 605 – Court of Appeals, 2nd Circuit 2006 concerning Fair Use applied to entire works).

In the meantime, we continue to collect texts from the web that are clearly usable for our purposes. We also have a web page set up where one can contribute their writing of any kind (fiction, blog, poetry, essay, letters, email) – with a sign off on rights – to the OANC. So far, we have managed to collect mostly college essays, which college seniors seem quite willing to contribute for the benefit of science upon graduation. We welcome contributions of texts (check the page to see if you are a native speaker of American English), as well as input on using web materials in our corpus.

Launch of the Public Domain Review to celebrate Public Domain Day 2011

January 1, 2011 in Public Domain, Public Domain Works, Releases, WG Public Domain, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

The 1st of January every year is Public Domain Day, when new works enter the public domain in many (though unfortunately not all) countries around the world.

To celebrate, the Open Knowledge Foundation is launching the Public Domain Review, a web-based review of works which have entered the public domain.

Each week an invited contributor will present an interesting or curious work with a brief accompanying text giving context, commentary and criticism. The first piece takes a look at works by Nathanael West, whose works enter the public domain today in many jurisdictions.

You can sign up to receive the review in your inbox via email. If you’re on Twitter, you can also follow @publicdomainrev. Happy Public Domain Day!

Launch of NosDonnees.fr, a community driven French open data catalogue

December 3, 2010 in CKAN, OKF, Open Data, Open Government Data, Releases, WG EU Open Data, WG Open Government Data, Working Groups

A quick note to announce (and celebrate!) the launch of a new community-driven French open data catalogue, NosDonnees.fr, last Friday in Paris.

The catalogue is a joint initiative between the Open Knowledge Foundation and Regards Citoyens. Efforts are currently underway to populate the catalogue with information about French public datasets, including legal information about how they can be reused.

The catalogue is powered by CKAN, which also powers data.gov.uk and over 20 other catalogues in various countries around the world! If you’d like to set up a catalogue in your country, please get in touch on the ckan-discuss list!

CKAN v1.2 Released together with Datapkg v0.7

November 30, 2010 in CKAN, datapkg, News, Releases

We’re delighted to announce CKAN v1.2, a new major release of the CKAN software. This is the largest iteration so far, with 146 tickets closed, and includes some really significant improvements, most importantly a new extension/plugin system, SOLR search integration, caching and INSPIRE support (more details below). The extension work is especially significant as it now means you can extend CKAN without having to delve into any core code.

In addition there are now over 20 CKAN instances running around the world and CKAN is being used in official government catalogues in the UK, Norway, Finland and the Netherlands. Furthermore, http://ckan.net/ — our main community catalogue — now has over 1500 data ‘packages’ and has become the official home for the LOD Cloud (see the lod group on ckan.net).

We’re also aiming to provide a much more integrated ‘datahub’ experience with CKAN. Key to this is the provision of a ‘storage’ component to complement the registry/catalogue component we already have. Integrated storage will support all kinds of important functionality, from automated archival of datasets to dataset cleaning with Google Refine.

We’ve already been making progress on this front with the launch of a basic storage service at http://storage.ckan.net/ (back in September) and the development of the OFS bucket storage library. The functionality is still at an alpha stage and integration with CKAN is still limited so improving this area will be a big aim for the next release (v1.3).

Even in its alpha stage, we are already making use of the storage system, most significantly in the latest release of datapkg, our tool for distributing, discovering and installing data (and content) ‘packages’. In particular, the v0.7 release (more detail below) includes upload support, allowing you to store (as well as register) your data ‘packages’.

Highlights of CKAN v1.2 release

  • Package edit form: attach package to groups (#652) & revealable help
  • Form API – Package/Harvester Create/New (#545)
  • Authorization extended: authorization groups (#647) and creation of packages (#648)
  • Extension / Plug-in interface classes (#741)
  • WordPress twentyten compatible theming (#797)
  • Caching support (ETag) (#693)
  • Harvesting GEMINI2 metadata records from OGC CSW servers (#566)

Minor:

  • New API key header (#466)
  • Group metadata now revisioned (#231)
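
For readers unfamiliar with ETag-based caching (#693 above), the general HTTP technique can be sketched as follows. This is a minimal illustration only, not CKAN's own implementation; the function names `make_etag` and `respond` are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Return an ETag value: a quoted hex digest of the response body."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None and if_none_match == etag:
        # The client's cached copy is still current, so no body is sent.
        return 304, headers, b""
    return 200, headers, body


# First request has no cached copy; the second replays the ETag it received.
first = respond(b'{"name": "gold-prices"}')
second = respond(b'{"name": "gold-prices"}', if_none_match=first[1]["ETag"])
```

The second call returns a 304 status with an empty payload, which is what saves bandwidth when a package page has not changed.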

All tickets

Datapkg Release Notes

A major new release (v0.7) of datapkg is out!

There’s a quick getting started section below (also see the docs).

About the release

This release brings major new functionality to datapkg especially in regard to its integration with CKAN. datapkg now supports uploading as well as downloading and can now be easily extended via plugins. See the full changelog below for more details.

Get started fast

# 1. Install: (requires python and easy_install)
$ easy_install datapkg
# Or, if you don't like easy_install, use pip (or even the raw source!)
$ pip install datapkg

# 2. [optional] Take a look at the manual
$ datapkg man

# 3. Search for something
$ datapkg search ckan:// gold
gold-prices -- Gold Prices in London 1950-2008 (Monthly)

# 4. Get some data
# This will result in a csv file at /tmp/gold-prices/data
$ datapkg download ckan://gold-prices /tmp

# 5. Store some data
# Edit the gold prices csv making some corrections
$ cp /tmp/gold-prices/data mynew.csv
$ edit mynew.csv
# Now upload back to storage
$ datapkg upload mynew.csv ckan://mybucket/ckan-gold-prices/mynew.csv
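
The commands above address packages with specs such as ckan://gold-prices and ckan://mybucket/ckan-gold-prices/mynew.csv. As a rough sketch of how such a spec breaks down into parts, here is a small Python helper. It is an illustration under our own assumptions and not datapkg's actual parser; `parse_spec` is a hypothetical name.

```python
from urllib.parse import urlsplit


def parse_spec(spec: str):
    """Split 'ckan://name/path' into (index_type, first_segment, remaining_path)."""
    parts = urlsplit(spec)
    if not parts.scheme:
        raise ValueError("spec needs a scheme such as ckan://: %r" % spec)
    # The segment right after '://' is parsed as the network location;
    # anything after the next slash is the remaining path.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


package = parse_spec("ckan://gold-prices")
upload = parse_spec("ckan://mybucket/ckan-gold-prices/mynew.csv")
```

Here `package` names a package on the CKAN index, while `upload` separates the bucket from the path inside it.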

Find out more » — including how to create, register and distribute your own ‘data packages’.

Changelog

  • MAJOR: Support for uploading datapkgs (upload.py)
  • MAJOR: Much improved and extended documentation
  • MAJOR: New sqlite-based DB index giving support for a simple, central, ‘local’ index (ticket:360)
  • MAJOR: Make datapkg easily extendable
    • Support for adding new Index types with plugins
    • Support for adding new Commands with command plugins
    • Support for adding new Distributions with distribution plugins
  • Improved package download support (also now pluggable)
  • Reimplement url download using only python std lib (removing urlgrabber requirement and simplifying installation)
  • Improved spec: support for db type index + better documentation
  • Better configuration management (especially internally)
  • Reduce dependencies by removing usage of PasteScript and PasteDeploy
  • Various minor bugfixes and code improvements

Credits

A big hat-tip to Mike Chelen and Matthew Brett for beta-testing this release and to Will Waites for code contributions.

Open-Source Annotation Toolkit for Inline, Online Web Annotation

November 12, 2010 in Annotator, OKF Projects, Open Shakespeare, Releases

This is a post by Rufus Pollock, a long-time Open Knowledge Foundation member and coordinator of the Open Shakespeare project.

We’ve been working on web-annotation — inline, online annotation of web texts — for several years.

Our original motivation was to support annotation of texts in http://openshakespeare.org/ so we can collaboratively build up critical notes but since then I’ve seen this need again and again — in drafting new open data licenses, with scholars working on medieval canon law, when taking my own notes on academic papers.


Open Shakespeare’s Hamlet in annotate mode

What’s surprised me is that there appears to be no good open-source tool out there to do this. There are several commercial offerings (including annotation in Google Docs), and there have been open-source attempts such as annotea, Stet (for GPLv3), marginalia, and co-ment, but none of these really seemed to work. (My original implementation in 2006/2007 of annotation for http://openshakespeare.org/ used http://geof.net/’s (excellent) marginalia library, but I ultimately ran into performance and integration problems.)

Thus, a year and a half ago, in collaboration with Nick Stenning, we started developing an annotator project to create a new, simple javascript (+ backend) library for web-annotation. Our main goals were and are:

  • Annotation of arbitrary text ranges
  • Annotate any web (html) document
  • Easy to use — 2 lines of javascript to insert this in your web page/app etc
  • Well-factored and library-structured — easy to integrate and easy to extend

Nick, who’s a great JavaScript (and CSS) developer, has been responsible for writing all of the frontend (i.e. the annotation stuff you actually see!) while I’ve developed the backend annotation store.

In the way of spare-time projects, development has been rather slower than we would have liked, but we now have a functioning alpha which has been running successfully on http://openshakespeare.org/ for the last 6 months.

Furthermore, the system is completely app-agnostic and is incredibly easy to use — adding annotation to your web page only requires one line of jquery javascript (assuming a backend is set up):

$('#your-element-id').annotator()

Interested? Below are links to project information including the source code and docs and mailing list. We’re especially eager to get feedback from those looking to integrate into other apps or who would like to help develop the library features.

Project Info

Source code

Features

  • Open JSON-REST annotation protocol – simple JSON and REST-based
  • Javascript (jquery-based) library for inserting inline annotations in a given document supporting this protocol
  • One or more backends implementing this protocol (emphasis on backends that are easy to deploy using standard tools e.g. using sql database or couchdb)
  • Really simple: just do (jquery-esque) $('myelement').annotator() to get up and running
  • Fast even on large documents
  • Support of multiple users
  • Pluggable backends
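
To give a feel for what a JSON and REST-based annotation backend involves, here is a toy in-memory store in Python. It is a sketch under our own assumptions, not the project's actual protocol or backend: the class name, the field names (`uri`, `text`, `range`) and the example.org address are all hypothetical.

```python
import json
import uuid


class AnnotationStore:
    """Tiny in-memory stand-in for a JSON/REST annotation backend."""

    def __init__(self):
        self._items = {}

    def create(self, payload: str) -> str:
        """Handle a POST: store the JSON body and return it with a new id."""
        annotation = json.loads(payload)
        annotation["id"] = uuid.uuid4().hex
        self._items[annotation["id"]] = annotation
        return json.dumps(annotation)

    def read(self, annotation_id: str) -> str:
        """Handle a GET for a single annotation."""
        return json.dumps(self._items[annotation_id])

    def search(self, uri: str) -> str:
        """Handle a GET listing every annotation attached to one document."""
        return json.dumps([a for a in self._items.values() if a["uri"] == uri])

    def delete(self, annotation_id: str) -> None:
        """Handle a DELETE for a single annotation."""
        del self._items[annotation_id]


store = AnnotationStore()
created = json.loads(store.create(json.dumps({
    "uri": "http://example.org/hamlet",  # hypothetical document address
    "text": "Compare the Second Quarto reading.",
    "range": {"start": 120, "end": 148},  # hypothetical character offsets
})))
```

Keeping every request and response as plain JSON is what makes the backend pluggable: an SQL or CouchDB implementation only has to offer the same four operations.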

Visualising the German budget with Offener Haushalt

October 3, 2010 in News, OKF, OKF Germany, Open Data, Open Government Data, Releases, WG EU Open Data, WG Open Government Data

We’re delighted to announce that our friends at the Open Data Network and OKF Deutschland last week released some work that they have been doing to collate and visualise information related to public spending in Germany: Offener Haushalt.

Infosthetics broke the news:

> Offener Haushalt [offenerhaushalt.de] (German for ‘open budget’) is another demonstration of the large potential behind the emerging Open Data phenomenon. Based on data that was harvested by the extensive ‘screen scraping’ of the website of the Bundesfinanzministerium (the German equivalent to the U.S. Department of the Treasury, but I simply had to include this word), Offener Haushalt attempts to open up socially relevant data to allow open and shared analysis, interpretation and discussion. Unfortunately, like most other European governments today, the German ‘Bundesfinanzministerium’ does not make open, machine-readable datasets available for download yet.

> While still in an early ‘beta’, the online platform allows for the exploration (as ‘specific categories’, as ‘groups’ or as ‘functions’), and commenting of the acquired data in a treemap structure, while each detail view is stored under individual and shareable URLs. The original sources are clearly mentioned, while the data can be downloaded in JSon, RDF or XML formats. In the meantime, the developers are calling anyone with valuable German data to come forward.

The site has already received quite a bit of attention in the blogosphere, on Twitter and in the media (e.g. see this interview with Daniel and Friedrich in the Zeit). A big well done to all involved!

Data.gov.uk releases CKAN Drupal Module

August 21, 2010 in CKAN, External, OKF Projects, Open Data, Open Government Data, Releases

We’re delighted to see that the data.gov.uk folks have released the code for their CKAN Drupal module. As many will know, the OKF’s CKAN powers data.gov.uk as well as over a dozen other data catalogues around the world.

From the blog post:

> As part of the government’s ongoing work around transparency, today we are releasing some of the custom software code we’ve developed – a CKAN module for Drupal. This is available for anyone to review, use, or modify. We’re excited to see how developers and colleagues across the world put this work to good use in their own applications and projects.

> The code itself is attached to this blog post as a tar.gz file and contains one main package with two sub-packages within. This code release allows content to be synched from CKAN into Drupal. CKAN is the system we use as our “back end” to store information about all the data government has released. Drupal is a system to publish web content, and serves as our “front end” through which people can use to find our datasets and comment on them.

> The main CKANPackage code creates a Drupal custom content type to represent data in the same way as CKAN. The first sub-package is the CKANImporter which imports packages from CKAN into Drupal and allows this to take place as a one-off batch import or as an update to the latest changes since a specified time. The second sub-package is CKANDatagovuk which correlates fields in CKAN with Drupal hooks.

> The code release includes comments in the files to assist users with the functionality. You can of course contact us should you have any questions.
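
The two import modes described in the quoted post (a one-off batch import, or an update covering changes since a specified time) can be illustrated with a short Python sketch. This is not the Drupal module's code; the function name, the `modified` field and the sample package names are assumptions made for illustration.

```python
from datetime import datetime


def select_packages(packages, since=None):
    """Return packages to import: all of them, or only those changed after `since`."""
    if since is None:
        # One-off batch import: take everything.
        return list(packages)
    # Incremental update: keep only packages modified after the cut-off.
    return [p for p in packages if p["modified"] > since]


# Hypothetical catalogue entries with last-modified timestamps.
catalogue = [
    {"name": "gold-prices", "modified": datetime(2010, 8, 1)},
    {"name": "uk-spending", "modified": datetime(2010, 8, 20)},
]
batch = select_packages(catalogue)
updates = select_packages(catalogue, since=datetime(2010, 8, 10))
```

With these sample entries, the batch import takes both packages, while the update picks up only the one changed after 10 August.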

Launch of it.ckan.net for open data in Italy!

June 14, 2010 in CKAN, External, OKF, OKF Projects, Open Data, Open Government Data, Releases, WG EU Open Data, WG Open Government Data, Working Groups

The following guest post is by Stefano Costa and Federico Morando. Stefano Costa is a researcher at the University of Siena and Coordinator of the OKF’s Working Group on Open Data in Archaeology. Federico Morando is Managing Director & Research Fellow at the NEXA Center for Internet & Society and a member of the Working Group on EU Open Data.

We are delighted to announce that an Italian instance of CKAN is now live at it.ckan.net!

There are currently 67 packages available — thanks to the Extracting Value from Public Sector Information (EVPSI) project. In particular, the NEXA Center contributed material generated as part of the EVPSI project, which is funded by the Piedmont Region and coordinated by the University of Turin.

The site was launched on Sunday by OKF Director Rufus Pollock and NEXA Center co-director Juan Carlos De Martin at the 2010 Festival of Economics in Trento and is a collaboration between the Open Knowledge Foundation, the EVPSI project and the NEXA Center for Internet & Society.

The datasets that are currently available on the Italian instance of CKAN come from a first mapping of some of the main silos of public sector information (PSI) in Italy. Many more packages will be provided soon by EVPSI and the NEXA Center, as a product of a much more detailed mapping of PSI holding entities in the Italian Region of Piedmont.

Open data in Italy

Is Italy behind other countries with respect to open data? Judging from the data of the EVPSI project (and from the infringement procedure the EU started against Italy), the answer to this question is ‘yes’, but things are changing. The Italian CKAN will hopefully help accelerate this change – providing a way for open data users and distributors to find datasets and see whether or not they can reuse them!

The new datasets on it.ckan.net include many which aren’t open, to help people get a ‘big picture’ about what datasets are out there, who holds them, how to download them and how open they are.

There are several bodies that produce data for their own institutional purposes, but most of the databases with clear commercial interest are only available by paying. And even when data are made available on the web they are distributed under restrictive terms of use or under unclear or no terms of use at all. That, considering the default status of potentially copyright and/or database right protected material (i.e. “All rights reserved”) implicitly means that no re-use is possible. This attitude is caused by a combination of factors, including:

  • lack of knowledge about the open data initiative and the benefits of open data for citizens and society at large
  • complex sub-licensing of datasets among many different public and private bodies, so that nobody can be considered the actual owner of data
  • a general fear of situations implying a loss of control over the re-use of data (coupled with a lack of internal guidelines about the access and re-use of data)
  • a difficult financial situation of PSI holders, pushing them to maximize their short run monetary income, without appropriately taking into account positive spillovers for the rest of society and in the medium/long run

For example ISTAT, the national institute of statistics, put their data online for free use, but unfortunately commercial reuse is not allowed – which may inhibit the development of innovative applications and services. See an overview of ISTAT datasets at CKAN.

A notable exception to this mindset is Regione Piemonte, which has recently launched a portal for open data.

That result has been facilitated by the existence of common regional guidelines about the re-use of public data. What is more, all their currently available data are released under the CC0 license, enabling unrestricted re-use and dissemination by anyone, even for commercial purposes.

There are other regional governments offering some of their data (for example geospatial data) for free, but Piemonte is the only one explicitly adopting an open license. In all other cases, one has to ask for each case, and usually the answer is “free for non-commercial use” only.

The key point is that national and regional governments own large datasets that would be quite easily made available to the public. This process would however require 3 distinct actors, as outlined in the Open Data study by Becky Hogge:

  • government heads
  • civil servants (acting as the “middle layer”)
  • a small but determined group of citizens (or “civic hackers”)

Minister Brunetta promised “data.gov.it” in 6 months, but in the meantime we would like to get a more detailed picture of how open Italian public information is. In particular it will be interesting to see if any local authorities besides Regione Piemonte will consider following in the footsteps of many other local and national bodies around the world – and open up their data!

Interested in starting a new CKAN instance in your country?

If you’re interested in starting a new instance of CKAN for open data in your country, the Open Knowledge Foundation would be delighted to help! If you are able to help coordinate the translation and liaise with other local folks interested in open data, we can set up, host, and maintain the instance on our servers. Just drop us a line on the ckan-discuss list.
