
You are browsing the archive for Releases.

OpenSpending v0.10 released

Lucy Chambers - September 20, 2011 in Open Spending, Releases

This post is by Martin Keegan, project lead on OpenSpending.

We’ve released v0.10 of the OpenSpending code, and made it live on

Changes in v0.10:

  • Data loading has been separated from the main web application. The web-based and command-line tools that data wranglers use to load and reload datasets now live in separate code repositories from the end-user facing web application, and the resulting source trees have been significantly reorganised

  • More tests. Test coverage and organisation has been improved, and considerably more of the tests pass

  • Model overhaul. The integration between Python and MongoDB has been effectively replaced

  • Removed dependency on celery. Long-running import tasks used to rely on a third-party subsystem called celery, which proved to be an administration and reliability hassle. It has been replaced by our own code.

  • Command line interface tidied up

  • Data-wrangler workflow improvements. Dropping datasets is now supported, as are CLI and web-UI tools for tagging CKAN packages for use with OpenSpending
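The celery replacement mentioned above can be illustrated with a minimal sketch. This is not the actual OpenSpending code, just the general pattern it describes: long-running import jobs drained by a plain in-process worker thread instead of a third-party task queue.

```python
import queue
import threading

# Hypothetical sketch (not the OpenSpending implementation): dataset
# imports are queued and executed by a single background worker thread.
task_queue = queue.Queue()
results = {}

def worker():
    while True:
        job = task_queue.get()
        if job is None:          # sentinel: shut the worker down
            break
        name, func = job
        try:
            results[name] = ("done", func())
        except Exception as exc:  # record failures instead of crashing
            results[name] = ("failed", exc)
        finally:
            task_queue.task_done()

def run_import(name, func):
    """Queue a dataset import to run in the background."""
    task_queue.put((name, func))

thread = threading.Thread(target=worker, daemon=True)
thread.start()

# Queue a (trivial) import job and wait for the queue to drain.
run_import("demo-dataset", lambda: sum(range(100)))
task_queue.join()
task_queue.put(None)
thread.join()
print(results["demo-dataset"])  # -> ('done', 4950)
```

The appeal of this approach over an external queue is operational: there is no broker process to configure, monitor or restart.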

Release of to map open data around the world

Jonathan Gray - June 30, 2011 in CKAN, News, OKI Projects, Open Data, Open Government Data, Our Work, Press, Releases, Technical, WG EU Open Data, WG Open Government Data, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

We’re very pleased to announce an alpha version of, a website to help keep track of open data catalogues from around the world. The project is being launched to coincide with our annual conference, OKCon 2011. You can see the site here:

The project was borne out of an extremely useful workshop on data catalogue interoperability in Edinburgh earlier this year, and then with a few further online meetings. It is powered by the CKAN software, which also powers and many other catalogues.

This is just the beginning of what we hope will become an invaluable resource for anyone interested in finding, using or having an overview of data catalogues from around the world. We have lots of ideas about improvements and features that we’d like to add. If you have anything you think we should prioritise, please let us know in comments below, or on the ckan-discuss list!

Below is a press release for the project (and here in Google Docs). If you know anyone who you think might be interested in this, we’d be most grateful for any help in passing it on!

PRESS RELEASE: Mapping open data around the world

BERLIN, 30th June 2011 – Today a broad coalition of stakeholders are launching, a new project to keep track of open data initiatives around the world.

Governments are beginning to recognise that opening up public information can bring about a wide variety of social and economic benefits – such as increasing transparency and efficiency, creating jobs in the new digital economy, and enabling web and mobile developers to create new useful applications and services for citizens.

But it can be difficult to keep up with the pace of developments in this area. Following on from the success of initiatives like the Obama administration’s and the UK government’s, nearly every week there is a new open data initiative from a local, regional or national government somewhere around the world – from Chicago to Torino, Morocco to Moldova.

A group of leading open data experts are helping to keep updated, including representatives from international bodies such as the World Bank, independent bodies such as the W3C and the Sunlight Foundation, and numerous national governments.

Neil Fantom, Manager of the World Bank’s Development Data Group, says: “Open data is public good, but only if you can find it – we’re pleased to see initiatives such as giving greater visibility to public information, allowing easier discovery of related content from different publishers and making open data more valuable for users.”

Beth Noveck, who ran President Obama’s open government programme and is now working with the UK Government says: “This project is a simple but important start to bringing together the community of key open data stakeholders. My hope is that grows into a vibrant place to articulate priorities, find and mash up data across jurisdictions and curate data-driven tools and initiatives that improve the effectiveness of government and the lives of citizens.”

Cathrine Lippert, of the Danish National IT and Telecom Agency says: “ is a brilliant guide to keeping track of all the data that is being opened up around the world. In addition to our own national data catalogue, we can now point data re-users to to locate data resources abroad.”

Andrew Stott, former Director of Digital Engagement at the UK’s Cabinet Office says: “This initiative will not only help data users find data in different jurisdictions but also help those implementing data catalogues to find good practice to emulate elsewhere in the world.”

Notes for editors

The Open Knowledge Foundation ( is a not-for-profit organisation founded in 2004. It has played a significant role in supporting open data around the world, particularly in Europe, and helps to run the UK’s national data catalogue. The new site is being launched at the Open Knowledge Foundation’s annual conference, OKCon 2011 ( which brings together developers, designers, civil servants, journalists and NGOs for a week of planning, coding and talks.

For further details please contact Jonathan Gray, Community Coordinator at the Open Knowledge Foundation on

The Annotator – Preview

Lucy Chambers - May 12, 2011 in Annotator, News, Releases, Technical

In November 2010, Rufus Pollock announced the Annotator project on the OKFN blog. Since that initial release, the project has been developed into a fully fledged product.

The Annotator is a JavaScript widget that can be added to any webpage to allow inline annotation of its contents. Combined with a storage system, such as AnnotateIt, this allows online collaboration on all forms of HTML documentation, literature and articles.



Updated interface

Making use of the latest web technologies, we’ve refined the interface to make it more intuitive. Annotations will always try to display inside the window, and the comment editor can now be resized and repositioned to suit.

Simple to install

Once the Annotator library code has been included, the Annotator can be added to your website in just one more line of JavaScript:

var content = jQuery('#content-to-annotate').annotator();

Pluggable architecture

The Annotator itself is just a small piece of code that manages the annotations. It can then be extended with plugins to enhance its functionality.

A broad range of plugins can be created. Examples bundled with the Annotator include tagging, permissions and a store for interfacing with an external database to save annotations.

Add plugins to your webpage with just one more line of code:

content.annotator('addPlugin', 'Tags').annotator('addPlugin', 'Store');


The Annotator Store plugin can be tailored to talk to any backend, but to make things really easy we’ve created AnnotateIt, a simple web service for storing your annotations. This is currently a closed beta and you can sign up for an account to test it out here.

On the other hand, if you’re feeling creative, you can build your own store. The Annotator uses a very simple JSON-based protocol that can be implemented in any language. The AnnotateIt source code is available on GitHub for guidance.
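As a rough sketch of what speaking that protocol involves: an annotation is just a JSON object. The field names below follow the commonly documented Annotator format, but treat them as assumptions and check the wiki/source before building against them.

```python
import json

# A minimal annotation record in the general shape the Annotator sends to
# a store backend (field names are assumptions based on the commonly
# documented format; verify against the Annotator wiki/source).
annotation = {
    "uri": "http://example.com/article",       # page being annotated
    "text": "A comment on the highlighted passage.",
    "quote": "the highlighted passage",         # the selected text
    "ranges": [{
        "start": "/div[1]/p[2]",                # XPath to the start node
        "end": "/div[1]/p[2]",
        "startOffset": 10,
        "endOffset": 33,
    }],
    "tags": ["example"],
}

# The store protocol is JSON over HTTP, so any language can implement it.
payload = json.dumps(annotation)
restored = json.loads(payload)
print(restored["quote"])  # -> the highlighted passage
```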


AnnotateIt also enabled us to build a bookmarklet that injects the Annotator into any webpage so you can begin annotating immediately. The next time you visit the page, simply reload the bookmarklet to have all your annotations restored. This makes it easy to separate editorial control over content from control over commentary.

Find out more

If we’ve piqued your interest please head on over to and sign up for a free account.

We’re very excited to see what can be done with the project and are seeking feedback to continue to improve the product. To get in touch just drop us an email at annotator [at] okfn [dot] org.

If you’re interested in adding the Annotator to your own site or developing a plugin then our GitHub page is the place to go. Here you can download the latest Annotator source code and browse the wiki for details on getting started.


Where does Italy’s money go?

Jonathan Gray - April 19, 2011 in Events, OKI Projects, Open Data, Open Government Data, Open Knowledge Foundation, Open/Closed, Press, Releases, Visualization, WG EU Open Data, WG Open Government Data, Where Does My Money Go, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

Over the past 48 hours or so we’ve been busy loading 12 years of Italian spending data into Open Spending. Further details on the project and the data are below.

This project was put together by Stefano Costa, Friedrich Lindenberg, Luca Nicotra, Angelo Centini, Elena Donnari, Diego Galli, and countless other passers-by at the International Journalism Festival in Perugia (which I spoke at on Saturday).

If you’re interested in spending data in your country and you’d like to work with us to load it into the Open Spending platform, come and say hello on our wdmmg-discuss mailing list!

Update 2011-04-20: the release was covered in the Guardian (UK), Il Fatto Quotidiano (Italy), Il Post (Italy), La Stampa (Italy), Repubblica (Italy), and Wired (Italy).

English version

  • App:
  • Data:

What is this?

The visualisation is Italian public spending data which has been loaded into Open Spending, a project of the Open Knowledge Foundation.

What is the Open Spending project?

The Open Spending project aims to make it easier for the public to explore and understand government spending. It grew out of Where Does My Money Go?, an award-winning project which enables people to see how UK public funds are spent. Open Spending is currently working with groups and individuals in over 20 countries to set up an international database on public spending.

What is the story behind the Italian Open Spending project?

A small group of developers, journalists, civil servants and others collaborated to load the Italian data into the platform on a 48 hour sprint, starting at the International Journalism Festival in Perugia, finishing at a conference on open government in Rome.

Where will the project be launched?

The project will be launched at a major conference on open government hosted at the Italian parliament in Rome on April 19th. This will bring together journalists, politicians, developers, designers, entrepreneurs, academics, civic society organisations, and representatives from public bodies to discuss the future of open government data in Italy.

How is Italian government spending data produced?

There are three separate levels of government: (i) central administrations (government departments), (ii) regional administrations (20 regions and 2 autonomous provinces), and (iii) local administrations (over 8,000 municipalities, plus 100+ provinces and mountain communities).

Spending documents and datasets are produced at each of these three layers and published on a variety of government websites. They are then aggregated, analysed and republished by various public bodies for different purposes.

Where is the data from and where can I get it from?

The data is from the Regional Public Accounts (RPA) project. The data is already online on a dedicated website, where it is updated annually. You can find this data here.

What is the Regional Public Accounts (RPA) project?

The Regional Public Accounts (RPA) project provides an overview of spending from all of these layers of government in a single place, and consolidates spending flows between the different layers to provide a consistent, harmonised picture of total public expenditure.

This work is executed by a unit based at the Department for Development and Economic Cohesion, which is supported by 21 units located in each region.
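A toy example (all figures invented) of why consolidation between layers matters: a transfer from central to regional government would be counted twice if each layer's spending were simply summed.

```python
# Toy illustration (made-up figures) of consolidating spending flows
# between government layers so transfers are not double-counted.
raw_spending = {"central": 100.0, "regional": 60.0, "local": 30.0}

# Flows between layers: (payer, recipient) -> amount
inter_layer_transfers = {
    ("central", "regional"): 20.0,
    ("regional", "local"): 10.0,
}

naive_total = sum(raw_spending.values())         # 190.0 (double-counts transfers)
transfers = sum(inter_layer_transfers.values())  # 30.0
consolidated_total = naive_total - transfers     # 160.0

print(consolidated_total)  # -> 160.0
```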

What time period does it cover?

The data that is currently loaded covers the period from 1996 to 2008.

How granular is the data?

To illustrate this with an example: the data will not tell you how many computers were bought for a school, and how much they each cost. But it will tell you how much was spent on personnel, educational support to households, or construction and maintenance in the school sector in a given region, and by which level of government the money was spent.

Italian version

Dove vanno i nostri soldi? (Where does our money go?)

  • Website:
  • Data:

What is this project?

A visualisation of Italian public spending data within Open Spending, a project of the Open Knowledge Foundation.

What is the Open Spending project?

The Open Spending project aims to make it simpler for the public to explore and understand public spending. It grew out of Where Does My Money Go?, an award-winning project that lets people see how UK public funds are spent. Open Spending is currently working with groups and individuals in more than 20 countries to build an international database on public spending.

What is the story behind the Italian Open Spending project?

A small group of developers, journalists, civil servants and others collaborated to load the Italian data into the platform in a 48-hour sprint, starting at the International Journalism Festival in Perugia and finishing at a conference on transparency and open government in Rome.

Where will the project be presented?

The project will be launched at a major conference on open government, entitled “La Politica della Trasparenza e dei Dati Aperti”, hosted by the Italian parliament in Rome on April 19th. The event will bring together journalists, politicians, developers, entrepreneurs, academics, civil society organisations and representatives of the public sector to discuss the future of open government and open data in Italy.

How is Italian public accounts data produced?

There are three different levels of government: (i) central administrations, (ii) regional administrations (20 regions and 2 autonomous provinces) and (iii) local administrations (over 8,000 municipalities, plus more than 100 provinces and mountain communities).

Spending documents are produced by each level of government and are published on the institutional websites of the various central and local administrations.

These documents and data are aggregated, analysed and republished by many different administrations for various purposes.

Where does the data come from?

The data comes from the Conti Pubblici Territoriali project. It is already online on a dedicated website, where it is updated annually. You can find this data here.

What are the Conti Pubblici Territoriali?

The Conti Pubblici Territoriali (CPT) project provides an overview of the spending of all these levels of government, and consolidates the spending flows between the different levels to provide a consistent picture of Italian public expenditure under a harmonised classification.

This work is carried out by a unit based at the Department for Development and Economic Cohesion, supported by 21 regional units.

What period does the data cover?

The data currently covers the period from 1996 to 2008.

How granular is the data?

To explain with an example: the data will not tell you how many computers were bought for a school, or how much each of them cost. But it will tell you how much was spent on personnel, on educational support materials, or on construction and maintenance in the school sector in a given region, and by which level of government.

#opendata: New Film about Open Government Data

Jonathan Gray - April 13, 2011 in Events, Interviews, OGDCamp, OKI Projects, Open Data, Open Government Data, Open Knowledge Foundation, Releases, Talks, WG EU Open Data, WG Open Government Data, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

The Open Knowledge Foundation is pleased to announce the release of #opendata, a new short film clip about open government data. The film includes interview footage with numerous open government data gurus and advocates, which we shot at last year’s Open Government Data Camp. You can find the film at


If you’re interested in finding out more about the Open Knowledge Foundation’s work in this area you can visit, a website about open government data around the world for and by the broader open government data community.

If you’re interested in meeting others interested in open government data around the world, please come and say hello on our ‘open-government‘ mailing list.

We are currently in the process of subtitling the film in several other languages. If you’d like to help translate the film into your language (or review or improve a translation) please fill in this form and we’ll get in touch with you with more details as soon as we can!

Europe’s Energy: a new mini-app to put the European energy targets into context

Jonathan Gray - February 4, 2011 in Ideas and musings, OKI Projects, Open Data, Open Government Data, Releases, Sprint / Hackday, Visualization, WG EU Open Data, WG Visualisation, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

If you hang around any of the Open Knowledge Foundation’s many mailing lists, or if you follow us (or any of our people) on Twitter you may have noticed that we’ve been quietly working very hard on something recently. That ‘something’ is a new mini-project called Europe’s Energy and you can now explore it here:


It is being launched to coincide with a big European Council meeting today, which has energy policy as one of its core topics. The application aims to help put European energy policy (including the 2020 energy targets) into context, building on the work we did at the Eurostat Hackday in London just before Christmas.

You can use it to:

  • Compare different EU countries in terms of their carbon emissions, renewable energy share, energy dependency, net imports, and progress towards their respective renewables targets
  • Find out how much energy different EU countries consume, how they consume it, and how this has changed in recent years
  • Find out how much energy different EU countries produce, what the energy mix is like in different countries and how this has changed in recent years

The data is mainly from Eurostat, with a few other additional bits and pieces from elsewhere. This is just the beginning of our work in this area, and we’re very interested in looking at more fine-grained data, and new kinds of data. As part of, we’ll be aggregating and providing a single point of access to all kinds of energy-related open data from local, regional and national public bodies from across Europe. So if you’re interested in energy data, watch this space! :-)
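To make the comparisons above concrete, here is an illustrative sketch (all numbers invented, not Eurostat data) of the kind of derived indicator involved: a country's renewable energy share and how far along it is towards its 2020 target.

```python
# Illustrative sketch (invented figures, not Eurostat data) of a derived
# energy indicator: renewable share and progress towards a 2020 target.
countries = {
    # country: (renewable energy consumed, total energy consumed, target share)
    "A": (12.0, 100.0, 0.20),
    "B": (30.0, 120.0, 0.30),
}

def renewables_progress(renewable, total, target):
    share = renewable / total
    return share, share / target  # current share, fraction of target reached

for name, (ren, total, target) in sorted(countries.items()):
    share, progress = renewables_progress(ren, total, target)
    print(f"{name}: share={share:.0%}, {progress:.0%} of target")
```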

If you want to follow our work in this area, you can join our new europes-energy announcement list. If you’d like to contribute to discussion, or you’d like to talk to us more about our work in this area, please do come and say hello on our open-energy discussion list!

Opening up linguistic data at the American National Corpus

Guest - January 15, 2011 in External, Featured Project, Open Data, Open Knowledge Definition, Open Knowledge Foundation, Open/Closed, Releases, WG Linguistics, Working Groups

The following guest post is from Nancy Ide, Professor of Computer Science at Vassar College, Technical Director of the American National Corpus project and member of the Open Knowledge Foundation’s Working Group on Open Linguistic Data.

The American National Corpus (ANC) project is creating a collection of texts produced by native speakers of American English since 1990. Its goal is to provide at least 100 million words of contemporary language data covering a broad and representative range of genres, including but not limited to fiction, non-fiction, technical writing, newspaper, spoken transcripts of various verbal communications, as well as new genres (blogs, tweets, etc.). The project, which began in 1998, was originally motivated by three major groups: linguists, who use corpus data to study language use and change; dictionary publishers, who use large corpora to identify new vocabulary and provide examples; and computational linguists, who need very large corpora to develop robust language models—that is, to extract statistics concerning patterns of lexical, syntactic, and semantic usage—that drive natural language understanding applications such as machine translation and information search and retrieval (à la Google).

Corpora for computational linguistics and corpus linguistics research are typically annotated for linguistic features, so that, for example, every word is tagged with its part of speech, every sentence is annotated for syntactic structure, etc. To be of use to the research and development community, it should be possible to re-distribute the corpus with its annotations so that others can reuse and/or enhance it, if only to replicate results as is the norm for most scientific research. The redistribution requirement has proved to be a major roadblock to creating large linguistically annotated corpora, since most language data, even on the web, is not freely redistributable. As a result, the large corpora most often used for computational linguistics research on English are the Wall Street Journal corpus, consisting of material from that publication produced in the early ‘90s, and the British National Corpus (BNC), which contains varied genre British English produced prior to 1994, when it was first released. Neither corpus is ideal, the first because of the limited genre, and the second because it includes strictly British English and is annotated for part of speech only. In addition, neither reflects current usage (for example, words like “browser” and “google” do not appear).

The ANC was established to remedy the lack of large, contemporary, richly annotated American English corpora representing a wide range of genres. In the original plan, the project would follow the BNC development model: a consortium of dictionary publishers would provide both the initial funding and the data to include in the corpus, which would be distributed by the Linguistic Data Consortium (LDC) under a set of licenses reflecting the restrictions (or lack thereof) imposed by these publisher-donors. These publishers would get the corpus and its linguistic annotations for free and could use it as they wished to develop their products; commercial users who had not contributed either money or data would have to pay a whopping $40,000 to the LDC for the privilege of using the ANC for commercial purposes. The corpus would be available for research use only for a nominal fee.

The first and second releases (a total of 22 million words) of the ANC were distributed through LDC from 2003 onward under the conditions described above. However, shortly after the second ANC release in 2005, we determined that the license for 15 of the 22 million words in the ANC did not restrict its use in any way—it could be redistributed and used for any purpose, including commercial. We had already begun to distribute additional annotations (which are separate from and indexed into the corpus itself) on our web site, and it occurred to us that we could freely distribute this unrestricted 15 million words as well. This gave birth to the Open ANC (OANC), which was immediately embraced by the computational linguistics community. As a result, we decided that from this point on, additions to the ANC would include only data that is free of restrictions concerning redistribution and commercial use. Our overall distribution model is to enable anyone to download our data and annotations for research or commercial development, asking (but not requiring) that they give back any additional annotations or derived data they produce that might be useful for others, which we will in turn make openly available.
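The "annotations separate from and indexed into the corpus itself" model described above is known as standoff annotation. A minimal sketch of the idea (this is the general scheme, not the ANC's actual file format): annotations live in their own structure and point into the text by character offsets, leaving the corpus file untouched.

```python
# Minimal standoff-annotation sketch: part-of-speech tags stored apart
# from the text, indexed by character offsets.
text = "The quick brown fox jumps."

# Each annotation: (start offset, end offset, annotation type, value)
pos_annotations = [
    (0, 3, "pos", "DT"),     # "The"
    (4, 9, "pos", "JJ"),     # "quick"
    (10, 15, "pos", "JJ"),   # "brown"
    (16, 19, "pos", "NN"),   # "fox"
    (20, 25, "pos", "VBZ"),  # "jumps"
]

def annotated_tokens(text, annotations):
    """Resolve each annotation against the text it indexes into."""
    return [(text[s:e], value) for s, e, _, value in annotations]

print(annotated_tokens(text, pos_annotations))
```

Because the annotations never modify the source text, multiple independent annotation layers (part of speech, syntax, named entities) can be distributed and combined freely, which is exactly what makes separate redistribution possible.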

Unfortunately, the ANC has not been funded since 2005, and only a few of the consortium publishers provided us with texts for the ANC. However, we have continued to gather millions of words of data from the web that we hope to be able to add to the OANC in the near future. We search for current American English language data that is either clearly identified as public domain or licensed with a Creative Commons “attribution” license. We stay away from “share-alike” licenses because of the potential restriction for commercial use: a commercial enterprise would not be able to release a product incorporating share-alike data or resources derived from it under the same conditions. It is here that our definition of “open” differs from the Open Knowledge Definition—until we can be sure that we are wrong, we regard the viral nature of the share-alike restriction as prohibitive for some uses, and therefore data with this restriction are not completely “open” for our purposes.

Unfortunately, because we don’t use “share-alike” data, the web texts we can put in the OANC are severely limited. A post on this blog by Jordan Hatcher a little while ago mentioned that the popularity of Creative Commons licenses has muddied the waters, and we at the ANC project agree, although for different reasons. We notice that many people—particularly producers of the kinds of data we most want to get our hands on, such as fiction and other creative writing—tend to automatically slap at least a “share-alike” and often also a “non-commercial” CC license on their web-distributed texts. At the same time, we have some evidence that when asked, many of these authors have no objection to our including their texts in the OANC, despite the lack of similar restrictions. It is not entirely clear how the SA and NC categories became an effective default standard license, but my guess is that many people feel that SA and NC are the “right” and “responsible” things to do for the public good. This, in turn, may result from the fact that the first widely-used licenses, such as the GNU Public License, were intended for use with software. In this context, share-alike and non-commercial make some sense: sharing seems clearly to be the civic-minded thing to do, and no one wants to provide software for free that others could subsequently exploit for a profit. But for web texts, these criteria may make less sense. The market value of a text that one puts on the web for free use (e.g., blogs, vs. works published via traditional means and/or available through electronic libraries such as Amazon) is potentially very small, compared to that of a software product that provides some functionality that a large number of people would be willing to pay for. Because of this fact, use of web texts in a corpus like the ANC might qualify as Fair Use—but so far, we have not had the courage to test that theory.

We would really like to see something like Open Data Commons Attribution License (ODC-BY) become the license that authors automatically reach for when they publish language data on the web, in the way the CC-BY-SA-NC license is now. ODC-BY was developed primarily for databases, but it would not take much to apply it to language data, if it has not been done already (see, e.g., the Definition of Free Cultural Works). Either that, or we determine if in fact, because of the lack of monetary value, Fair Use could apply to whole texts (see for example, Bill Graham Archives v. Dorling Kindersley Ltd., 448 F. 3d 605 – Court of Appeals, 2nd Circuit 2006 concerning Fair Use applied to entire works).

In the meantime, we continue to collect texts from the web that are clearly usable for our purposes. We also have a web page set up where one can contribute their writing of any kind (fiction, blog, poetry, essay, letters, email) – with a sign off on rights – to the OANC. So far, we have managed to collect mostly college essays, which college seniors seem quite willing to contribute for the benefit of science upon graduation. We welcome contributions of texts (check the page to see if you are a native speaker of American English), as well as input on using web materials in our corpus.

Launch of the Public Domain Review to celebrate Public Domain Day 2011

Jonathan Gray - January 1, 2011 in Public Domain, Public Domain Works, Releases, WG Public Domain, Working Groups

The following post is from Jonathan Gray, Community Coordinator at the Open Knowledge Foundation.

The 1st of January every year is Public Domain Day, when new works enter the public domain in many (though unfortunately not all) countries around the world.

To celebrate, the Open Knowledge Foundation is launching the Public Domain Review, a web-based review of works which have entered the public domain:

Each week an invited contributor will present an interesting or curious work with a brief accompanying text giving context, commentary and criticism. The first piece looks at the writing of Nathanael West, whose works enter the public domain today in many jurisdictions.

You can sign up to receive the review in your inbox via email. If you’re on Twitter, you can also follow @publicdomainrev. Happy Public Domain Day!

Launch of, a community driven French open data catalogue

Jonathan Gray - December 3, 2010 in CKAN, Open Data, Open Government Data, Open Knowledge Foundation, Releases, WG EU Open Data, WG Open Government Data, Working Groups

A quick note to announce (and celebrate!) the launch of a new community driven French open data catalogue, last Friday in Paris.

  • The catalogue is a joint initiative between the Open Knowledge Foundation and Regards Citoyens. Efforts are currently underway to populate the catalogue with information about French public datasets, including legal information about how they can be reused.

The catalogue is powered by CKAN, which also powers and over 20 other catalogues in various countries around the world! If you’d like to set up a catalogue in your country, please get in touch on the ckan-discuss list!

CKAN v1.2 Released together with Datapkg v0.7

Rufus Pollock - November 30, 2010 in CKAN, datapkg, News, Releases

We’re delighted to announce CKAN v1.2, a new major release of the CKAN software. This is the largest iteration so far, with 146 tickets closed, and it includes some really significant improvements, most importantly a new extension/plugin system, SOLR search integration, caching and INSPIRE support (more details below). The extension work is especially significant, as it means you can now extend CKAN without having to delve into any core code.
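For readers unfamiliar with the pattern, here is a generic sketch of how an extension-point system works. This is emphatically not CKAN's actual plugin API (see the CKAN source for the real interfaces); it only illustrates why plugins let you extend a system without touching core code.

```python
# Generic extension-point sketch (NOT CKAN's plugin API): the core
# defines named hooks, and plugins register callables against them.
class PluginRegistry:
    def __init__(self):
        self._hooks = {}

    def register(self, hook, func):
        """Attach a plugin function to a named hook."""
        self._hooks.setdefault(hook, []).append(func)

    def call(self, hook, *args):
        """Invoke every plugin registered for a hook; core code stays unchanged."""
        return [func(*args) for func in self._hooks.get(hook, [])]

registry = PluginRegistry()

# A "plugin" adds behaviour to a hypothetical package-read hook.
registry.register("after_package_read", lambda pkg: f"indexed {pkg}")

print(registry.call("after_package_read", "gold-prices"))  # -> ['indexed gold-prices']
```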

In addition there are now over 20 CKAN instances running around the world and CKAN is being used in official government catalogues in the UK, Norway, Finland and the Netherlands. Furthermore, — our main community catalogue — now has over 1500 data ‘packages’ and has become the official home for the LOD Cloud (see the lod group on

We’re also aiming to provide a much more integrated ‘datahub’ experience with CKAN. Key to this is the provision of a ‘storage’ component to complement the registry/catalogue component we already have. Integrated storage will support all kinds of important functionality, from automated archival of datasets to dataset cleaning with Google Refine.

We’ve already been making progress on this front with the launch of a basic storage service at (back in September) and the development of the OFS bucket storage library. The functionality is still at an alpha stage and integration with CKAN is still limited so improving this area will be a big aim for the next release (v1.3).

Even in its alpha stage, we are already making use of the storage system, most significantly in the latest release of datapkg, our tool for distributing, discovering and installing data (and content) ‘packages’. In particular, the v0.7 release (more detail below) includes upload support, allowing you to store (as well as register) your data ‘packages’.

Highlights of CKAN v1.2 release

  • Package edit form: attach package to groups (#652) & revealable help
  • Form API – Package/Harvester Create/New (#545)
  • Authorization extended: authorization groups (#647) and creation of packages (#648)
  • Extension / Plug-in interface classes (#741)
  • WordPress twentyten compatible theming (#797)
  • Caching support (ETag) (#693)
  • Harvesting GEMINI2 metadata records from OGC CSW servers (#566)


  • New API key header (#466)
  • Group metadata now revisioned (#231)

All tickets

Datapkg Release Notes

A major new release (v0.7) of datapkg is out!

There’s a quick getting started section below (also see the docs).

About the release

This release brings major new functionality to datapkg especially in regard to its integration with CKAN. datapkg now supports uploading as well as downloading and can now be easily extended via plugins. See the full changelog below for more details.

Get started fast

# 1. Install (requires Python and easy_install):
$ easy_install datapkg
# Or, if you don't like easy_install:
$ pip install datapkg
# (You can also install from the raw source.)

# 2. [optional] Take a look at the manual
$ datapkg man

# 3. Search for something
$ datapkg search ckan:// gold
gold-prices -- Gold Prices in London 1950-2008 (Monthly)

# 4. Get some data
# This will result in a csv file at /tmp/gold-prices/data
$ datapkg download ckan://gold-prices /tmp

# 5. Store some data
# Edit the gold prices csv, making some corrections
$ cp /tmp/gold-prices/data mynew.csv
$ edit mynew.csv
# Now upload back to storage
$ datapkg upload mynew.csv ckan://mybucket/ckan-gold-prices/mynew.csv
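The commands above address packages with ckan:// specs. As an illustration of how such a spec breaks down (this is not datapkg's actual parser, just a sketch of the addressing scheme):

```python
from urllib.parse import urlparse

# Illustrative parser (not datapkg's implementation) for specs like
# "ckan://gold-prices" or "ckan://mybucket/ckan-gold-prices/mynew.csv".
def parse_spec(spec):
    parsed = urlparse(spec)
    if parsed.scheme != "ckan":
        raise ValueError(f"not a ckan spec: {spec}")
    parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    # First component names the package/bucket; the rest is a path within it.
    return {"scheme": parsed.scheme, "package": parts[0], "path": parts[1:]}

print(parse_spec("ckan://gold-prices"))
print(parse_spec("ckan://mybucket/ckan-gold-prices/mynew.csv"))
```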

Find out more » — including how to create, register and distribute your own ‘data packages’.


  • MAJOR: Support for uploading datapkgs
  • MAJOR: Much improved and extended documentation
  • MAJOR: New sqlite-based DB index giving support for a simple, central, ‘local’ index (ticket:360)
  • MAJOR: Make datapkg easily extendable

    • Support for adding new Index types with plugins
    • Support for adding new Commands with command plugins
    • Support for adding new Distributions with distribution plugins
  • Improved package download support (also now pluggable)

  • Reimplement url download using only the Python standard library (removing the urlgrabber requirement and simplifying installation)
  • Improved spec: support for db type index + better documentation
  • Better configuration management (especially internally)
  • Reduce dependencies by removing usage of PasteScript and PasteDeploy
  • Various minor bugfixes and code improvements


A big hat-tip to Mike Chelen and Matthew Brett for beta-testing this release and to Will Waites for code contributions.
