
We need open carbon emissions data now!

Jonathan Gray - May 13, 2013 in Access to Information, Campaigning, Featured, Featured Project, Open Data, Policy, WG Sustainability, Working Groups

Last week the average concentration of carbon dioxide in the atmosphere reached 400 parts per million, a level which is said to be unprecedented in human history.

Leading scientists and policy makers say that we should be aiming for no more than 350 parts per million to avoid catastrophic runaway climate change.

But what’s in a number? Why is the increase from 399 to 400 significant?

While the actual change is mainly symbolic (and some commentators have questioned whether we’re hovering above or just below 400), the real story is that we are badly failing to cut emissions fast enough.

Given the importance of this number, which represents humanity's progress towards tackling one of the biggest challenges we currently face, the fact that it has been making the news around the world is very welcome indeed.

Why don't we hear about the levels of carbon dioxide in the atmosphere from politicians or the press more often? While there are regularly headlines about inflation, interest rates and unemployment, numbers about carbon emissions rarely receive the attention they deserve.

We want this to change. And we think that having more timely and more detailed information about carbon emissions is essential if we are to keep up pressure on the world’s governments and companies to make the cuts that the world needs.

As our Advisory Board member Hans Rosling puts it, carbon emissions should be on the world’s dashboard.

Over the coming months we are going to be planning and undertaking activities to advocate for the release of more timely and granular carbon emissions data. We are also going to be working with our global network to catalyse projects which use it to communicate the state of the world’s carbon emissions to the public.

If you'd like to join us, you can follow #OpenCO2 on Twitter or sign up to our open-sustainability mailing list.

Image credit: Match smoke by AMagill on Flickr. Released under Creative Commons Attribution license.

Announcing CKAN 2.0

Mark Wainwright - May 10, 2013 in CKAN, Featured, Featured Project, News, OKF Projects, Open Data, Open Government Data, Releases, Technical

CKAN is a powerful, open source, open data management platform, used by governments and organizations around the world – including the UK and US government open data portals – to make large collections of data accessible.

Today we are very happy and excited to announce the final release of CKAN 2.0. This is the most significant piece of CKAN news since the project began, and represents months of hectic work by the team and other contributors since before the release of version 1.8 last October, and of the 2.0 beta in February. Thank you to the many CKAN users for your patience – we think you’ll agree it’s been worth the wait.

[Screenshot: Front page]

CKAN 2.0 is a significant improvement on the 1.x versions for data users, programmers, and publishers. Enormous thanks are due to the many users, data publishers, and others in the data community who have submitted comments, code contributions and bug reports, and helped to get CKAN to where it is. Thanks also to OKF clients who have supported bespoke work in various areas that has become part of the core code. These include the US government open data portal, which will be re-launched using CKAN 2.0 in a few weeks. Let's look at the main changes in version 2.0. If you are in a hurry to see it in action, head over to the demo site, where you can try it out.


CKAN 2.0 introduces a new sleek default design, and easier theming to build custom sites. It has a completely redesigned authorisation system enabling different departments or bodies to control their own workflow. It has more built-in previews, and publishers can add custom previews for their favourite file types. News feeds and activity streams enable users to keep up with changes or new datasets in areas of interest. A new version of the API enables other applications to have full access to all the capabilities of CKAN. And there are many other smaller changes and bug fixes.

Design and theming

The first thing that previous CKAN users will notice is the greatly improved page design. For the first time, CKAN's look and feel has been carefully designed from the ground up by experienced professionals in web and information design. This has affected not only the visual appearance but many aspects of the information architecture, from the 'breadcrumb trail' navigation on each page, to the appearance and position of buttons and links to make their function as transparent as possible.

[Screenshot: dataset page]

Under the surface, an even more radical change has affected how pages are themed in CKAN. Themes are implemented using templates, and the old templating system has been replaced with the newer and more flexible Jinja2. This makes it much easier for developers to theme their CKAN instance to fit in with the overall theme or branding of their web presence.

Authorisation and workflow: introducing CKAN ‘Organizations’

Another major change affects how users are authorised to create, publish and update datasets. In CKAN 1.x, authorisation was granted to individual users for each dataset. This could be augmented with a ‘publisher mode’ to provide group-level access to datasets. A greatly expanded version of this mode, called ‘Organizations’, is now the default system of authorisation in CKAN. This is much more in line with how most CKAN sites are actually used.

[Screenshot: Organizations page]

Organizations make it possible for individual departments, bodies, groups, etc, to publish their own data in CKAN, and to have control over their own publishing workflow. Different users can have different roles within an Organization, with different authorisations. Linked to this is the possibility for each dataset to have different statuses, reflecting their progress through the workflow, and to be public or private. In the default set-up, Organization user roles include Members (who can read the Organization's private datasets), Editors (who can add, edit and publish datasets) and Admins (who can add and change roles for users).

More previews

In addition to the existing image previews and table, graph and map previews for spreadsheet data, CKAN 2.0 includes previews for PDF files (shown below), HTML (in an iframe), and JSON. Additionally there is a new plugin extension point that makes it possible to add custom previews for different data types, as described in this recent blog post.

[Screenshot: PDF preview]

News feeds and activity streams

CKAN 2.0 provides users with ways to see when new data or changes are made in areas that they are interested in. Users can ‘follow’ datasets, Organizations, or groups (curated collections of datasets). A user’s personalised dashboard includes a news feed showing activity from the followed items – new datasets, revised metadata and changes or additions to dataset resources. If there are entries in your news feed since you last read it, a small flag shows the number of new items, and you can opt to receive notifications of them via e-mail.

Each dataset, Organization etc also has an ‘activity stream’, enabling users to see a summary of its recent history.

[Screenshot: News feed]

Programming with CKAN: meet version 3 of the API

CKAN’s powerful application programming interface (API) makes it possible for other machines and programs to automatically read, search and update datasets. CKAN’s API was previously designed according to REST principles. RESTful APIs are deservedly popular as a way to expose a clean interface to certain views on a collection of data. However, for CKAN we felt it would be better to give applications full access to CKAN’s own internal machinery.

A new version of the API – version 3 – trialled in beta in CKAN 1.8, replaced the REST design with remote procedure calls, enabling applications or programmers to call the same procedures as CKAN’s own code uses to implement its user interface. Anything that is possible via the user interface, and a good deal more, is therefore possible through the API. This proved popular and stable, and so, with minor tweaks, it is now the recommended API. Old versions of the API will continue to be provided for backward compatibility.
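As a rough illustration of that RPC style (a sketch, not an official client – the base URL in the usage example is a placeholder), each internal procedure such as `package_list` or `package_search` is exposed at its own endpoint under `/api/3/action/` and called with a JSON payload:

```python
import json
from urllib.request import Request, urlopen

def action_url(base_url, action):
    """Build the endpoint for a CKAN API v3 'action' call."""
    return "%s/api/3/action/%s" % (base_url.rstrip("/"), action)

def call_action(base_url, action, data=None, api_key=None):
    """POST a JSON payload to a CKAN action and return its 'result' field."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = api_key  # required for write actions
    req = Request(action_url(base_url, action),
                  data=json.dumps(data or {}).encode("utf-8"),
                  headers=headers)
    with urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]
```

For example, `call_action("http://your-ckan-site.example", "package_search", {"q": "climate"})` would run the same dataset search that powers the site's own search page.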

Documentation, documentation, documentation

CKAN comes with installation and administration documentation which we try to keep complete and up to date, so the major changes in the rest of CKAN have required a similarly concerted effort on the documentation, which has been overhauled for 2.0. It's great when we hear that others have implemented their own installation of CKAN, something that's been increasing lately, and we hope to see even more of this. CKAN is a large and complex system to deploy, and work on improving the docs continues: version 2.1 will be another step forward. Where people do run into problems, help remains available as usual on the community mailing lists.

… And more

There are many other minor changes and bug fixes in CKAN 2.0. For a full list, see the CKAN changelog.


To install your own CKAN, or to upgrade an existing installation, you can install it as a package on Ubuntu 12.04 or do a source installation. Full installation and configuration instructions are in the documentation.

Try it out

You can try out the main features on the demo site. Please let us know what you think!

LobbyPlag – Who is really writing the law?

Martin Virtel - March 22, 2013 in Featured Project, Open Government Data

Sometimes, the band continues to play because the audience is enjoying the music so much. This is pretty much what happened to Lobbyplag. Our plan was to drive home a single point that outraged us: some Members of the European Parliament were taking law proposals verbatim from lobbyists and trying to slip them into the upcoming EU privacy law. They actually copy-and-pasted texts provided by the likes of Amazon, Google, Facebook or some banking industry body. The fact itself was Max Schrems' discovery. Max is a lawyer, and he sought the help of Richard Gutjahr and the data journalists and developers from OpenDataCity to present his evidence to the public in the form of a website called Lobbyplag. The name evokes memories of past projects where people hunted down plagiarism in the doctoral theses of German politicians.

Lobbyplag – discover the copy&paste politicians from Martin Virtel on Vimeo.

A lovestorm of reactions ensued, not only from the usual consumer privacy advocates. The site struck a chord among lobbying-stressed lawmakers and outraged citizens alike. Wolfgang Thierse, the president of the German Parliament, called it “a meritorious endeavor”, and two European lawmakers pledged to disclose their sources. People started proposing other laws to look at, started sending us papers from lobbyists, and offered their help for finding more lobby-plagiarizing politicians.

What had happened? Looking into the details of privacy law is not normally a crowd-pleaser, and like most laws this one was being made out of sight, watched over only by a few specialists. This is the norm especially for the EU Parliament, which still doesn't attract a level of public attention and scrutiny to match its real power. There had already been many reports about the intense lobbying against the Privacy Law.

Lobbyplag made a difference because Lobbyplag set a different tone. We simply presented the proof of what was being done behind closed doors – and gave people the power to look it up for themselves. And they did. And they liked it. And asked for more.


At that point, we decided that this was to be more than a single-issue website; this was a public utility in the making. We successfully completed an €8,000 crowdfunding campaign on a fledgling German platform, and we are now building the tools that interested citizens (assisted by algorithms) will need to make the comparisons between lobbyist texts and law amendments, and draw the conclusions for themselves. Stefan's Parltrack project, which provides APIs to the European Parliament's paperwork, will provide the foundation, as it did for the first iteration of Lobbyplag, and we're looking at using the Open Knowledge Foundation's PyBossa, a microtasking framework (you can see it in action online).
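As a sketch of how such a comparison might work (an illustration using Python's standard difflib, not Lobbyplag's actual code), near-verbatim copying can be flagged by measuring word-level overlap between a lobby paper and each tabled amendment:

```python
import difflib

def overlap_ratio(lobby_text, amendment_text):
    """Word-level similarity between two passages: 1.0 means a verbatim copy."""
    a, b = lobby_text.lower().split(), amendment_text.lower().split()
    return difflib.SequenceMatcher(None, a, b).ratio()

def flag_suspects(lobby_text, amendments, threshold=0.8):
    """Return the amendments whose wording closely matches the lobby paper."""
    return [am for am in amendments
            if overlap_ratio(lobby_text, am) >= threshold]
```

Real amendments would need normalisation (boilerplate stripped, legal numbering removed) before comparison, which is where the human microtasking comes in.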

Of course, the first round of money is only a start – we’re a team of volunteers – so we also submitted Lobbyplag to the Knight News Challenge, which this year fittingly is looking to support projects that improve the way citizens and governments interact – you can read more about the proposal and provide feedback on the Knight News page.

We think that making comparisons easy and bringing lobbying out into the light is a way to achieve that. There’s nothing inherently wrong with lawmakers relying on experts when they’re not experts themselves – you’d expect them to. But if they hide who they’ve been listening to, and if they only listen to one side, they contribute towards public distrust in their profession. Making the process of lawmaking and influencing lawmakers more transparent will result in better debate, better understanding and better laws.

There’s a saying that “Laws, like sausages, cease to inspire respect in proportion as we know how they are made” – but we think that is not true any longer. Citizens all over the world are not really willing to respect lawmakers unless they can trace what they are stuffing in there.

The Biggest Failure of Open Data in Government

Philip Ashlock - March 15, 2013 in Featured Project, Open Government Data

Many open data initiatives forget to include the basic facts about the government itself

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

Census Dotmap by Brandon Martin-Anderson

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

It’s difficult enough to come by comprehensive information about local government, so there definitely aren’t many surveys that help convey this problem, but you can start to get the idea from a pretty high level. Studies have shown that only about two thirds of Americans can name their governor (Pew 2007) while less than half can name even one of their senators (Social Capital Community Survey 2006). This excerpt from Andrew Romano in Newsweek captures the problem well:

Most experts agree that the relative complexity of the U.S. political system makes it hard for Americans to keep up. In many European countries, parliaments have proportional representation, and the majority party rules without having to “share power with a lot of subnational governments,” notes Yale political scientist Jacob Hacker, coauthor of Winner-Take-All Politics. In contrast, we’re saddled with a nonproportional Senate; a tangle of state, local, and federal bureaucracies; and near-constant elections for every imaginable office (judge, sheriff, school-board member, and so on). “Nobody is competent to understand it all, which you realize every time you vote,” says Michael Schudson, author of The Good Citizen. “You know you’re going to come up short, and that discourages you from learning more.”

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program. Even the cities that have been very progressive with open data like Washington DC and New York neglect to publish basic information like the names and contact details of their city councilmembers as raw open data. The NYC Green Book was finally posted online last year, but it’s still not available as raw data. Even in the broader open data and open government community, this information doesn’t get much attention. The basic contact details for government offices and elected officials were not part of the Open Data Census and neither were jurisdiction boundaries for government districts.

Fortunately, a number of projects have started working to address this discrepancy. In the UK, there’s already been great progress with websites like OpenlyLocal, TheyWorkForYou and MapIt, but similar efforts in North America are much more nascent. OpenNorth Represent has quickly become the most comprehensive database of Canadian elected officials with data that covers about half the population and boundary data that covers nearly two thirds. In the US, the OpenStates project has made huge progress in providing comprehensive coverage of the roughly 7,500 state legislators across the country while the Voting Information Project has started to provide comprehensive open data on where to vote and what’s on the ballot – some of the most essential yet most elusive data in our democracy. Most recently, DemocracyMap has been digging in at the local level, building off the data from the OpenStates API and the Sunlight Congress API and deploying an arsenal of web scrapers to provide the most comprehensive open dataset of elected officials and government boundaries in the US. The DemocracyMap API currently includes over 100,000 local officials, but it still needs a lot more data for complete coverage. In order to scale, many of these projects have taken an open source community-driven approach where volunteers are able to contribute scrapers to unlock more data, but many of us have also come to realize that we need data standards so we can work together better and so our governments can publish data the right way from the start.
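To give a flavour of the scraping side (a toy sketch, not DemocracyMap's actual code – the page structure here is hypothetical), a scraper is often just a small parser that pulls names out of a council roster page:

```python
from html.parser import HTMLParser

class OfficialsParser(HTMLParser):
    """Collect the text of <li class="official"> entries from a
    (hypothetical) city-council roster page."""
    def __init__(self):
        super().__init__()
        self.in_entry = False
        self.officials = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "official") in attrs:
            self.in_entry = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_entry = False

    def handle_data(self, data):
        if self.in_entry and data.strip():
            self.officials.append(data.strip())

def scrape_officials(html):
    parser = OfficialsParser()
    parser.feed(html)
    return parser.officials
```

Multiply a scraper like this by tens of thousands of local government websites, each with its own markup, and the scale of the problem – and the need for volunteers and shared standards – becomes clear.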

James McKinney from OpenNorth has already put a lot of work into the Popolo Project, an initial draft of data standards to cover some of the most basic information about government like people and their offices. More recently James also started a W3C Open Government Community Group to help develop these standards with others working in this field. In the coming months I hope to see a greater convergence of these efforts so we can agree on basic standards and begin to establish a common infrastructure for defining and discovering who and what our government is. Imagine an atlas for navigating the political geography of the world from the international offices to those in the smallest neighborhood councils.
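To make that concrete, here is a minimal sketch of what such a standardized record might look like. The field names follow the draft Popolo vocabulary as I understand it (people, organizations, and memberships linking the two); treat the exact schema as an assumption and check the spec:

```python
# A hypothetical, minimal Popolo-style record set: a person, an
# organization, and the membership that links them.
person = {
    "id": "person/jane-doe",
    "name": "Jane Doe",
    "contact_details": [{"type": "email", "value": "jane@example.gov"}],
}
organization = {
    "id": "organization/springfield-city-council",
    "name": "Springfield City Council",
    "classification": "legislature",
}
membership = {
    "person_id": person["id"],
    "organization_id": organization["id"],
    "role": "Councilmember",
}

def memberships_for(person, memberships):
    """All memberships held by a given person."""
    return [m for m in memberships if m["person_id"] == person["id"]]
```

The point of a shared vocabulary like this is that a scraper in one city and a scraper in another can emit records that merge cleanly into one national dataset.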

This is a problem that is so basic that most people are shocked when they realize it hasn’t been solved yet. It’s one of the most myopic aspects of the open government movement. Fortunately we are now making significant progress, but we need all the support we can get: scraping more data, establishing standards, and convincing folks like the Secretaries of State in many US States that we need to publish all boundaries and basic government contact information as open data. If you’re starting a new open data program, please don’t forget about the basics!

DemocracyMap is a submission for the Knight News Challenge. You can read the full proposal and provide feedback on the Knight News Challenge page.

An Open Knowledge Platform on Building Energy Performance to Mitigate Climate Change

Anne-Claire Bellec and Martin Kaltenböck - March 14, 2013 in Featured Project, Open Data, WG Sustainability

Buildings account for more than 30% of final energy use and energy-related carbon emissions in the world today. This sector has the potential to play a crucial role in mitigating the global challenge of climate change. However, the building industry is a local industry and the sector is fragmented at all levels, from planning to design and practical construction, and across its various technical aspects.

In this context, how best to help the sector deliver its global mitigation potential? Our answer at the Global Buildings Performance Network (GBPN) is collaboration: stimulating collective knowledge and analysis from experts and building professionals worldwide to advance the best building performance policies and solutions that can support better decision-making. The cornerstone of this strategy is our new Linked Open Data website, launched on the 21st of February. This web-based tool is unique in that it has been designed as a global participatory open data knowledge hub: harvesting, curating and creating global best knowledge and data on building performance policies.

As the energy performance of buildings becomes central to any effective strategy to mitigate climate change, policymakers, investors and project developers, members of governmental institutions and multilateral organisations need better access to building performance data and knowledge to design, evaluate and compare policies and programmes from around the world.

The GBPN encourages transparent availability of and access to reliable data. GBPN data can be freely used, reused and redistributed by anyone (as provided under a Creative Commons Attribution CC-BY 3.0 FR license), subject to the requirement to attribute and share alike. In addition, the GBPN Knowledge Platform has been developed using Linked Open Data technology and principles to connect with the best online resources. The GBPN Glossary is linked to DBpedia as well as the reegle Clean Energy and Climate Change Thesaurus developed by the Renewable Energy and Energy Efficiency Partnership (REEEP) and REN21. A "News Aggregator Tool" service is also available. And our platform connects to our Regional Hubs' data portals: the open data portal for energy efficiency in European buildings developed by the Buildings Performance Institute Europe (BPIE), and the leading online tool for sharing global best practices on building rating and disclosure policies, launched by the Institute for Market Transformation (IMT) in 2011.

One of the main features of the website is the "Policy Comparative Tool", enabling comparison of the world's best practice policies for new buildings. By understanding how countries have designed and implemented best practice codes, policy makers can use this information to strengthen the future design of dynamic policies. The tool provides interactive data visualization and analytics.

The GBPN aims to facilitate new synergies with energy efficiency experts and building professionals worldwide. For this purpose, the new website offers a Laboratory, a participatory research collaboration tool for building energy efficiency experts to share information and generate new knowledge on how best to develop ambitious building energy performance policies.

The GBPN will be enriching its data over time with additional topics and information generated through data exchange projects and research partnerships, and invites interested organisations to suggest opportunities for collaboration.

The GBPN Open Knowledge Platform has been developed together with the Semantic Web Company, a consultancy and technology provider offering semantic information management solutions with a strong focus on Open Data and Linked Open Data principles and technologies.

About the GBPN:

The Global Buildings Performance Network (GBPN) is a globally organised and regionally focused network whose mission is to advance best practice policies that can significantly reduce energy consumption and associated CO2 emissions from buildings. We operate a Global Centre based in Paris and are represented by Hubs and Partners in four regions: China, India, Europe and the United States. By promoting building energy performance globally, we strive to tackle climate change while contributing to the planet's economic and social wellbeing.

Follow us on Twitter @GBPNetwork
Contact us at –

Document Freedom Day 2013

Erik Albers - March 12, 2013 in Events, Featured Project, Open Standards

What is document freedom?

Have you ever been stuck with some data that you could not open because it was in a format that needs some specific kind of software? The same thing happens tens of thousands of times each day. Can you imagine how much knowledge exchange doesn't happen just because sender and receiver (intentionally or not) are using different data formats? Can you imagine how much knowledge future generations will lose if we keep on using proprietary, closed data formats that one day no one will be able to open, because the company behind them kept business secrets and patents on them and then went out of business?
Open Standards, on the other hand, are data formats with open documentation that everyone is free to use or implement in their own software. The first characteristic (open documentation) guarantees that now, and even in a hundred years, anybody interested can understand and read the data format. The second characteristic (free to use) guarantees that now, and even in a hundred years, everybody is free to write software that gives everyone else the ability to read a specific piece of data. That is why everyone, and every public institution, should be using Open Standards.

This is exactly the point where our document freedom campaign comes in. Every year on the last Wednesday of March, the Free Software Foundation runs a global campaign called "Document Freedom Day". The aim of the campaign is to raise awareness of the usefulness of Open Standards, so we encourage local groups to organise events that highlight the importance of using Open Standards. Last year there were more than 50 events in more than 20 countries. This year, Document Freedom Day (DFD) will be on the 27th of March 2013.

The most important part of the whole campaign is done by people like you and me! In order to celebrate information accessibility and Open Standards, we depend heavily on local activity in public places, universities, hackerspaces, or anywhere else you can imagine. I am pretty sure that you have very good ideas about what you can do to attract some attention.

If you are interested, please have a look at some ideas of what you can do, and feel free to support your event with our promotional material, which you can order at no cost on the webpage. Finally, if you are planning some activity, don't forget to register your event on our events page.

Thank you very much for your attention.
Your participation in Document Freedom Day can make the difference!

Images: Last year’s audience in Jakarta; DFD around the world; Document Freedom Day in Rio de Janeiro. All CC-BY-SA

Opening Product Data for a more responsible world

Philippe Plagnol - March 8, 2013 in Featured Project, Open Data

Data on the products we buy is rarely viewed as something to be opened. But in fact, the international standards that make it possible for products to be traded across borders can be used by consumers for their own ends – to help improve information-sharing and choice across the planet. There is currently no public database of this information – but we’re working to change that at Product Open Data.

Eugène Delacroix, “la liberté guidant le peuple”, 1830 – redesigned by Jessica Dere

Opening Product Data

When consumers buy a product, they give power to the manufacturer, enabling it to continue or to extend its activities. A public worldwide product database would allow consumers to get information in real time, by scanning the barcode with a mobile phone, or to publish their opinions about specific products in a way that others can easily access. Consumers would have the tools to make decisions based on their own concerns about health, nutrition, ecology, or human rights, and to make ethical, dietary or value-based purchases.

GS1 is a worldwide organization which assigns each product a unique code, visible beneath the barcode (the GTIN code). There are billions of products commercialized in the world, and the full GTIN code list is stored only in GS1's database. The objective of POD (Product Open Data) is to open product data by gathering these key codes and collecting product information from manufacturers, by creating a new RSS-style standard around this data (called PSS – Product Simple Syndication).
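One handy property of GTIN codes is that they can be validated offline: the final digit is a check digit computed from the others using GS1's mod-10 algorithm, in which digits are weighted 3 and 1 alternately from the right. A small sketch:

```python
def gtin_check_digit(payload):
    """GS1 mod-10 check digit for a GTIN payload (all digits but the last).
    Counting from the right, digits are weighted 3, 1, 3, 1, ..."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10

def is_valid_gtin(code):
    """True if the code's final digit matches the computed check digit."""
    return code.isdigit() and int(code[-1]) == gtin_check_digit(code[:-1])
```

The same weighting works for GTIN-8, GTIN-12 and GTIN-13, which makes it a cheap first filter against typos and garbage entries when building a crowd-sourced product database.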

The POD database currently contains 1.4 million products. The most difficult task is to assign each product a GPC classification code, which carries information about the particular type of product it is. GPC codes are an international standard – GS1 has already assigned 10 million of them – but many e-commerce sites have developed their own taxonomies, which makes it difficult to compare product types across sellers and to find the correct GPC codes online. Other challenges include finding information such as brand, dimensions, and packaging, and, last but crucially, guaranteeing the quality of the data. The database and pictures are free to access.

Why is this important?

There are a whole load of reasons why opening product data is a really important step:

  • With the GTIN code as a unique identifier, consumers will be able to communicate about a specific product across the world.

  • Almost all manufacturers around the world are covered by GS1, which is focused on the supply chain. By developing an open database, a new organization with the same power will be created as a counterpoint, but focused on consumers' rights.

  • Organizations dealing with health, ecology, and human rights will be able to provide their own criteria about products very easily using the GTIN Code.

  • Individuals will be able to raise a risk or an alert about a product. A set of rules will have to be defined to prevent alerts being triggered by false information.

  • Marketing and commerce will change a lot because consumers will have new inputs to decide what to buy (e-reputation).

  • Smartphone apps and a community will build around product knowledge.

Whether you’re interested in open source and open data, the protection of consumers, or the protection of the environment, we’d love to hear from you. Together we can join forces in an innovative project which is good for our planet.

Keeping track of the European Parliament

Theodora Middleton - March 1, 2013 in Featured Project, Open Government Data

The following guest post is by Stef.

European Union legislation: In whose interest?

Brussels is a globally important policy-making center. The European single market is advanced and huge, with industry interests competing with national politics and NGO values.

Policy negotiations at this level attract powerful interests. The current Data Protection Regulation, for example, has drawn lobbying from many companies that deal with our data, such as online services and banks. A German initiative is mapping the amendments to this regulation back to the industry and NGO proposals which provoked them, illustrating the importance of clearly attributing any changes to the interest groups pushing them.

If you want to be an informed citizen or NGO you have to track the
whole law-making process, with key actors (e.g. responsible and shadow
rapporteurs), documents (like amendments and other supporting
documents), votes and milestones. The European Parliament publishes a lot of this information, but it is dispersed across a huge bureaucratic machinery that hoards it inside the institution. People working on the inside have much better and earlier access to details, but connecting these details across departments is complex and time-consuming. Attempting to be an informed citizen from the outside is even more complicated, especially if you don’t know where to look.

The asymmetry of resources causes a bias against citizen interests, but to
some degree this can be offset by the innovative usage of technology and the
internet. There are legends of a bot called Knecht, written in Lisp – an arcane and powerful language – that helped activists keep track of the Software Patent Directive at the turn of the 21st century. My project, Parltrack, now enables the same for all dossiers making their way through Parliament.

What can the European Parliament do?

There are rumors that the Parliament is close to releasing its first free software project, AT4AM, a very nice tool for editing and handling amendments. This is fantastic news; however, whether the underlying data (which is, according to rumors, cleaned by a half-dozen staffers) will also be released remains unclear.

The license under which the Parliament releases its data is quite close to a basic attribution license, so on this the European Parliament deserves applause. However, the data is mostly released in formats which make it difficult to extract information, such as Word, PDF and HTML. Extraction can be automated, but the process is error-prone due to human errors like typos in the documents. Automated extraction is also very wasteful of the resources of both the European Parliament and the extractor, so it would be better if the data were also published:

  1. as an Application Programming Interface (API), so that simple
    smartphone apps can get instant updates.

  2. as bulk database dump downloads, so that external developers can reuse
    the data without wasteful scraping. Using BitTorrent as a
    distribution medium, bandwidth costs can be cut down and public
    money saved.

  3. with daily or more granular updates, to eliminate the need to download the
    complete database dumps daily.

  4. with digital signatures, to assure trust in the data.

  5. under a requirement to share alike, as well as attribute, to reduce the resource bias in favour of large corporations.
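To make the first three points concrete, here is a minimal Python sketch of how a consumer might use such a service. The endpoint path, parameter name and record fields are entirely hypothetical; they simply illustrate the pattern of fetching only what changed since the last sync instead of re-scraping everything:

```python
import json
from urllib.parse import urlencode

def build_update_url(base: str, since_date: str) -> str:
    # Hypothetical incremental-update endpoint: request only dossiers
    # changed since a given date, rather than the whole database.
    return f"{base}/dossiers?{urlencode({'changed_since': since_date})}"

def parse_updates(payload: str):
    # Each record is assumed to carry a dossier identifier and a
    # last-modified timestamp.
    return [(d["id"], d["changed"]) for d in json.loads(payload)]
```

A client would then call `build_update_url(...)` once a day, fetch the URL, and feed the response body to `parse_updates`, keeping its local copy current at minimal cost to both sides.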

Parltrack needs your help

Even though such measures would cut down costs and create more jobs, they are not likely to be enacted any time soon. Until then Parltrack tries
to liberate the most relevant data and provide a dashboard for
activists and concerned citizens. It has been used during the last two
years in the campaign against ACTA and other digital rights related
issues, and has received much praise from activists and European NGOs.

The further development of Parltrack depends heavily on funding. Features like

  • commenting and rating of amendments, and full law proposals,
  • monitoring of certain subjects and search phrases,
  • localization in the other languages of the member states,
  • visitor and search trends,
  • an improved presentation of the historical data captured and
  • automatic analysis and prediction of the law making process based
    on the huge statistical value encapsulated in the database,

are just a few exciting possibilities that are on the road map.

Unfortunately building free infrastructure and defending it does not
pay as well as selling it. Thus I have been somewhat distracted recently
from tending to Parltrack. However the data liberated by Parltrack and
the current feature set are only the tip of a very exciting iceberg,
so to be able to continue the Parltrack development and to keep it
free I started a crowd-funding campaign. Please have a look and
support further development of Parltrack so it can restore some power
to citizens.

The Open Data Census – Tracking the State of Open Data Around the World

Rufus Pollock - February 20, 2013 in Events, Featured, Featured Project, Open Data, Open Government Data, Our Work, WG Open Government Data

Recent years have seen a huge expansion in open data activity around the world. This is very welcome, but at the same time it is now increasingly difficult to assess if, and where, progress is being made.

To address this, we started the Open Data Census in order to track the state of open data globally. The results so far, covering more than 35 countries and 200 datasets, are now available online. We’ll be building this up even more during Open Data Day this weekend.

This post explains why we started the census and why this matters now. This includes the importance of quality (not just quantity) of data, the state of the census so far, and some immediate next steps – such as expanding the census to the city level and developing an “open data index” to give a single measure of open data progress.

Why the Census?

In the last few years there has been an explosion of activity around open data and especially open government data. Following pioneering national open data initiatives, numerous local, regional and national bodies have started open government data initiatives and created open data portals (from a handful 3 years ago there are now more than 250 open data catalogs worldwide).

But simply putting a few spreadsheets online under an open license is obviously not enough. Doing open government data well depends on releasing key datasets in the right way. Moreover, with the proliferation of sites it has become increasingly hard to track what is happening.

Which countries, or municipalities, are actually releasing open data and which aren’t?1 Which countries are making progress on releasing data on stuff that matters in the right way?

Quality not (just) Quantity

Progress in open government data is not (just) about the number of datasets being released. The quality of the datasets being released matters at least as much as – and often more than – their quantity.

We want to know whether governments around the world are releasing key datasets, for example critical information about public finances, locations and public transport rather than less critical information such as the location of park benches or the number of streetlights per capita.2

Similarly, is the data being released in a form that is comparable and interoperable, or is it being released as randomly structured spreadsheets (or, worse, non-machine-readable PDFs)?

Tables like this are easy for humans, but difficult for machines.

This example of a table from the US Bureau of Labor Statistics is easy for humans to interpret but very difficult for machines. (But at least it’s in plain text, not PDF.)

The essential point here is that it is about quality as much as quantity. Datasets aren’t all the same, whether in size, importance or format.
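The gap between human-oriented and machine-oriented tables is easy to see in code. Below is an illustrative sketch (the sample figures are invented) that melts a typical cross-tab layout, one row per year with a column per month, into the flat records that make data comparable and interoperable:

```python
import csv
import io

# A human-oriented cross-tab: one row per year, one column per month.
crosstab = """year,Jan,Feb
2011,9.1,9.0
2012,8.3,8.3
"""

def tidy(text: str):
    # Melt the cross-tab into machine-friendly (year, month, value)
    # records, one observation per row.
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        year = row.pop("year")
        for month, value in row.items():
            records.append((year, month, float(value)))
    return records
```

In this flat form the same parser handles any number of years or months, and the values can be joined against other datasets directly; the cross-tab, by contrast, needs bespoke handling for every layout variation.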

Enter the Census

And so was born the Open Knowledge Foundation’s Open Data Census – a community-driven effort to map and evaluate the progress of open data and open data initiatives around the world.

We launched the first round of data collection last April at the meeting of the Open Government Partnership in Brazil. Since then members of the Open Knowledge Foundation’s Open Government Data Working Group have been continuing to collect the data and our Labs team have been developing a site to host the census and present its results.

[Image: Open Data Census results table]

The central part of the census is an assessment based on 10 key datasets.

These were selected through a process of discussion and consultation with the Open Government Data Working Group and will likely be expanded in future (see some great suggestions from David Eaves last year). We’ll also be considering additional criteria: for example whether data is being released in a standard format that facilitates integration and reuse.

We focused on a specific list of core datasets (rather than e.g. counting numbers of open datasets) for a few important reasons:

  • Comparability: by assessing against the same datasets we would be able to compare across countries
  • Importance: Some datasets are more important than others and by specifically selecting a small set of key datasets we could make that explicit
  • Ranking: we want, ultimately, to be able to rank countries in an “Open Data Index”. This is much easier if we have a good list of cross-country comparable data.3

Today, thanks to submissions from more than thirty contributors, the census includes information on more than 190 datasets from more than 35 countries around the world, and we hope to get close to full coverage for more than 50 countries in the next couple of months.

[Image: Open Data Census map]

The Open Data Index: a Scoreboard for Open Government Data

Having the census allows us to evaluate general progress on open data. But having a lot of information alone is not enough. We need to ensure the information is presented in a simple and understandable way especially if we want it to help drive improvements in the state of open government data around the world.

Inspired by work such as Open Budget Index from the International Budget Partnership, the Aid Transparency Index from Publish What You Fund, the Corruption Perception Index from Transparency International and many more, we felt a key aspect is to distill the results into a single overall ranking and present this clearly. (We’ve also been talking here with the great folks at the Web Foundation, who are also thinking about an Open Data Index connected with their work on the Web Index).

[Image: Open Budget Index screenshot]

As part of our first work on the Census dashboard last September for OKFestival, we did some work on an “open data index”, which provided an overall ranking for countries. However, during that work, it became clear that building a proper index requires some careful thought. In particular, we probably want to incorporate factors other than just the pure census results, for example:

  • Some measure of the number of open datasets (appropriately calibrated!)
  • Whether the country has an open government data initiative and open data portal
  • Whether the country has joined the OGP
  • Existence (and quality) of an FoI law

In addition, there is the challenging question of weightings – not only between these additional factors and census scores but also for scoring the census. Should, for example, Belarus be scoring 5 or 6 out of 7 on the census despite it not being clear whether any data is actually openly licensed? How should we weight total number of datasets against the census score?
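To make the weighting question concrete, here is a toy Python sketch. The weights and factor names are entirely hypothetical, not a proposed methodology; the point is simply that once factors and weights are explicit, the trade-offs can be debated openly:

```python
# Hypothetical weights: how much each signal contributes to the index.
WEIGHTS = {"census": 0.6, "portal": 0.15, "ogp_member": 0.1, "foi_law": 0.15}

def index_score(census_frac: float, portal: bool,
                ogp_member: bool, foi_law: bool) -> float:
    # census_frac: fraction of the key datasets scored as open (0..1).
    # The remaining inputs are yes/no supplementary factors.
    signals = {
        "census": census_frac,
        "portal": float(portal),
        "ogp_member": float(ogp_member),
        "foi_law": float(foi_law),
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())
```

A country releasing every key dataset openly, with a portal, OGP membership and an FoI law, would score 1.0 under these made-up weights; shifting weight between the census and the supplementary factors is exactly the kind of judgment call discussed above.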

Nevertheless, we’re continuing to work on putting together an “open data index” and we hope to have an “alpha” version ready for the open government data community to use and critique within the next few months. (If you’re interested in contributing check out the details at the end of this post).

The City Census

The first version of the census was country oriented. But much of the action around open data happens at the city and regional level, and information about the area around us tends to be the most meaningful and important.

We’re happy to say plans are afoot to make this happen!

Specifically, we’ll be kicking off the city census with an Open Data Census Challenge this Saturday as part of Open Data Day.

If the Open Data Census has caught your interest, you are invited to become an Open Data Detective for a day and help locate open (and closed) datasets in cities around the world. Find out more and sign up here:

Get Involved

Interested in the Open Data Census? Want to contribute? There are a variety of ways:


  1. For example, we’ve seen several open data initiatives releasing data under non-open licenses that restrict, for example, derivative works, redistribution or commercial use. 

  2. This isn’t to say that less critical information isn’t important – one of the key reasons for releasing material openly is that you never know who may derive benefit from it, and the “long tail of data” may yield plenty of unexpected riches. 

  3. Other metrics, such as numbers of datasets, are very difficult to compare – what is a single dataset in one country can easily become 100 or more in another (for example, unemployment could be in a single dataset or split into many datasets, one for each month and region).

Version Variation Visualisation

Tom Cheesman - February 8, 2013 in Featured Project, Open Content, Public Domain, WG Linguistics

In 2010, I had a long paper about the history of German translations of Othello rejected by a prestigious journal. The reviewer wrote: “The Shakespeare Industry doesn’t need more information about world Shakespeare. We need navigational aids.”

About the same time, David Berry turned me on to Digital Humanities. I got a team together (credits) and we’ve built some cool new tools.

All culturally important works are translated over and over again. The differences are interesting. Different versions of Othello reflect changing, contested ideas about race, gender and sexuality, political power, and so on, over the
centuries, right up to the present day. Hence any one translation is just one snapshot from its local place and moment in time, just one
interpretation, and what’s interesting and productive is the variation, the diversity.

But with print culture tools, you need a superhuman memory, a huge desk and ideally several assistants, to leaf backwards and forwards in all the copies, so you can compare and contrast. And when you present your findings, the minutiae of differences can be boring, and your findings can’t be verified. How do you know I haven’t just picked quotations that support my argument?

But with digital tools, multiple translations become material people can easily work and play with, research and create with, and we can begin to use them in totally new ways.

Recent work

[Image: Version Variation Visualisation screenshot]
We’ve had funding from UK research councils and Swansea University to digitize 37 German versions of Othello (1766-2010) and build these prototype tools. There you can try out our purpose-built database and tools for freely segmenting and aligning multiple versions; our timemap of versions; our parallel text navigation tool, which uses aligned segment attributes for navigation; and, most innovative of all, the tool we call ‘Eddy and Viv’. This lets you compare all the different translations of any segment (with help from machine translation), and it also lets you read the whole translated text in a new way, through the variety of translations. You don’t need to know the translating language.

This is a radical new idea (more details on our platform). Eddy and Viv are algorithms: Eddy calculates how much each translation of each segment differs in wording from others, then Viv maps the variation in the results of that analysis back onto the translated text segments.
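The post does not spell out Eddy’s exact metric, but the idea of scoring per-segment variation can be sketched roughly in Python. This stand-in uses mean pairwise dissimilarity from the standard library’s `difflib`; the real Eddy algorithm may be defined quite differently:

```python
from difflib import SequenceMatcher
from itertools import combinations

def eddy(translations):
    # Rough stand-in for 'Eddy': average pairwise dissimilarity
    # (1 - similarity ratio) across all translations of one segment.
    # 0.0 means all translators agree; values near 1.0 mean they
    # render the segment in completely different wording.
    pairs = list(combinations(translations, 2))
    if not pairs:
        return 0.0
    return sum(1 - SequenceMatcher(None, a, b).ratio()
               for a, b in pairs) / len(pairs)
```

A ‘Viv’-style step would then map each segment’s score back onto the source text, colouring the hotspots where translators disagree most.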

This means you can now read Shakespeare in English, while seeing how much all the translators disagree about how to interpret each speech or line, or even each word. It’s a new way of reading a literary work through translators’ collective eyes, identifying hotspots of
variation. And you don’t have to be a linguist.

Future plans and possible application to collections of public domain texts

I am a linguist, so I’m interested in finding new ways to overcome language barriers, but equally I’m interested in getting people interested in learning languages. Eddy and Viv have that double effect. So these are not just research tools: we want to make a cultural difference.

We’re applying for further funding. We envisage an open library of versions of all sorts of works, and a toolsuite supporting experimental and collaborative approaches to understanding the differences, using visualizations for navigation, exploration and
comparison, and creating presentations for research and education.

The tools will work with any languages, any kinds of text. The scope is vast, from fairy tales to philosophical classics. You can also investigate versions in just one language – say, different editions of an encyclopedia, or different versions of a song lyric. It should be possible to push the approach beyond text, to audio and video, too.

Shakespeare is a good starting point, because the translations are so numerous, increasing all the time, and the differences are so intriguing. But a few people have started testing our tools on other materials, such as Jan Rybicki with Polish translations of Joseph Conrad’s work. If we can demonstrate the value, and simplify the tasks involved, people will start on other ‘great works’ – Aristotle, Buddhist scripture, Confucius, Dante (as in Caroline Bergvall’s amazing sound work ‘Via’), Dostoyevski, and more.

Many translations of transculturally important works are in the public domain. Most are
not, yet. So copyright is a key issue for us. We hope that as the project grows,
more copyright owners will be willing to grant more access. And of course
we support reducing copyright restrictions.

Tim Hutchings, who works on digital scriptures, asked me recently: “Would it be possible to create a platform that allowed non-linguist readers to appreciate the differences in tone and meaning between versions in different languages? … without needing to be fluent in all of those languages.” Why not? With imaginative combinations of various digital tools for machine translation, linguistic analysis, sentiment analysis, visualization and, not least, connecting people.
