

New Open data hub from OKFN Greece

February 14, 2013 in CKAN, OKF Greece, Open Data, Open Government Data

Opening up public sector data is becoming a top priority for governments throughout Europe and North America. We are pleased to announce the launch of the new Greek open data hub, developed and hosted by OKFN Greece. The data hub integrates the Open Knowledge Foundation’s open source data cataloging software CKAN, which is also the basis of the UK, the European and the US portals.


Open data can be used in smart city services, financial monitoring, decision support systems and numerous other applications. The problem is finding them. Suppose you want to make a shiny new smartphone app that requires a combination of geospatial data, some cultural facts and a photo collection. You know this data exists, but you also know you are going to have a hard time finding its providers, discovering its outgoing links and checking its license. All of this involves a significant investment of time.

Ordinary citizens, too, are made to invest precious time hunting down and combining data, such as the location of the nearest Job Centre, plus information on how to get there by public transport.

This is why we need data hubs where publishers can use, promote, and advertise all their datasets together, and where citizens can also catalog a dataset that is useful to them and perhaps to others. Once the datasets reach a critical mass, links between them are discovered and developed, multiplying the value of the datasets and increasing their significance. Combine this with live data previews, a smart search system and a powerful API and you have taken open data to the next level.
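Because the hub runs on CKAN, its catalogue should also be reachable from code. The sketch below assumes the hub exposes CKAN's standard Action API under /api/3/action/ at http://ckan.okfn.gr (the post does not document the endpoint). It builds a package_search request URL and summarises a response into (dataset name, license id) pairs. The sample response and the dataset name in it are hypothetical; against a live hub you would fetch the built URL with urllib.request.urlopen instead.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumption: the hub serves CKAN's Action API at <base>/api/3/action/.
CKAN_BASE = "http://ckan.okfn.gr"


def package_search_url(base: str, query: str, rows: int = 10) -> str:
    """Build a package_search request URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def summarise(response_text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (dataset name, license id) pairs from a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [
        (package["name"], package.get("license_id"))
        for package in payload["result"]["results"]
    ]


if __name__ == "__main__":
    print(package_search_url(CKAN_BASE, "transport", rows=5))
    # Hypothetical response standing in for urllib.request.urlopen(url).read().
    sample = (
        '{"success": true, "result": {"count": 1, "results": '
        '[{"name": "example-bus-stops", "license_id": "cc-by"}]}}'
    )
    print(summarise(sample))
```

Looking at license_id at this stage is one way to check that a dataset is openly licensed before combining it with others.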

The Greek open data hub includes:

  1. The Open Data repository (http://ckan.okfn.gr). This section of the site is built using the CKAN platform (like the EU & UK sites).
  2. Examples of applications using Greek linked open data, like Greek DBpedia (DayLikeToday, DBpedia game) and visualizations with data from the Clarity Program, the municipalities etc.
  3. A live demo where anybody will be able to submit a SPARQL query and chart its results with Google Chart Editor.
  4. Information about the Greek Linked Open Data cloud – a visual network representation of the Greek Linked Open Data Cloud. OKFN Greece is constantly working on making this one huge!
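The SPARQL demo pairs a query with a chart. As a rough sketch of the glue in between, the code below turns a result set in the W3C SPARQL 1.1 JSON results format into the header-plus-rows table that Google Charts' arrayToDataTable accepts. The query, the DBpedia ontology terms it uses and the sample result are illustrative assumptions; the post does not say which endpoint or vocabulary the demo relies on.

```python
import json
from typing import Any, Dict, List

# Illustrative query only; the demo's actual endpoint and vocabulary are not given.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""

XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER_TYPES = {XSD + "integer", XSD + "int", XSD + "long"}
DECIMAL_TYPES = {XSD + "decimal", XSD + "double", XSD + "float"}


def convert(cell: Dict[str, str]) -> Any:
    """Turn one bound value into a Python number when its datatype is numeric."""
    datatype = cell.get("datatype")
    if datatype in INTEGER_TYPES:
        return int(cell["value"])
    if datatype in DECIMAL_TYPES:
        return float(cell["value"])
    return cell["value"]


def bindings_to_rows(results: Dict[str, Any]) -> List[List[Any]]:
    """Header row of variable names, then one row per query solution."""
    variables = results["head"]["vars"]
    rows: List[List[Any]] = [list(variables)]
    for binding in results["results"]["bindings"]:
        # A variable left unbound in a solution is simply absent from its binding.
        rows.append([convert(binding[v]) if v in binding else None for v in variables])
    return rows


if __name__ == "__main__":
    sample = {
        "head": {"vars": ["city", "population"]},
        "results": {"bindings": [
            {"city": {"type": "uri", "value": "http://example.org/CityA"},
             "population": {"type": "literal", "datatype": XSD + "integer", "value": "1000"}},
        ]},
    }
    print(json.dumps(bindings_to_rows(sample)))
```

The printed JSON has the shape arrayToDataTable expects: a first row of column labels followed by the data rows.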

Find out how you can use the hub, contribute to it, and get involved on our blog!

Open Government Datavis Competition

February 12, 2013 in Open Government Data


The Guardian Data Blog and Google are teaming up to find the best open government datavis out there.

There is a top prize of $2,000 on offer for the best visualisation of open government data. The Open Knowledge Foundation will be helping to judge the competition and we want to see imaginative, clear and beautiful visualisations that give a unique perspective on an open government data set of your choice.

Use existing data visualisation tools or develop your own. You don’t need to be a developer to enter; the most important thing is that the data you have chosen to visualise is approached in an interesting and compelling way.

The competition is open to citizens of the UK, the US, France, Germany, Spain, the Netherlands and Sweden. The winner will take home $2,000, and the result will be published on the Guardian Datastore on the Show and Tell site as well as on this blog.

Use our Data Catalogs website resource to find open government data to get you started, but of course feel free to bring your own data to the party. The most important thing is that all data used conforms to the Open Definition.

How do I enter?

To enter, fill in the form over on the Guardian’s website. Feel free to ask questions about the competition via datavisualisation@guardian.co.uk.

The competition closes on 2nd April 2013.

US government to release open data using OKF’s CKAN platform

February 1, 2013 in CKAN, News, Open Geodata, Open Government Data

You may have seen hints of it before, but the US government data portal, data.gov, has just announced officially that its next iteration – “data.gov 2.0” – will incorporate CKAN, the open-source data management system whose development is led and co-ordinated by the Open Knowledge Foundation. The OKF itself is one of the organisations helping to implement the upgrade.

Like all governments, the US collects vast amounts of data in the course of its work. Because of its commitment to Open Data, tens of thousands of datasets are openly published through data.gov. The new-look data.gov will be a major enhancement, and will for the first time bring together geospatial data with other kinds of data in one place.

CKAN is fast becoming an industry standard, and the US will become the latest to benefit from its powerful user interface for searching and browsing, rich metadata support, harvesting systems to help ingest data from existing government IT systems, and machine interface, helping developers to find and re-use the data. The partnership is also excellent news for CKAN, which is being improved with enhancements to its features for ingesting and handling geodata.

As it happens, CKAN itself is also moving towards a version 2.0. In fact, after months of hard work, the beta-version of CKAN 2.0 will hopefully be released in a couple of weeks. To keep up to date with developments, follow the CKAN blog or follow @CKANproject on Twitter.

First #OpenDataEDB of 2013

January 30, 2013 in Meetups, OKScotland, Open Data, Open GLAM, Open Government Data

The Edinburgh Open Data community started the year in fine style with a meet-up hosted by the National Library of Scotland on George IV Bridge. The turn-out was excellent, with a wide range of participants. As usual, we had a number of lightning talks.

The meet-up started with a welcome from Darryl Mead, Deputy National Librarian, who pointed out that openness was at the core of the NLS mission, and that work was underway to make information about the holdings easily accessible.

Amy Guy reported on her visit to the 1st International Open Data Dialog in Berlin, 5-6 December 2012. She was impressed by how successful the event was in demonstrating that Open Data is of practical value right now, rather than in some indeterminate future. Amy has a detailed blog post about the event.

Freda O’Byrne emphasised that small voluntary organisations (such as Play-Base, Duddingston Field Group, and Scatterbox Films) can be hugely helped by access to the right kind of data, particularly when they need to write a case for further funding or when they are trying to network with other relevant organisations.

Recent developments in the approach to Open Data by the Scottish Government were described by Ben Plouviez (Head of Knowledge Information and Records Management). Some of the main challenges stem from cultural attitudes to data within the civil service; the cost of publishing open data on a sustainable basis; and the development of technical infrastructure such as URI sets. Areas where we can expect to see progress include increased sharing of data between different public institutions within Scotland; publishing dynamic datasets rather than isolated snapshots; and a better appreciation of the value of data analytics by managers within the Scottish public sector.

Expanding on Darryl’s introduction, Gill Hamilton described recent initiatives in Openness at NLS, including plans to appoint a Wikipedian in Residence, and the release of metadata for digital resources as Linked Open Data. Another issue under debate is whether it would be possible for NLS to provide open access to the digital resources themselves without loss of revenue.

Andy Wightman described current obstacles to answering the question “Who owns Scotland?”, highlighting the fact that members of the public are currently unable to access information about land registration held by the Registers of Scotland without paying a fee. During the passage of what became the Land Registration etc. (Scotland) Act 2012, he had argued (unsuccessfully) that access should be free (fee income accounts for only 5.3% of the Register’s revenue). The wider debate about land taxation and land reform is hampered by the inadequate public availability of data on landownership.

It seemed as though lots of new connections were being made during the networking parts of the event, and some new collaborations were being hatched, possibly including a pilot project involving Scotland’s iconic Forth Rail Bridge.

Elevation and Plan drawing of the Forth Bridge, published within the Westhofen article on the construction of the Forth Bridge in Engineering, 1890, ©RCAHMS


The level of activity around Open Data in Scotland is definitely on the rise. A lot of events and initiatives are being planned, including the following:

Andrew Stott joins OKFN Advisory Board

January 24, 2013 in Open Data, Open Government Data, WG Open Government Data, Working Groups

We’re very pleased to announce that Andrew Stott, the UK’s former Director for Transparency and Digital Engagement who pioneered data.gov.uk, has joined the Open Knowledge Foundation’s Advisory Board.

For those of you who aren’t familiar with him already from our events or from our open-government mailing list, here’s a brief bio:

Andrew Stott was the UK’s first Director for Transparency and Digital Engagement. He led the work to open government data and create “data.gov.uk”; and after the 2010 Election he led the policy development and implementation of the new Government’s commitments on Transparency of central and local government. Following his formal retirement in December 2010 he was appointed to the UK Transparency Board to continue to advise UK Ministers on open data and e-government policy. He also advises other governments on Open Data both bilaterally and through the World Bank and the World Wide Web Foundation. He is an expert adviser on Open Data strategy to the EU Citadel On The Move programme and co-chairs the OKFN Open Government Data Working Group.

Andrew has extensive knowledge – from the inside – about the challenges and obstacles to opening up government data and how to overcome them (for more on this you can see the litany of excuses he mentions in his talk from Open Government Data Camp in 2011) and he has been very active in the international open government data community over the past several years.

Welcome aboard Andrew!


Exploring the 2012 Open Budget Survey

January 23, 2013 in Access to Information, OKF, Open Government Data, Visualization

How transparent and accountable are different countries’ national budgets? Every two years, the International Budget Partnership (IBP) runs the Open Budget Survey to try to answer this question, by measuring the budgets of over 100 countries against a wide range of openness standards. The results for 2012 are released today, with an interactive data explorer developed for the IBP by the Open Knowledge Foundation.

A recent post by Albert van Zyl on the IBP’s Open Budgets blog spells out the consequences of a lack of transparency: money vanishing into thin air, the projects it was destined for never happening, and communities being kept in poverty. As the post says, “There are sufficient public resources available globally to make substantial progress on eradicating extreme poverty and creating sustained economic development, but only if these funds are spent effectively and equitably”. For that to happen, van Zyl argues, budgets must be transparent, participatory, and accountable. The survey results show to what extent different countries achieve this or fall short of it.

The explorer gives users a number of ways to visualise the data, not only from the latest survey but from its three predecessors, starting in 2006. A map view shows the changing geography of openness over the four surveys, while a timeline (shown below) shows the movements of individual countries over the same period. A more detailed page of rankings shows graphically how each country’s score is calculated from ninety-five tests of openness, each with four levels from most to least open. A datasheet for each country presents the full data, letting the user see how it has performed on each test in every survey. Users can also generate custom reports, or download the entire dataset.

[IMG: Open Budget Survey timeline]

Another useful feature lets users see how a country’s score might change in the next survey, due in 2014. Starting from the 2012 results, users decide what changes to make to their chosen country’s budget systems, and the explorer shows the resulting change in its openness score.
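The what-if feature amounts to recomputing a score after changing some answers. The sketch below assumes, purely for illustration, that each of the ninety-five tests is answered at one of four levels worth 100, 67, 33 and 0 points, and that the score is the plain average of those points; the post does not spell out the exact weighting, so the mapping and the function names here are hypothetical.

```python
from typing import Dict

# Hypothetical points per answer level, from most ("a") to least ("d") open.
LEVEL_POINTS = {"a": 100, "b": 67, "c": 33, "d": 0}


def openness_score(answers: Dict[int, str]) -> float:
    """Average the points over all answered tests (assumed scoring rule)."""
    if not answers:
        raise ValueError("no answers to score")
    return sum(LEVEL_POINTS[level] for level in answers.values()) / len(answers)


def what_if(answers: Dict[int, str], changes: Dict[int, str]) -> float:
    """Change in score if the given tests were answered at new levels."""
    unknown = set(changes) - set(answers)
    if unknown:
        raise KeyError(f"tests not in the survey: {sorted(unknown)}")
    return openness_score({**answers, **changes}) - openness_score(answers)


if __name__ == "__main__":
    # A hypothetical country answering every one of the 95 tests at level "c".
    baseline = {test: "c" for test in range(1, 96)}
    print(round(openness_score(baseline), 2))
    print(round(what_if(baseline, {1: "a", 2: "a"}), 2))
```

Under this assumed rule, moving two tests from the third level to the top raises the score by 2 × (100 − 33) / 95 = 134/95, or about 1.4 points.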

The IBP is a project of the Center on Budget and Policy Priorities, a Washington-based think tank which has carried out highly-regarded work for over 30 years on alleviating poverty through national fiscal policy, both at home in the US and internationally. The Open Budget Survey has established itself as an important and independent tool, and the OKF is delighted to be involved in helping present the results. We hope they will be useful to policymakers, campaigners, journalists, and citizens in helping to push for more open and transparent budgets all over the world.

ePSI Open Data Days, Warsaw, February 21-23

January 22, 2013 in Events, Open Government Data, WG EU Open Data, WG Open Government Data

The ePSI platform team have announced “three days of open data fun” in Warsaw next month. The big day is the 2013 ePSI platform conference on 22nd February, but you’re also all invited to a workshop on the 21st, and a hackday on the 23rd!

At a glance

  • What?: ePSI conference, workshop and hackday
  • When?: 21st-23rd February
  • Where?: Warsaw University, Warsaw, Poland
  • Programme: in development here
  • Register: here for the workshop and here for the main conference. And it’s Free (but places are limited)!

The conference will focus on the theme “Gotcha! – getting everyone on board”. PSI re-use is in the process of reaching a certain degree of maturity and uptake. However, this uptake differs significantly between Member States, PSI domains and stakeholders. The ePSIplatform Conference will therefore be aimed at those who should embark but have (partly) failed to do so thus far.

Meanwhile in the workshop we’ll be looking at the value of open data to the public sector itself. The workshop is especially aimed at those who work in the public sector.

And on the 23rd, the hackday will coincide with International Open Data Day, so you’re invited to join the Warsaw open data community for a day of building apps, cleaning up data, or building better connections to data holders. This will take place at Centrum Cyfrowe. Find out more on the Open Data Day in Warsaw here.

Get all the info on the Conference Page or download the Conference Infopack here.

We look forward to seeing you there!

“Carbon dioxide data is not on the world’s dashboard” says Hans Rosling

January 21, 2013 in Featured, Interviews, OKFest, Open Data, Open Government Data, Open/Closed, WG Sustainability, Working Groups

Professor Hans Rosling, co-founder and chairman of the Gapminder Foundation and Advisory Board Member at the Open Knowledge Foundation, received a standing ovation for his keynote at OKFestival in Helsinki in September in which he urged open data advocates to demand CO2 data from governments around the world.

Following on from this, the Open Knowledge Foundation’s Jonathan Gray interviewed Professor Rosling about CO2 data and his ideas about how better data-driven advocacy and reportage might help to mobilise citizens and pressure governments to act to avert catastrophic changes in the world’s climate.

Hello Professor Rosling!

Hi.

Thank you for taking the time to talk to us. Is it okay if we jump straight into it?

Yes! I’m just going to get myself a banana and some ginger cake.

Good idea.

Just so you know: if I sound strange, it’s because I’ve got this ginger cake.

A very sensible idea. So in your talk in Helsinki you said you’d like to see more CO2 data opened up. Can you say a bit more about this?

In order to get access to public statistics, first the microdata must be collected, then it must be compiled into useful indicators, and then these indicators must be published. The amount of coal one factory burnt during one year is microdata. The emission of carbon dioxide per year per person in one country is an indicator. Microdata and indicators are very very different numbers. CO2 emissions data is often compiled with great delays. The collection is based on already existing microdata from several sources, which civil servants compile and convert into carbon dioxide emissions.

Let’s compare this with calculating GDP per capita, which also requires an amazing amount of collection of microdata, which has to be compiled and converted and so on. That is done every quarter for each country. And it is swiftly published. It guides economic policy. It is like a speedometer. You know when you drive your car you have to check your speed all the time. The speed is shown on the dashboard.

Carbon dioxide is not on the dashboard at all. It’s like something you get with several years delay, when you are back from the trip. It seems that governments don’t want to get it swiftly. And when they publish it finally, they publish it as total emissions per country. They don’t want to show emission per person, because then the rich countries stand out as worse polluters than China and India. So it is not just an issue about open data. We must push for change in the whole way in which emissions data is handled and compiled.
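The distinction drawn here, between microdata and indicators and between totals and per-person figures, can be sketched in a few lines. Everything in the example is hypothetical: two invented countries, made-up fuel records and illustrative emission factors, chosen only to show how microdata are compiled into an indicator and how a ranking can flip once totals are divided by population.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative emission factors (tonnes of CO2 per tonne of fuel); not official values.
EMISSION_FACTORS = {"coal": 2.4, "oil": 3.1}


@dataclass
class FuelRecord:
    """Microdata: fuel burnt by one facility in one year."""
    country: str
    fuel: str
    tonnes: float


def compile_indicators(
    records: List[FuelRecord], population: Dict[str, int]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Aggregate microdata into total and per-person CO2 emissions per country."""
    totals: Dict[str, float] = {}
    for record in records:
        emitted = record.tonnes * EMISSION_FACTORS[record.fuel]
        totals[record.country] = totals.get(record.country, 0.0) + emitted
    per_person = {country: total / population[country] for country, total in totals.items()}
    return totals, per_person


if __name__ == "__main__":
    records = [
        FuelRecord("Bigland", "coal", 300.0),
        FuelRecord("Bigland", "coal", 200.0),
        FuelRecord("Smallland", "oil", 100.0),
    ]
    population = {"Bigland": 1000, "Smallland": 10}
    totals, per_person = compile_indicators(records, population)
    print(sorted(totals, key=totals.get, reverse=True))
    print(sorted(per_person, key=per_person.get, reverse=True))
```

With these invented figures Bigland leads on total emissions, yet Smallland emits far more per person, which is exactly the effect that publishing only country totals hides.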

You also said that you’d like to see more data-driven advocacy and reportage. Can you tell us what kind of thing you are thinking of?

Basically everyone admits that the basic vision of the green movement is correct. Everyone agrees on that. By continuing to exploit natural resources for short term benefits you will cause a lot of harm. You have to understand the long-term impact. Businesses have to be regulated. Everyone agrees.

Now, how much should we regulate? Which risks are worse, climate or nuclear? How should we judge the bad effects of having nuclear electricity? The bad effects of coal production? These are difficult political judgments. I don’t want to interfere with these political judgments, but people should know the orders of magnitude involved, the changes, what is needed to avoid certain consequences. But that data is not even compiled fast enough, and the activists do not protest, because it seems they do not need data?

Let’s take one example. In Sweden we have data from the energy authority. They say: “energy produced from nuclear”. Then they include two outputs. One is the electricity that goes out into the lines and that lights the house that I’m sitting in. The other is the warm waste water that goes back into the sea. That is also energy, they say. It is actually like a fraud to pretend that that is energy production. Nobody gets any benefit from it. On the contrary, they are changing the ecology of the sea. But they get away with it because the designation is “energy produced”.

We need to be able to see the energy supply for human activity from each source and how it changes over time. The people who are now involved in producing solar and wind produce very nice reports on how production increases each year. Many get the impression that we have 10, 20, 30% of our energy from solar and wind. But even with fast growth from almost zero, solar and wind are nothing yet. The news reports mostly neglect to explain the difference between the percentage growth of solar and wind energy and their percentage of the total energy supply.

People who are too much into data and into handling data may not understand how the main misconceptions come about. Most people are so surprised when I show them total energy production in the world on one graph. They can’t yet see solar because it hasn’t reached one pixel yet.
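The gap between growth rates and shares is easy to check with compound growth. Assuming, hypothetically, that a source supplies 0.5% of total energy and grows 30% a year while total supply grows 2% a year, its share after five years is still only about 1.7%. The inputs are illustrative, not real energy statistics.

```python
def share_after(initial_share: float, source_growth: float,
                total_growth: float, years: int) -> float:
    """Share of total supply after both quantities grow at constant annual rates."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial_share * ((1 + source_growth) / (1 + total_growth)) ** years


if __name__ == "__main__":
    # Hypothetical inputs: 0.5% share, 30% annual growth, 2% growth in total supply.
    for year in range(6):
        print(year, f"{share_after(0.005, 0.30, 0.02, year):.2%}")
```

A headline of "30% growth a year" and a share that has barely reached a couple of percent are both true at once; only the second tells you how much of the energy supply the source actually provides.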

So this isn’t of course just about having more data, but about having more data literate discussion and debate – ultimately about improving public understanding?

It’s like that basic rule in nutrition: Food that is not eaten has no nutritional value. Data which is not understood has no value.

It is interesting that you use the term data literacy. Actually I think it is presentation skills we are talking about. Because if you don’t adapt your way of presenting to the way that people understand it, then you won’t get it through. You must prepare the food in a way that makes people want to eat it. The dream that you will train the entire population to about one semester of statistics in university: that’s wrong. Statisticians often think that they will teach the public to understand data the way they do, but instead they should turn data into Donald Duck animations and make the story interesting. Otherwise you will never ever make it. Remember, you are fighting with Britney Spears and tabloid newspapers. My biggest success in life was December 2010 on the YouTube entertainment category in the United Kingdom. I had most views that month. And I beat Lady Gaga with statistics.

Amazing.

Just the fact that the guy in the BBC in charge of uploading the trailer put me under ‘entertainment’ was a success. No-one thought of putting a trailer for a statistics documentary under entertainment.

That’s what we do at Gapminder. We try to present data in a way that makes people want to consume it. It’s a bit like being a chef in a restaurant. I don’t grow the crop. The statisticians are like the farmers that produce the food. Open data provide free access to potatoes, tomatoes and eggs and whatever it is. We are preparing it and making a delicious food. If you really want people to read it, you have to make data as easy to consume as fish and chips. Do not expect people to become statistically literate! Turn data into understandable animations.

My impression is that some of the best applications of open data that we find are when we get access to data in a specific area, which is highly organized. One of my favorite applications in Sweden is a train timetable app. I can check all the commuter train departures from Stockholm to Uppsala, including the last change of platform and whether there is a delay. I can choose how to transfer quickly from the underground to the train to get home fastest. The government owns the rails and every train reports its arrival and departure continuously. This data is publicly available as open data. Then a designer made an app and made the data very easy for me to understand and use.

But to create an app which shows the determinants of unemployment in the different counties of Sweden? No-one can do that because that is a great analytical research task. You have to take data from very many different sources and make predictions. I saw a presentation about this yesterday at the Institute for Future Studies. The PowerPoint graphics were ugly, but the analysis was beautiful. In this case the researchers need a designer to make their findings understandable to the broad public, and together they could build an app that would predict unemployment month by month.

The CDIAC publish CO2 data for the atmosphere and the ocean, and they publish national and global emissions data. The UNFCCC publish national greenhouse gas inventories. What are the key datasets that you’d like to get hold of that are currently hard to get, and who currently holds these?

I have no coherent CO2 dataset for the world beyond 2008 at present. I want to have this data until last year, at least. I would also welcome half-year data, but I understand this can be difficult because carbon dioxide emissions vary for transport, heating or cooling of houses over the seasons of the year. So just give me the past year’s data in March. And in April/May for all countries in the world. Then we can hold governments accountable for what happens year by year.

Let me tell you a bit about what happens in Sweden. The National Natural Protection Agency gets the data from the Energy Department and from other public sources. Then they give these datasets to consultants at the University of Agriculture and the Meteorological Authority. Then the consultants work on these datasets for half a year. They compile them, the administrators look through them and they publish them in mid-December, when Swedes start to get obsessed about Christmas. So that means that there was a delay of eleven and a half months.

So I started to criticize that. My cutting line was when I was with the Minister of Environment and she was going to Durban. And I said “But you are going to Durban with eleven and a half month constipation. What if all of this shit comes out on stage? That would be embarrassing wouldn’t it?”. Because I knew that she had in 2010 an increase in carbon dioxide emission and it increased by 10%. But she only published that coming back from Durban. So that became a political issue on TV. And then the government promised to make it earlier. So 2012 we got CO2 data by mid-October, and 2013 we’re going to get it in April.

Fantastic.

But actually ridiculing is the only way that worked. That’s how we liberated the World Bank’s data. I ridiculed the President of the World Bank at an international meeting. People were laughing. That became too much.

The governments in the rich countries don’t want the world to see emissions per capita. They want to publish emissions per country. This is very convenient for Germany, UK, not to mention Denmark and Norway. Then they can say the big emission countries are China and India. It is so stupid to look at total emissions per country. This allows small countries to emit as much as they want because they are just not big enough to matter. Norway hasn’t reduced their emissions for the last forty years. Instead they spend their aid money to help Brazil to replant rainforest. At the same time Brazil lends 200 times more money to the United States of America to help them consume more and emit more carbon dioxide into the atmosphere. Just to put these numbers up makes a very strong case. But I need to have timely carbon dioxide emission data. But not even climate activists ask for this. Perhaps it is because they are not really governing countries. The right wing politicians need data on economic growth, the left wing need data on unemployment but the greens don’t yet seem to need data in the same way.

As well as issues getting hold of data at a national level, are there international agencies that hold data that you can’t get hold of?

It is like a reflection. If you can’t get data from the countries for eleven and a half months, why the heck should the UN or the World Bank compile it faster? Think of your household. There are things you do daily, that you need swiftly. Breakfast for your kids. Then, you know, repainting the house. I didn’t do it last year, so why should I do it this year? It just becomes slow the whole system. If politicians are not in a hurry to get data for their own country, they are not in a hurry to compare their data to other countries. They just do not want this data to be seen during their election period.

So what you’re saying is that you’d recommend stronger political pressure, through ridicule, on different national agencies?

Yes. Or sit outside and protest. Do a Greenpeace action on them.

Can you think of datasets about carbon dioxide emissions which aren’t currently being collected, but which you think should be collected?

Yes. In a very cunning way China, South Africa and Russia like to be placed in the developing world and they don’t publish CO2 data very rapidly because they know it will be turned against them in international negotiations. They are not in a hurry. The Kyoto Protocol at least made it compulsory for the richest countries to report their data because they had committed to decrease. But every country should do this. All should be able to know how much coal each country consumed, how much oil they consumed, etc and from that data have a calculation made on how much CO2 each country emitted last year.

It is strange that the best country to do this – and it is painful for a Swede to accept this – is the United States. CDIAC. Federal agencies in the US are very good on data and they take on the whole world. CDIAC make estimates for the rest of the world. Another US agency I really like is the National Snow and Ice Data Centre in Denver, Colorado. They give us 24-hour updates on the polar sea ice area. That’s really useful. They are also highly professional. In the US the data producers are far away from political manipulation. When you see the use of fossil fuels in the world there is only one distinct dip. That dip could be attributed to the best environmental politician ever. The dip in CO2 emissions took place in 2008. George W. Bush, Greenspan and the Lehman Brothers decreased CO2 emissions by inducing a financial crisis. It was the most significant reduction in the use of fossil fuels in modern history.

I say this to put things into proportion. So far it is only financial downturns that have had an effect on the emission of greenhouse gases. The whole of environmental policy hasn’t yet had any such dramatic effect. I checked this with Al Gore personally. I asked him “Can I make this joke? That Bush was better for the climate than you were?”. “Do that!”, he said, “You’re correct.” Once we show this data people can see that the economic downturn so far was the most forceful effect on CO2 emission.

If you could have all of the CO2 and climate data in the world, what would you do with it?

We’re going to make teaching materials for high schools and colleges. We will cover the main aspects of global change so that we produce a coherent data-driven worldview, which starts with population, and then covers money, energy, living standards, food, education, health, security, and a few other major aspects of human life. And for each dimension we will pick a few indicators. Instead of doing Gapminder World with the bubbles that can display hundreds of indicators we plan a few small apps where you get a selected few indicators but can drill down. Start with world, world regions, countries, subnational level, sometimes you split male and female, sometimes counties, sometimes you split income groups. And we’re trying to make this in a coherent graphic and color scheme, so that we really can convey an upgraded world view.

Very very simple and beautiful but with very few jokes. Just straightforward understanding. And for climate impact we will relate to the economy. To relate to the number of people at different economic levels, how much energy they use and then drill down into the type of energy they use and how that energy source mix affects the carbon dioxide emissions. And make trends forward. We will rely on the official and most credible trend forecast for population, one, two or more for energy and economic trends etc. But we will not go into what needs to be done. Or how should it be achieved. We will stay away from politics. We will stay away from all data which is under debate. Just use data with good consensus, so that we create a basic worldview. Users can then benefit from an upgraded world view when thinking and debating about the future. That’s our idea. If we provide the very basic worldview, others will create more precise data in each area, and break it down into details.

A group of people inspired by your talk in Helsinki are currently starting a working group dedicated to opening up and reusing CO2 data. What advice would you give them and what would you suggest that they focus on?

Put me in contact with them! We can just go for one indicator: carbon dioxide emission per person per year. Swift reporting. Just that.

Thank you very much Professor Rosling.

Thank you.


If you want help to liberate, analyse or communicate carbon emissions data in your country, you can join the OKFN’s Open Sustainability Working Group.


Goodbye Aaron Swartz – and Long Live Your Legacy

January 14, 2013 in Access to Information, Bibliographic, Campaigning, Featured, News, Open Access, Open Data, Open Government Data, Policy

Aaron Swartz, coder, writer, archivist and activist, took his own life in New York on Friday.

Aaron worked tirelessly to open up and maximise the societal impact of information in three areas which are central to our work at the Foundation: public domain cultural works, public sector information, and open access to publicly funded research.

He was one of the original architects behind the Internet Archive’s Open Library project, which aims to create ‘one web page for every book’. While he was there we compared notes about trying to automatically estimate which works are in the public domain in different countries around the world.

This was part of a broader vision to enable public access to the public domain, and to ensure that digitisation initiatives result in open digital copies of public domain works that everyone is free to use and enjoy, not just copies owned and protected by large corporations who might sell or restrict access to the world’s heritage.

Around this time Aaron and I met in San Francisco to co-draft a petition to the Library of Congress to encourage them to take a leading role in opening up data from the world’s libraries and memory institutions. This was several years before a wave of institutions started explicitly opening up data about their holdings.

We remained in contact regarding his work on open government data in the US. Aaron was involved in drafting the highly influential 8 principles for open government data. We wanted to try to better coordinate developments on either side of the Atlantic.

Later he was in the papers for downloading around a fifth of the US government’s huge Public Access to Court Records (PACER) system, around 780 gigabytes, and releasing it for free to the public (access was usually charged by the page) – which earned him an FBI file.

In his 2008 Guerilla Open Access Manifesto Aaron argued that “the world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations” and, “in the grand tradition of civil disobedience”, urged internet users to “fight back”:

We need to take information, wherever it is stored, make our copies and share them with the world. We need to take stuff that’s out of copyright and add it to the archive. We need to buy secret databases and put them on the Web. We need to download scientific journals and upload them to file sharing networks. We need to fight for Guerilla Open Access.

In 2010 he founded Demand Progress, which helped to mobilise over a million people in response to proposed legislation like the Combating Online Infringement and Counterfeits Act (COICA).

In 2011 he again hit the headlines when he was arrested for downloading roughly 4 million subscription-only academic articles from JSTOR by placing a laptop in a computer cupboard at MIT and using this to gain unauthorised access to the JSTOR service. The prosecution alleged that he intended to make these articles freely available on the web.

Last September the US Federal Government raised the felony count from four to thirteen, which meant that Aaron was potentially facing a total of 50+ years and a fine in the area of $4 million for his actions. His family suggested that the case was a factor in his death – and blamed the Massachusetts U.S. Attorney’s office for “intimidation and prosecutorial overreach” and MIT for “refus[ing] to stand up for Aaron and its own community’s most cherished principles”. The president of MIT has just announced that he has ordered an investigation into their role in Aaron’s prosecution.

As Peter Eckersley from the Electronic Frontier Foundation commented on Saturday:

While his methods were provocative, the goal that Aaron died fighting for — freeing the publicly-funded scientific literature from a publishing system that makes it inaccessible to most of those who paid for it — is one that we should all support.

While Aaron was deeply involved in all kinds of technical, scholarly and organising activities to promote an open digital commons and an open internet – from helping to develop RSS 1.0 and Markdown, to early sketches of the semantic web with some of its pioneers and work on the first technical implementations of the Creative Commons licenses – he also never lost sight of the bigger picture, of what it was all for. He was a talented coder and knew how to take a principled stance, but he was never one to get lost in detail or dogma. From his writings about how data-driven transparency initiatives are not enough to effect change in themselves, to his guide to developing software that addresses real needs, he was always aware that using information, technology and the internet to change the world is not easy, and requires graft, skill, scrutiny, critical reflection and taking risks.

Aaron’s passing is a tremendously sad and significant loss. Long live his legacy.


To find out more about Aaron’s life and works, you can look at his writings and the memorial site set up by his family. You can also read tributes from Tim Berners-Lee, Cory Doctorow, Brewster Kahle, Lawrence Lessig, and Erik Moeller, and read obituaries and news articles on the BBC, the Economist, Forbes, Gigaom, the Guardian, the Huffington Post, the New York Times, The New Yorker, The Observer, Techdirt, The Telegraph, Vice and Wired. In tribute, hundreds of academics have started tweeting links to their research papers using the hashtag #pdftribute. The Internet Archive has started an Aaron Swartz Collection.


Show me the (quality) data!

December 4, 2012 in Open Data, Open Government Data

Show me your data!
Put it online!
Make it re-useable and accessible!

That’s the rallying cry of many in the Open Data movement. Few, at this point, seem to be demanding: make sure your data is credible, robust and of high quality! Why is this important? It is true that there is value in making a range of data sets available to stimulate interest in data use. At the same time, there is a real risk that the Open Data momentum could be derailed if out-of-date or inaccurate data sets made available by governments are used for economic forecasting, development planning or attempts to hold a government to account. Imagine a civil society organisation (CSO) trying to measure a country’s progress toward a development goal based on 10-year-old poverty data. Think it’s an unrealistic scenario? Think again.

In Kenya, the most recent household poverty data available was compiled in 2005-06. This data has now been released through the Open Data portal. How useful is it? How can NGOs use it to argue for change, or to measure the government’s delivery of its development goals? How can the Government develop economic policies or make resource allocations based on this data?

Most data is, or should be, drawn from records, and if the records aren’t reliable, the data won’t be reliable. Records integrity is based on proper management of the information from the time it is created until it ceases to have value.

Where reliable records cannot be accessed, openness is unachievable. When record keeping is poor, ordinary citizens are the losers. Poorly managed records tend to be incomplete, difficult to locate, and hard to authenticate; they can be easily manipulated, deleted, fragmented or lost. They undermine Open Government initiatives and result in inaccurate or incomplete data and information, which in turn can lead to the misunderstanding and misuse of information, cover-up of fraud, skewed findings and statistics, misguided policy and misplaced funding, all with serious consequences for citizens’ lives. Poor-quality records can impair the delivery of justice, leave human rights unprotected, compromise government services, and prevent civil society from holding governments to account.

Paper-based records, which are still used extensively, are in many cases poorly managed, while the rapid introduction of ICT systems across governments has not addressed the challenge of protecting the integrity of the digital information these systems generate.

Take the example of paper record keeping in the Burundi Supreme Court, where records over seven years old were found to be in abysmal condition. They had been poorly stored in a basement, where they were exposed to rain and dust; shelves had collapsed, and the records were heaped in indiscriminate piles. A stray dog even managed to make its way into the basement and ripped up some records to give birth to a litter of puppies on them. Imagine trying to generate judicial statistics covering a 10-year period to measure the number of rulings, or the fairness of the trials. Or try to generate, from these records, the data needed to determine how accountable the court is, and how transparent its rulings and processes are.

And digital record keeping is not immune either: work in Sierra Leone on civil servants’ and teachers’ records demonstrates how misleading employment data can be if the records used to generate it are badly kept. Once accurate records had been created and used as the basis for verifying actual teachers against the payrolls, it was found that ‘ghost workers’ (pay claimed in the names of people who were dead, non-existent or beyond retirement age) accounted for approximately 14% of the civil service payroll and approximately 25% of the teachers’ payroll. The discovery will save the government millions of dollars annually and enable accurate human resource planning. Openness, transparency and accountability in relation to employment data would not have been meaningful before the records controls were introduced.
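The verification step in the Sierra Leone example essentially amounts to reconciling the payroll against a list of verified staff. The Python sketch below illustrates only that idea. It assumes, hypothetically, that every employee has a unique ID and that a physical headcount or records audit has already produced the set of verified IDs. The names `PayrollEntry`, `find_ghost_workers` and `ghost_shares` are illustrative and are not taken from the project described. The sketch reports the unverified share both by headcount and by pay, since the post does not say which measure the 14% and 25% figures use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayrollEntry:
    # Hypothetical payroll record: one line per person being paid.
    employee_id: str
    monthly_pay: float


def find_ghost_workers(payroll: list[PayrollEntry], verified_ids: set[str]) -> list[PayrollEntry]:
    """Return payroll entries with no matching record in the verified staff list."""
    return [entry for entry in payroll if entry.employee_id not in verified_ids]


def ghost_shares(payroll: list[PayrollEntry], verified_ids: set[str]) -> tuple[float, float]:
    """Return (share of headcount, share of total pay) that could not be verified."""
    if not payroll:
        return 0.0, 0.0
    ghosts = find_ghost_workers(payroll, verified_ids)
    headcount_share = len(ghosts) / len(payroll)
    total_pay = sum(entry.monthly_pay for entry in payroll)
    ghost_pay = sum(entry.monthly_pay for entry in ghosts)
    pay_share = ghost_pay / total_pay if total_pay else 0.0
    return headcount_share, pay_share


if __name__ == "__main__":
    # Made-up illustrative data, not figures from Sierra Leone.
    payroll = [
        PayrollEntry("T001", 300.0),
        PayrollEntry("T002", 300.0),
        PayrollEntry("T003", 250.0),
        PayrollEntry("T004", 250.0),
    ]
    verified = {"T001", "T002", "T003"}
    print([e.employee_id for e in find_ghost_workers(payroll, verified)])
    print(ghost_shares(payroll, verified))
```

The check is only as good as the verified list: if that list is itself drawn from badly kept records, the reconciliation inherits the same errors, which is exactly why the integrity of the underlying records matters.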

Ultimately Open Data will need to be credible.

It is important to move beyond the idea that simply publishing data sets will foster momentum and interest in the use of data for accountability or economic growth. More thought must be given to the integrity of the records that provide the basis for the data, and the means of tracing the data back to the source evidence. Governments must be held to account for what they publish, and we, too, must be accountable for the information that we encourage them to provide.
