This post is by Friedrich Lindenberg, one of the core developers on the OpenSpending project. He describes some of the hurdles that had to be overcome to get to today’s online release of all UK central departmental spending data over £25k, and some interesting questions stemming from the data.

In November of last year, the UK government announced plans to publish central government spending data for all items with a value of more than £25,000. Seven months on, an impressive amount of this data has been released to the public: data.gov.uk lists 557 distinct datasets from every government entity – from the NHS to the MOD.

Despite this leap forward, it is still hard to get a general overview of the 3327 spreadsheets that have been made available, and many questions remain unanswered: How much did a particular supplier get paid across government departments? Which are the biggest suppliers for all NHS outposts? Which companies are working to put on the London 2012 Olympic Games, and how much is each of them consuming? Interesting names and figures jump out: Who are the ‘Shadow Robot Company Ltd’ and what exactly are they doing with £25,586 of the UK’s money?

To help find answers to these questions, we set out to collect, clean up and present all central government spending data in OpenSpending.

Processing the Data

Once the data had been published, there was a lot of work to be done to make it usable in OpenSpending.

Having located all available spending releases in the data.gov.uk index, the first step was creating a local cache of all the data and converting it to a common format.

Even though government guidelines ask for the data to be published as CSV with a particular set of column headers, we had to correct both the file format and the column names for most of the available data. In some cases, even the content of the fields, such as inverted dates (Month/Day vs. Day/Month), had to be corrected manually. Some departments had left out vital information such as the supplier VAT code or the government entity responsible for the spending.
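
To illustrate the kind of clean-up this involves, here is a minimal Python sketch of header and date normalization. It is not the code we ran: the alias table, the canonical column names and the helper functions (canonical_header, parse_date, normalize_rows) are all assumptions made up for this example, and the day_first flag stands in for the manual judgement about which date order a given file uses.

```python
import csv
import io
from datetime import datetime

# Hypothetical alias table: header variants seen in the wild mapped to
# one canonical column name. The real guidelines define their own headers.
HEADER_ALIASES = {
    "entity": "entity",
    "department": "entity",
    "date": "date",
    "payment date": "date",
    "supplier": "supplier",
    "supplier name": "supplier",
    "amount": "amount",
    "amount (£)": "amount",
}


def canonical_header(name):
    """Return the canonical column name, or None if the header is unknown."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key)


def parse_date(value, day_first=True):
    """Parse a slash-separated date into ISO format; None if it does not fit."""
    if day_first:
        formats = ["%d/%m/%Y", "%d/%m/%y"]
    else:
        formats = ["%m/%d/%Y", "%m/%d/%y"]
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(value):
    """Strip currency symbols and thousands separators; None if not a number."""
    try:
        return float(value.replace("£", "").replace(",", "").strip())
    except ValueError:
        return None


def normalize_rows(text, day_first=True):
    """Read CSV text and return a list of dicts keyed by canonical column names."""
    reader = csv.reader(io.StringIO(text))
    headers = [canonical_header(h) for h in next(reader)]
    rows = []
    for raw in reader:
        # Columns with unrecognised headers are dropped in this sketch.
        row = {h: v.strip() for h, v in zip(headers, raw) if h is not None}
        if "date" in row:
            row["date"] = parse_date(row["date"], day_first)
        if "amount" in row:
            row["amount"] = parse_amount(row["amount"])
        rows.append(row)
    return rows
```

Calling normalize_rows(text, day_first=False) on a file known to use Month/Day order yields the same ISO dates as a Day/Month file would, which is the point: the decision about date order is made once per file rather than per record.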

We also had to normalize many of the entities involved, both companies and government departments. For companies we had the benefit of the excellent reconciliation service offered by OpenCorporates.com, but unfortunately, for government departments and other entities no such service is available yet. As a workaround, a simple Google Document allowed us to map some of the abbreviations used and the most blatant misspellings to their correct forms.
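
The mapping approach for government entities can be sketched roughly as follows. This is only an illustration: the alias entries below are invented examples held in a Python dictionary rather than in our Google Document, and the function names (normalize_key, reconcile_entity) are hypothetical.

```python
import re

# Hypothetical alias table: abbreviations and misspellings mapped to the
# correct form. In practice such a table would be maintained by hand.
ENTITY_ALIASES = {
    "mod": "Ministry of Defence",
    "ministry of defense": "Ministry of Defence",
    "min of defence": "Ministry of Defence",
    "dept of health": "Department of Health",
}


def normalize_key(name):
    """Lower-case, drop punctuation and collapse whitespace for lookups."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def reconcile_entity(name, aliases=ENTITY_ALIASES):
    """Return (canonical_name, matched) for a raw entity name."""
    # Names that are already in canonical form should also count as matched.
    canonical = {normalize_key(value): value for value in aliases.values()}
    key = normalize_key(name)
    if key in aliases:
        return aliases[key], True
    if key in canonical:
        return canonical[key], True
    # Unknown names are passed through and flagged for manual review.
    return name.strip(), False
```

Returning a flag alongside the name keeps unmatched entities visible, so that new abbreviations can be added to the table instead of silently slipping through.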

After performing all these operations on a temporary SQLite database, we were able to generate a consolidated 450MB CSV file covering all of the 25k spending, with over 1.8m identifiable records, as well as a list of error reports for both invalid files and individual records. These results are available in the UK Government 25k spending data package on CKAN. From there, they could easily be loaded into http://OpenSpending.org and presented, through an embeddable JavaScript, in a convenient interface on data.gov.uk/openspending.
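
For readers curious what such a staging step might look like, here is a small sketch using Python's built-in sqlite3 and csv modules. The table layout, the field list and the function names (consolidate, export_csv, supplier_totals) are assumptions for illustration and do not reflect the actual schema we used; the sketch keeps everything in memory, whereas a real run would use a database file on disk.

```python
import csv
import io
import sqlite3

# Assumed minimal set of fields; the real data carries more columns.
FIELDS = ["entity", "supplier", "date", "amount"]


def consolidate(records):
    """Load valid records into a temporary SQLite table and collect error reports."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spending (entity TEXT, supplier TEXT, date TEXT, amount REAL)"
    )
    errors = []
    for index, record in enumerate(records):
        missing = [f for f in FIELDS if record.get(f) in (None, "")]
        if missing:
            # Incomplete records go into the error report, not the table.
            errors.append({"record": index, "missing": missing})
            continue
        conn.execute(
            "INSERT INTO spending VALUES (?, ?, ?, ?)",
            [record[f] for f in FIELDS],
        )
    conn.commit()
    return conn, errors


def export_csv(conn):
    """Write the consolidated table out as a single CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(
        conn.execute(
            "SELECT entity, supplier, date, amount FROM spending "
            "ORDER BY date, supplier"
        )
    )
    return out.getvalue()


def supplier_totals(conn):
    """Total paid to each supplier across all entities, largest first."""
    query = (
        "SELECT supplier, SUM(amount) FROM spending "
        "GROUP BY supplier ORDER BY SUM(amount) DESC"
    )
    return conn.execute(query).fetchall()
```

Once the records sit in one table, a question like "how much did a particular supplier get paid across government departments?" becomes a single GROUP BY query, as supplier_totals shows.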

The decision of the UK Government to publish this data represents a huge step towards more participatory governance, greater transparency and accountability in financial governance.

Thanks to OpenSpending, government spending in the UK is searchable, categorizable and, most importantly, analysable by anyone interested in public spending. OpenSpending will continue to develop tools to allow ever more insightful analysis of the data and hopefully, many more governments will follow suit in opening up their public expenditure records.


Lucy is a free range "tech-translator", blogging about her work at http://techtohuman.com/.

Formerly, Lucy worked for Open Knowledge leading School of Data, co-editing the Data Journalism Handbook and coordinating the OpenSpending community.