Aid Data: From XML to Visualisations – IATI data in OpenSpending

June 5, 2012 in Open Government Data, Open Spending, Open Standards, WG Development

Are the World Bank and Department for International Development (DfID) spending money on projects in similar sectors and countries? Does all aid to Kenya go to the North-East? How much aid in total did India receive last year?

Until recently, it was impossible to know. But now, thanks to the International Aid Transparency Initiative (IATI), we’ve been able to start to answer these questions – making the aid process more transparent, which is crucial for making it more effective.

IATI is a political agreement by the world’s major donors – including international banks, private foundations and NGOs – on a common way to publish aid information. It also defines a technical standard for exactly how that information should be published, IATI-XML.

So far, 29 donors representing 74% of Official Development Finance (ODF) have committed to publishing to IATI. A further 13 donors representing 45% of ODF have already published, and 12 NGOs and foundations have published their own data.

This post details how we converted each donor’s data, using simple scripts and open source tools, from raw XML in the IATI Registry into a consolidated dataset and then, by loading it into OpenSpending, into visualisations like those shown above and an easy-to-use RESTful API.

From this....

... to this.

Getting the Data Together

Full details of how we got the data together are in this case study on OpenSpending … but to summarize:

  • We grabbed a list of all the IATI data files via the IATI Registry API (the IATI registry is running CKAN so this is very easy)
  • We converted the data to an SQLite database and a simplified CSV format and posted these on the IATI dataset on the DataHub
  • We modelled and loaded it into OpenSpending, creating views to visualize it in basic forms.
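The first step above can be sketched roughly as follows. This is a minimal illustration, not the script we used: it assumes the registry exposes CKAN's `package_search` action at the URL shown (the base URL is an assumption), and the package names and file URLs in the sample response are made up.

```python
from urllib.parse import urlencode

# Assumed location of the registry's CKAN action API.
REGISTRY_API = "https://iatiregistry.org/api/3/action/package_search"


def search_url(start=0, rows=100):
    """Build a paged CKAN package_search URL."""
    return REGISTRY_API + "?" + urlencode({"start": start, "rows": rows})


def resource_urls(response):
    """Return (package name, file URL) pairs from a parsed package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN API call failed")
    pairs = []
    for package in response["result"]["results"]:
        for resource in package.get("resources", []):
            url = resource.get("url")
            if url:
                pairs.append((package["name"], url))
    return pairs


# Hypothetical response body, shaped like CKAN's JSON output.
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "donor-a-ke", "resources": [{"url": "http://example.org/a-ke.xml"}]},
            {"name": "donor-b-in", "resources": [{"url": "http://example.org/b-in.xml"}]},
        ],
    },
}

files = resource_urls(sample_response)
```

Each URL in `files` would then be downloaded and parsed; paging through the registry means calling `search_url` with increasing `start` values until `count` is reached.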

What you can see

You can now explore the complete dataset of aid data released so far through IATI, in both aggregate and detailed form, on OpenSpending. You can drill down through the data and look at it from different perspectives: from the largest sectors in a country, to the different implementing organisations in that sector, to all the projects implemented by a single organisation.

Drill down from one layer…

IATI 1

… to the next – we’re zooming in on China here, breaking down by flow type…

IATI China Zoom

… and you can switch between breakdowns – here slicing the data by the organisations implementing the aid…

IATI China Implementing Organisation

… and here by funding organisation

IATI China Funding Organisation

More details

We’ve also just put together a briefing on how we worked with the IATI data on OpenSpending.org. The briefing covers in depth what IATI is, using the IATI registry, consolidating data into a simple format, loading data into OpenSpending and using the API.

Next steps & get involved.

For those keen to put coding knowledge to good use to further the IATI mission, some ideas below:

  • Use the API – you can use OpenSpending’s API to build applications – read the briefing for more ideas and instructions
  • Review our scripts for converting IATI data. We’ve been compiling a list of known issues with possible future extensions such as geo-coding, reconciling organisations and handling currencies.
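As a starting point for the first idea, an aggregate query can be built like this. It is a sketch under assumptions: the endpoint path and the `dataset`, `drilldown` and `cut` parameters follow OpenSpending's version 2 aggregate API as we understand it, and the dimension names (`sector`, `funding-org`, `recipient-country`) are hypothetical, so check the briefing and the dataset's model for the real ones.

```python
from urllib.parse import urlencode

# Assumed endpoint for the aggregate API.
API_ROOT = "https://openspending.org/api/2/aggregate"


def aggregate_url(dataset, drilldowns, cuts=None):
    """Build an aggregate query: drill down by dimensions, optionally filtered by cuts."""
    params = {"dataset": dataset, "drilldown": "|".join(drilldowns)}
    if cuts:
        # Each cut is written as dimension:value; several cuts are joined with "|".
        params["cut"] = "|".join(f"{key}:{value}" for key, value in cuts.items())
    return API_ROOT + "?" + urlencode(params)


# Hypothetical dimension names, for illustration only.
url = aggregate_url("iati", ["sector", "funding-org"], {"recipient-country": "cn"})
```

Fetching `url` would be expected to return JSON with one aggregated amount per combination of the drilldown dimensions.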

What’s in the data, what’s still to come

The dataset contains current and future spending by major aid donors representing 44% of ODF, with disbursement data running up to the current month in some cases. It also contains commitment data up to 2016 from one donor (and from multiple donors up to 2014).

However, the data does not contain any information from donors who have not yet published to IATI, and it also does not yet include results, project documents or geo-coded data.

Future projects might include:

  • Validation – to ensure that data is properly formatted and uses standard codelists;
  • Adding results, geo-coding and project documents to the OpenSpending visualisation – some of this is already available in the original source data, but has not yet been incorporated into this dataset;
  • Other visualisations – for example, a map, and activity and transaction views;
  • Running the dataset compilation automatically – so that it runs on a server nightly, is up-to-date and imports the latest version to OpenSpending as it’s updated.

The future

Eventually what we’d like to see is something like this: an integrated dataset of aid and budgets in each country, so that the full picture of resource flows is available.

PWYF Uganda

Which country will be next to join up their aid and budgetary flows? You can get in touch with us via the mailing list if you have any questions about this project or the data.

This post was written by Mark Brough. It is cross-posted on the OpenSpending blog.

Data = Seized, Sanitised and Sanity-checked. Open Data Day 2011

December 12, 2011 in CKAN, Open Spending

This post is by Mark Brough, Research Officer at Publish What You Fund, Lucy Chambers, Community Coordinator for OpenSpending, and Irina Bolychevsky, Product Owner for CKAN. It is cross-posted on the OpenSpending Blog and the CKAN blog and Mark Brough’s contribution is also featured on aidinfolabs.org.

Saturday, December 3rd was Open Data Day, and London took the challenge to throw a hackday to help data be opened, cleaned and shown off to the world…

Fuelled only by enthusiasm, caffeine and 5 packets of ready-made popcorn, the CKAN, OpenSpending and IATI teams, along with some new faces, joined forces to liberate as much data as they could…

OpenSpending + IATI + CKAN

As part of the IATI Open Data Day challenges, Mark Brough did some work to get the existing IATI Data into OpenSpending. David Read, from the CKAN team, and a new face to the data wrangling crew, Johannes, scraped data on aid donations from France and Austria that were locked-up in web apps in order to help fill in the gaps in the global aid data jigsaw puzzle.

These, along with many other datasets discovered on the day via tweets and emails, have been added to the Open Data Day Group on theDataHub.org.

You can see the results of the IATI wrangling process on OpenSpending.org/iati. The following section is written by Mark.

1. Getting the data

Downloading the existing IATI data has already become quite a big task; with 19 publishers so far, the data currently amounts to over 750MB with 1169 packages. Fortunately this is made easier by the IATI Registry, which provides an API to access all existing datasets, and a simple script (links at end) can retrieve all of the data.

2. Extracting the data

Extracting the data from the XML files is more complicated. Although IATI data uses a standard schema, there are a few cases where publishers have either used the markup incorrectly, or else interpreted the definitions slightly differently. These range from simple problems, such as stating that an organisation is “implementing” rather than “Implementing”, or placing the date within the text of the tag and not the “iso-date” attribute of that tag, to more significant problems, such as placing implementing organisations in the “accountable” organisation field.
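The two simple problems can be handled with small helpers along these lines. This is an illustrative sketch rather than the actual extraction script; the sample XML is invented, though the `participating-org` and `activity-date` elements and the `role` and `iso-date` attributes are those of the IATI activity schema.

```python
import xml.etree.ElementTree as ET


def normalise_role(role):
    """Treat 'implementing' and 'Implementing' as the same role."""
    return (role or "").strip().capitalize()


def extract_date(element):
    """Prefer the iso-date attribute; fall back to the text inside the tag."""
    value = element.get("iso-date") or (element.text or "")
    return value.strip() or None


# Invented sample showing both quirks.
SAMPLE = """
<iati-activity>
  <participating-org role="implementing">Org A</participating-org>
  <participating-org role="Implementing">Org B</participating-org>
  <activity-date type="start-actual" iso-date="2011-01-15"/>
  <activity-date type="end-planned">2013-12-31</activity-date>
</iati-activity>
""".strip()

activity = ET.fromstring(SAMPLE)
roles = [normalise_role(org.get("role")) for org in activity.findall("participating-org")]
dates = [extract_date(d) for d in activity.findall("activity-date")]
```

Misplaced implementing organisations cannot be repaired this way, since the data itself does not say which “accountable” organisations were really implementers.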

However, these problems are still fairly limited and follow fairly regular patterns, so they are not too hard to overcome. There are more significant problems when some donors have for example used three-letter (ISO-3) country codes, rather than two-letter (ISO-2) country codes. (This is considered below in “next steps”.)
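A lookup table is one way to reconcile the two kinds of country code. The sketch below is an assumption about how this could be done, not what the script currently does, and the table holds only a handful of entries; a real one would cover the full ISO 3166-1 list.

```python
# Partial ISO 3166-1 alpha-3 to alpha-2 table, for illustration only.
ISO3_TO_ISO2 = {"KEN": "KE", "IND": "IN", "CHN": "CN", "UGA": "UG", "SSD": "SS"}


def normalise_country(code):
    """Return a two-letter country code, or None if the code is missing or unknown."""
    code = (code or "").strip().upper()
    if len(code) == 3:
        return ISO3_TO_ISO2.get(code)
    return code or None


codes = [normalise_country(c) for c in ["KE", "ken", " IND ", "XXX", ""]]
```

Returning `None` for unrecognised codes keeps them visible for review instead of silently creating a duplicate country.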

3. Wrangling the data

OpenSpending is designed to show spending data, and has a powerful aggregation system to show large collections of transactions in a meaningful way. However, IATI data is organised by activities, with transactions nested within activities (projects), and – reflecting the business models of funders – activities sit within other activities (e.g., projects within programs), although they are not nested in the actual XML. Furthermore, one of the significant advantages of IATI compared to other aid data formats is that it permits multiple sectoral classifications, allowing you to assign a proportion of the value of an activity to each sector. So, you might have an activity that is 50% related to health and 50% to education.

To prepare the data for OpenSpending, each transaction inherits the properties of its activity (and, if that activity has a parent, that parent activity’s title and description). Then, the transaction is broken out into mini transactions, with the proportion of the activity assigned to each sector used to assign a proportion of the value of the transaction to each sector. So, from transactions, you get mini “sector-transactions”.
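In outline, the break-out into “sector-transactions” works like the sketch below. The field names (`id`, `title`, `parent_title`, `sectors`, `percentage`) are hypothetical stand-ins for whatever the extraction step produces, and activities with no sector information are assumed here to keep 100% of the value under an empty sector.

```python
from decimal import Decimal


def split_by_sector(transaction, activity):
    """Break one transaction into one row per sector, weighted by sector percentage."""
    sectors = activity.get("sectors") or [{"code": None, "percentage": "100"}]
    rows = []
    for sector in sectors:
        share = Decimal(sector["percentage"]) / Decimal("100")
        rows.append({
            # Properties inherited from the activity (and its parent, if any).
            "activity_id": activity["id"],
            "activity_title": activity["title"],
            "parent_title": activity.get("parent_title"),
            # The sector's share of the transaction value.
            "sector": sector["code"],
            "value": Decimal(transaction["value"]) * share,
        })
    return rows


# Invented example: an activity split evenly between two sectors.
activity = {
    "id": "XX-1-0001",
    "title": "School health programme",
    "parent_title": "Social sectors programme",
    "sectors": [
        {"code": "health", "percentage": "50"},
        {"code": "education", "percentage": "50"},
    ],
}
rows = split_by_sector({"value": "1000"}, activity)
```

Because every row carries a share of the original value, the rows for a transaction always sum back to that transaction, so totals in OpenSpending are not double-counted.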

This takes about 40 minutes to compile, and then one final step remains: to convert the currencies to a single currency. Currently, USD, EUR and GBP amounts are used in the IATI data. All data is converted to USD using the average for 2010 from the OECD’s Financial Indicators (MEI) dataset. (This is also considered below in “next steps”.)
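The conversion itself is a single division per amount, as in this sketch. The rates below are round placeholder numbers chosen for the example, not the actual OECD 2010 averages, and `to_usd` is a name invented here.

```python
from decimal import Decimal

# Placeholder rates in units of currency per 1 USD. NOT the real OECD figures.
UNITS_PER_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("0.75"),
    "GBP": Decimal("0.65"),
}


def to_usd(value, currency):
    """Convert an amount to USD using a single annual average rate."""
    try:
        rate = UNITS_PER_USD[currency.upper()]
    except KeyError:
        raise ValueError("no rate for currency: " + currency)
    return (Decimal(value) / rate).quantize(Decimal("0.01"))


eur_in_usd = to_usd("75", "EUR")
gbp_in_usd = to_usd("130", "gbp")
```

Using one annual average keeps the step simple, at the cost of misvaluing transactions made in other years or when rates moved sharply within the year.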

4. Loading the data

OpenSpending’s new web-based loading interface makes it relatively easy to load data in, although you currently also have to write a model and views (links at end).

Results

The results can be viewed in the OpenSpending IATI dataset. You can explore the data by recipient country, sectors, funding organisation, and drill down through the data to see the data for an individual country.

Problems with the data

So far I’ve noticed the following problems:

  • “Unknown” recipient location is incorrectly marked as “South Sudan”
  • Recipient countries are listed twice, as Spain has used ISO3 rather than ISO2 country codes.
  • Sweden is listed as “Ministry of Foreign Affairs” (this is how they have listed themselves as the Funding Organisation in the data)
  • Sweden’s implementing organisations have been lost as they placed them in the accountable organisation field.

Please let me know if you see anything else problematic, if you have any criticisms of or feedback on the way the data has been presented, or if you think there are other ways you’d like to be able to explore the data, based on the available dimensions.

Next steps

As mentioned above, there are some problems with the data which should properly be dealt with at the level of the donor agency. But there are others that will probably have to be dealt with by users of the data:

  • Mapping between different sector vocabularies, so that you can see all “Health” projects, and not only the health projects according to a single vocabulary
  • Mapping between countries and regions, so that every project in a country has a related region
  • Correctly converting currencies using the “value-date” column to get a more precise (at least month-specific) conversion.

What else have you noticed with the data? Is there anything else that should be changed? Anything interesting?

You can contact Mark about this data via the OpenSpending mailing list

Useful Links

Opening up Government: Data.gov.uk publishes all UK central government spending data over £25k.

June 16, 2011 in Open Spending

This post is by Friedrich Lindenberg, one of the core developers on the OpenSpending project. He describes some of the hurdles that had to be overcome to get to today’s online release of all UK central departmental spending data over £25k and some interesting questions stemming from the data.

In November of last year, the UK government announced plans to publish central government spending data for all items with a value of more than £25,000. Seven months on, an impressive amount of this data has been released to the public: data.gov.uk lists 557 distinct datasets from every government entity – from the NHS to the MOD.

Despite this leap forward, it is still hard to get a general overview of the 3327 spreadsheets that have been made available, and questions remain unanswered: How much did a particular supplier get paid across government departments? Which are the biggest suppliers for all NHS outposts? Which companies are working to put on the London 2012 Olympic Games, and how much is each of them consuming? Interesting names and figures jump out: Who are the ‘Shadow Robot Company Ltd’ and what exactly are they doing with £25,586 of the UK’s money?

To help find answers to these questions, we set out to collect, clean up and present all central government spending data in OpenSpending.

Processing the Data

Once the data had been published, there was a lot of work to be done to make it usable in OpenSpending.

Having located all available spending releases in the data.gov.uk index, the first step was creating a local cache of all the data and converting it to a common format.

Even though government guidelines ask for the data to be published as CSV with a particular set of column headers, we had to correct both file format and column name for most of the available data. In some cases, even the content of the fields e.g. inverted dates (Month/Day vs. Day/Month) had to be corrected manually. Other departments had left out vital information such as the supplier VAT code or the government entity responsible for the spending.
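The header and date corrections can be pictured with a sketch like this one. The alias table is hypothetical, and the `day_first` flag stands in for the per-file decision that had to be made by hand, since a value such as 03/04/2011 is ambiguous on its own.

```python
import csv
import io
from datetime import datetime

# Hypothetical variants seen in the wild, mapped to the expected column names.
HEADER_ALIASES = {
    "date": "Date",
    "payment date": "Date",
    "supplier": "Supplier",
    "supplier name": "Supplier",
    "amount": "Amount",
}


def canonical_header(name):
    """Map a column header to its expected name, ignoring case and stray spaces."""
    key = " ".join(name.strip().lower().split())
    return HEADER_ALIASES.get(key, name.strip())


def parse_date(text, day_first=True):
    """Parse DD/MM/YYYY (or MM/DD/YYYY if day_first is False) into ISO format."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text.strip(), fmt).date().isoformat()


def read_rows(handle, day_first=True):
    """Yield rows as dicts with canonical headers and ISO dates."""
    reader = csv.reader(handle)
    headers = [canonical_header(h) for h in next(reader)]
    for record in reader:
        row = dict(zip(headers, record))
        row["Date"] = parse_date(row["Date"], day_first)
        yield row


# Invented sample file with non-standard headers.
sample = "Payment Date, Supplier Name ,Amount\n03/04/2011,Acme Ltd,25586.00\n"
rows = list(read_rows(io.StringIO(sample)))
```

A date that does not fit the chosen format raises `ValueError`, which is a useful signal that a file was published with the day and month inverted.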

We also had to normalize many of the entities involved, both companies and government departments. For companies we had the benefit of the excellent reconciliation service offered by OpenCorporates.com, but unfortunately, for government departments and other entities no such service is available yet. As a workaround, a simple Google Document allowed us to map some of the abbreviations used and the most blatant misspellings to their correct forms.

After performing all these operations on a temporary SQLite database, we were able to generate a consolidated 450MB CSV file for all of the 25k spending with over 1.8m identifiable records as well as a list of error reports both for invalid files and individual records. These results are available on the UK Government 25k spending data package on CKAN and could now be easily loaded into http://OpenSpending.org and thence presented through an embeddable JavaScript in a convenient interface on data.gov.uk/openspending.
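The consolidation step has roughly the following shape. This is a simplified sketch: the four-column table and the record fields are assumptions made for the example, and the real pipeline tracked errors for whole files as well as for individual records.

```python
import csv
import io
import sqlite3


def consolidate(records):
    """Load cleaned records into a temporary SQLite table, then export CSV and errors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spending (entity TEXT, supplier TEXT, date TEXT, amount REAL)"
    )
    errors = []
    for number, rec in enumerate(records, start=1):
        try:
            amount = float(str(rec["amount"]).replace(",", ""))
            conn.execute(
                "INSERT INTO spending VALUES (?, ?, ?, ?)",
                (rec["entity"], rec["supplier"], rec["date"], amount),
            )
        except (KeyError, ValueError) as exc:
            # Keep the record number so the problem can be traced to its source.
            errors.append((number, repr(exc)))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["entity", "supplier", "date", "amount"])
    writer.writerows(
        conn.execute("SELECT entity, supplier, date, amount FROM spending ORDER BY date")
    )
    conn.close()
    return out.getvalue(), errors


# Invented records: one valid, one with a bad amount, one missing its supplier.
records = [
    {"entity": "Dept A", "supplier": "Acme Ltd", "date": "2011-04-03", "amount": "25,586.00"},
    {"entity": "Dept A", "supplier": "Acme Ltd", "date": "2011-04-04", "amount": "n/a"},
    {"entity": "Dept B", "date": "2011-04-05", "amount": "30000"},
]
csv_text, errors = consolidate(records)
```

Collecting errors rather than stopping at the first one means a single malformed record does not block the rest of a department's data.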

The decision of the UK Government to publish this data represents a huge step towards more participatory governance, greater transparency and accountability in financial governance.

Thanks to OpenSpending, government spending in the UK is searchable, categorizable and, most importantly, analysable by anyone interested in public spending. OpenSpending will continue to develop tools to allow ever more insightful analysis of the data and hopefully, many more governments will follow suit in opening up their public expenditure records.
