You are browsing the archive for CKAN.

The Open Data Cities Conference

April 25, 2012 in CKAN, Events, Open Data

Brighton was buzzing with wise, whacky and innovative ideas for Open Data on Friday – even more than usual – as about 150 people converged on the city for the first Open Data Cities Conference. So passionate was the organiser, Greg Hadfield, about the potential of Open Data in cities that he gave up his job at the start of the year to work full time on the ODCC, and the fruits of his labours were apparent in the seemingly endless roster of first-rate speakers.

The conference was supported by the Open Knowledge Foundation, and Laura James, a Foundation Co-ordinator, spoke in the afternoon about the emerging need for Data Management Systems. She made the case that Open Source DMS’s like the OKF’s CKAN are a good fit for open data: by freeing data publishers from reliance on one provider, they ensure sustainable open data in the long term.

[IMG:Laura James at ODCC]
Dr Laura James speaking at the conference

With such a hectic succession of speakers there was a lot to digest, but looking back, some themes emerge from the day. One thread that emerged repeatedly was that, for both data and cities, collaboration is key. Charlie Stewart and John Shewell of Brighton and Hove City Council first struck up this theme, saying the aim of Brighton’s Open Data policy is to enable more active involvement from citizens. Lean Doody of Arup praised cities as spaces that increase the potential for collaboration and hence for innovation and productivity. Leigh Dodds, CTO of the lead sponsor Kasabi, in a historical tour-de-force covering Robert Hooke’s part in the piecemeal surveying and rebuilding of London after the Great Fire of 1666, argued that the present-day tidal wave of data is a modern counterpart of the Great Fire. Now as then, grand top-down solutions are doomed to fail, and data hubs must become collaborative enterprises where anyone can bring as well as use data – like the CKAN-powered DataHub or indeed Kasabi’s own offering.

Information is power, and another theme that emerged was that of self-determination – of giving people control. Drew Hemment founded FutureEverything, the techno-art festival in Manchester that was part of the drive behind the brilliant (and CKAN-powered) datagm.org.uk. Drew told us he’s suspicious of arguments for Open Data based on transparency. (It’s interesting that transparency was not noticeably a theme of the day.) His interest in Open Data is in giving people control. John and Charlie had hit exactly the same note earlier, noting that millions of people want more control over decisions that affect them, and asking, ‘How do we unlock data so citizens can influence decisions before they’re made?’

The process of opening data also came under scrutiny, with speakers emphasising that it is never a once-for-all affair. The advice from Tom Steinberg, founder of MySociety, is to pay close attention to incoming requests for data so that you know what people want – even if those requests are coming to somebody else in the organisation. Emer Coleman of the Government Digital Service – and former Open Data champion at the Greater London Authority – told data publishers to ‘get ugly early’: release early and iterate, improving the data as you go. And Ian Holt of Ordnance Survey spoke about OS’s experience of releasing mapping data: they have an ongoing engagement with users such as the Geovation Challenge, though they would still like to know more about how people use their data. I’d met Ian before, on one of the excellent Open Data Masterclasses that are another part of OS’s ongoing user engagement.

Tom and Emer also both touched on the resistance to getting data out in the open. Emer pointed out that this often comes from officials fearful of the consequences – rather than from any considerations of the public weal – even though politicians, who have more to be fearful of, are often in favour. Tom gave some helpful tips on overcoming resistance to releasing data: officials are more receptive to the argument that open data will reduce their workload, for example, than that it will shine a light in dark corners, which may be just what they fear.

Tom knew whereof he spoke, having found himself unexpectedly at the head of a criminal organisation simply by trying to build useful stuff – which back then meant stealing data that now is freely available, partly thanks to his efforts. The creative power of Open Data was of course another theme that recurred throughout the day. As Laura James reminded us, the best use of your data will be made by someone else – when the OKF’s site wheredoesmymoneygo.org was experimental, it was often down, leading to enquiries from HM Treasury who found it the most useful way to look at their own data! Lean Doody gave examples in the area of smart cities, including a bus app in Sydney so wildly successful it had to be withdrawn after a couple of weeks because the data provider couldn’t cope with the web traffic. Bill Thompson gave a glimpse of where the ongoing work on the BBC archives might lead. (Are you visible in a pushchair in a street shot from a soap opera when you were 2? Imagine if you could search and find out.) And Jonathan Carr-West of the Local Government Information Unit, quoting the French theorist Bourdieu for extra street-cred, said that data is a major new field of exchange – a role the city has traditionally filled – and that Open Data must permeate our habitus and doxa if we are to find solutions together to such existential human problems as world hunger, climate change, resource shortage, ageing and war.

The last word, fittingly, went to Greg Hadfield, who recalled the early days of the internet – 1995, in fact, when he and his teenage son launched Soccernet, to general derision (‘why would anyone get football results from a computer when they can use Ceefax?’) Four years later it was sold for £40 million – ‘We didn’t get the money’, he lamented in an aside (not that he’s done too badly since). As then with the internet, so now with Open Data, Greg sees whole new possibilities open up – and he is determined that communities and cities, and Brighton in particular, should be in the lead.

For other perspectives on the ODCC, see here, here or the amazing live blogging from the day here.

Introducing the DataStore

March 27, 2012 in CKAN, Our Work, Technical

A major new feature in the DataHub is good news for data wranglers. The DataStore allows users to store and load structured data into a database, where it can be queried, filtered, or accessed from other programs via a rich data API.
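
As a rough illustration of what querying and filtering through a data API can look like, here is a minimal Python sketch that only builds a query URL and sends nothing. It assumes the `datastore_search` action found in later CKAN releases; the Data API on the DataHub at the time of this post may have used a different endpoint and parameters. The host `example.org`, the resource ID and the field names are placeholders.

```python
import json
from urllib.parse import urlencode


def build_datastore_query(base_url, resource_id, text=None,
                          filters=None, sort=None, limit=10):
    """Build a URL for CKAN's datastore_search action (no request is sent)."""
    params = {"resource_id": resource_id, "limit": limit}
    if text:
        params["q"] = text  # full-text search across the stored rows
    if filters:
        # exact-match filters are passed as a JSON object
        params["filters"] = json.dumps(filters, sort_keys=True)
    if sort:
        params["sort"] = sort  # e.g. "year desc"
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


# Placeholder host, resource ID and field names, for illustration only.
url = build_datastore_query(
    "https://example.org",
    "my-resource-id",
    text="bus",
    filters={"city": "Brighton"},
    sort="year desc",
    limit=5,
)
print(url)
```

The resulting URL could be fetched with any HTTP client; keeping URL construction separate makes the query easy to inspect before anything is sent.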

The API is also used by CKAN’s inbuilt Recline Data Explorer, giving in-page previews of the data with full text search, filtering, sorting and graphing, as in the screenshots below:

[IMG: Sorting] [IMG: Graph]

These new DataHub capabilities are powered by the recently enhanced DataStore and Data API functionality of our open-source CKAN data management system, which as well as powering the DataHub runs many other data portals including data.gov.uk.

An introduction to the DataStore and Data API



LOD2 plenary, Vienna, 21-23 March 2012

March 23, 2012 in CKAN, Events, Linked Open Data, LOD2, OKF Austria

I am in Vienna, along with my colleague Ira, for a plenary meeting of the assorted partners of the LOD2 project. LOD2 is an EU-funded research project on Linked Open Data, the vision of an interlinked web of data known to many from Tim Berners-Lee’s TED talk. The meeting runs for 3 days, in which there will be discussions about the various work packages, but I have been given the task of blogging about the opening introductory session on Wednesday afternoon. (Full disclosure: I have received a handsome LOD2 mug as advance payment for my efforts.) The Open Knowledge Foundation is one of the partners, because the pan-European CKAN data portal publicdata.eu is part of the project. But being personally a relative newcomer, I was looking forward to finding out in this introductory session what the project is really all about.

[IMG: Delegates at LOD2 plenary]
Delegates at the LOD2 plenary

Sören Auer, the project co-ordinator, kicked off, giving an overview of the overview. He described the lifecycle of Linked Data, from extraction (from other structured or unstructured data) through to linking in to existing data, enrichment (perhaps by adding more structure), to the point where it can be explored for interesting patterns. For each stage in the lifecycle, there are tools being developed by the project – many are already released. Collectively these tools, which are all Open Source, form the LOD2 ‘stack’. Sören also mentioned some recent milestones, including a Serbian CKAN portal holding a lot of data in RDF, the native format for Linked Data; and a planned new data-oriented conference, the European Data Forum.

The tools: Work Packages 2-6

WP2: Optimising the store

Peter Boncz of CWI spoke about Work Package 2. (What happened to WP1, you ask? It was a prototype which finished earlier in the project.) WP2 concerns Virtuoso, the database part of the LOD2 stack. The challenge with RDF is to make a database that runs efficiently with huge quantities of data, as the potential for rich interlinking means the data is not neatly segmented into tables as in a normal database. A lot of progress has already been made, and he hopes that Virtuoso 7 will be released soon. It will be structured to enable better compression (speeding up processing by reducing I/O), and use adaptive caching to try to minimise the number of queries that need to be done more than once.

WP3: Getting the data

Jens Lehmann of AKSW at the University of Leipzig was next, talking about WP3 on ‘extraction, enrichment and repair’: the creation of Linked Data from existing structured or unstructured sources, its enrichment with suitable taxonomies to describe it, and detecting inconsistencies or other problems with its structure. If that sounds like a wide-ranging package, it is: as Jens told me later over dinner (not entirely seriously), ‘anything that doesn’t fit in one of the other packages gets stuffed into WP3’! There are currently over 20 tools playing a role in this stage, including Natural Language Processing techniques for extracting data from free text.

WP4: Creating links

Next up was Robert Isele of the Freie Universität Berlin. WP4 aims to enrich RDF data by adding links to other data sources, as well as linking data together by identifying duplicate entities within or between datasets. Automatic tools suggest links that a user can confirm or reject. WP4 also includes work to create an RDF-enabled version of the open source data cleaning tool Google Refine.

WP5: User interfaces

Sean Policarpio of DERI reported on WP5 on browsing, visualisation and authoring interfaces. He demonstrated geospatial data on a map, filtered with a structured (faceted) search – combining the power of Linked Data with a mapping search like Google Maps. Associated with this, they have produced a ‘semantic authoring’ tool, allowing the user to add or edit Linked Data via the map. Their next tasks are to implement ‘social semantic networking’ – for example, notifications based on semantic content – and mobile interfaces for their semantic tools.

WP6: Integrating the tools

Finally, the engaging and very Belgian Bert van Nuffelen of TenForce spoke about WP6, which aims to make the various disparate tools in the LOD2 stack play nicely together. They have worked on making the stack tools easier for users to install, on a shared interface, and on shared authorisation using WebID. They have also recently released an intermediate version of the stack (version 1.1) with new and upgraded tools and better documentation.

By now it was 3 o’clock and, against all expectations, the meeting was ahead of schedule. So we had a relatively luxurious half-hour break for tea. Your correspondent and another relative newcomer, Jan from Tenforce, took the opportunity to get some fresh air and a feel for the Viennese genius loci. Or should that be Ortsgeist?

The use cases

WP7: Publishing

We had heard about the tools that had been, and are being, developed to manipulate Linked Data. But how will they be used? Refreshed by tea we returned to the meeting to hear about the three Work Packages concerned with use cases. Perhaps the most exciting talk of the afternoon came from Christian Dirschl of WP7 and Wolters Kluwer Germany (WKD). WKD is a legal and accountancy publisher who are already adapting and using the LOD2 stack tools to enhance their publishing business. Christian told us that ‘semantic technologies enable publishing media to create added value’, and WKD’s first release of news and media datasets created using Linked Data tools is on course for publication in April. By December they will release an interlinked version of the datasets, including links to DBpedia and further optimised tools.

WP8: Enterprise

Amar-Djalil Mezaour of Exalead presented the ‘enterprise’ use case WP8, an application to human resources with the aim of matching job vacancies to applicants. Some early work trying to model CVs had met criticism on the grounds, among others, that the EU reviewers had doubts about the volume of data freely available. WP8 has refocused its attention on job vacancies rather than CVs, for which there is plenty of data and better RDF support. They hope to release the results later this year, with vacancies ‘dashboards’ and analytics, faceted by sector, region, salary, etc, using Linked Data, and enriched with mashups with other sites such as social networks.

WP9: Government data

After a long wait in the wings, it was time for the OKF’s own Ira Bolychevsky to take centre stage at last. WP9 aims to explore the applications to making government data available and maximising its use. Its main visible output is publicdata.eu, which republishes open data from government portals throughout the European Union. publicdata.eu has recently been upgraded and repaired: it now runs the latest version of CKAN, introducing features such as data previews (like this) and – live on the DataHub and coming soon to publicdata.eu – a data API for structured data. Two subjects we hope to discuss more later in the plenary are closer integration with the LOD2 stack, and metadata standards.

[IMG: Ira Bolychevsky at LOD2 plenary]
Ira presenting WP9

Jindřich Mynarz briefly mentioned the new Czech CKAN portal. They have developed a detailed methodology as well as a ‘Quick Start guide’ for publishers, both of which they promise to make available in English soon (hurrah!)

Finally Vojtech Svatek of UEP gave a quick overview of WP9a, which aims to use Linked Data technology in the field of public procurement, with ontologies for public sector contracts – providing matchmaking and analytics not dissimilar from those in WP8.

A jug of wine, a loaf of bread

Perhaps the reader has read enough of Work Packages for now. Anticipating your satiety, the organisers had decided to defer the presentations from WP10-12 until Friday. In their place an outsider to the LOD2 project, Allan Hanbury, gave a lightning talk on a slightly related EU project, Khresmoi, which aims to provide useful searching tools for large medical databases.

Thus concluded the day’s business, and we all dispersed to our various hotels. The OKF contingent, along with TenForce, are staying in one just a couple of roads away. Crossing a road is hazardous in Vienna, because there are sometimes cars parked in what seems to be the middle of the road. You keep half-expecting some lights to change and the cars to zoom off. In fact they are parked between the road and the tramlines, along which long and elderly trams snake through the city.

In the evening, everyone from the day’s meetings reconvened and were whisked away on one such tram to an outlying district of the city, for an evening at a (more or less) traditional Austrian Heurige, an untranslatable type of wine tavern. A true Heurige, Helmut from the Semantic Web Company explains to me as we hurtle along, is run by a vineyard, and gives people an opportunity to sample its new year’s crop of wine. (‘Heurige’ in Austrian German literally means ‘this year’.) It will have a licence to open for only 2 or 3 weeks a year, and when open will hang out a spray of branches and a lamp to signify the fact.

There is still some wine grown in Vienna, I am told, but most of the Viennese Heurigen are open all year round and are really just restaurants. But they recreate the atmosphere of the real thing. Patrons are served wine and a mixed plate of traditional local foods, which, for readers not familiar with Austrian cuisine, mainly consist of various kinds of sausage, potato and cabbage. They are delicious, and so is the Apfelstrudel that comes along later. The only thing I cannot recommend in Vienna is the tea. When will these foreigners learn that it must be made with boiling hot water?

To follow blogs from the LOD2 plenary, see the blog parade from the project blog.

Living Labs Global Award 2012 – Two Open Knowledge Foundation Projects Nominated

March 8, 2012 in CKAN, Events, News, OKF Projects, Open Economics, Open Spending, WG Economics

Two projects of the Open Knowledge Foundation have been nominated for the Living Labs Global Award 2012: OpenSpending.mobi – Participatory budgeting through augmented reality and CityData – Making Cities Smarter – A central entry point to all your city’s data. Out of nearly 700 submitted showcases, about 15% have been selected to submit an extended version of the showcase. The Winning Showcases will be presented during the Rio Summit on Service Innovation in Rio de Janeiro on 2-3 May, 2012.

The Living Labs Global Award cooperated with cities in Africa, Asia, South and North America and Europe in order to present challenges related to health, mobility, education, urban management and sustainable development, affecting more than 125 million people. Winners of the Living Labs Global Award are invited to implement their showcase as a pilot project, providing valuable inputs in product development and public sector procurement.

“Companies, non-governmental organisations and research centres have invested in technologies that change our cities”. The Living Labs Global Award 2012 provides an opportunity to innovators to present their solutions, receive professional and detailed evaluation, and is a distinguished recognition of their efforts in providing sustainable and innovative solutions for cities.

OpenSpending.mobi is nominated in the category Participation in Service Design and Delivery in Sant Cugat del Valles, Spain.

An increasing number of cities invite their citizens to help allocate municipal funds through participatory budgeting. Yet these debates often remain abstract: should more funds be given to schools or hospitals? Should the city pay down debt by selling property or by reducing social benefits?

OpenSpending.mobi aims to make budgeting debates happen where their effects will take place: out in the streets. The project will geo-code local government expenditure, and present funding information as location-based virtual overlays on mobile devices. Both the city government and normal citizens will be able to either propose new projects or rate and comment on those of others.

With a growing set of other Augmented Reality (AR) layers becoming accessible, more and more information will be available to facilitate hyperlocal decision-making. The project could be further expanded to include regular group tours through the city in which digital layers and real-life debate combine into a data-based moving agora.

CityData – Making Cities Smarter is nominated in the category Free Spatial Data for Information & Services in Kristiansand, Norway.

Where do citizens and developers go for information in your city? Perhaps for public transport timetables they have to visit the websites of the local bus and tram companies, for information about bin collections a local council site, for crime data the local police website … and so on.

CityData is a platform that brings geo-coded information from local councils, departments and agencies together in one place. Different agencies can upload links to their data from existing systems either using an intuitive web front end or via a powerful API, into grouped spaces on the platform where they can retain their distinctive branding. It provides facilities for agencies to upload and review data before it goes live. It uses non-proprietary, open-source software, tried and tested on large existing projects such as datagm.org.uk, a data platform for the Greater Manchester area.

Data can be linked on external sites, or held as structured data on the CityData server, in which case a suite of visualisations and maps are available to users as well as an API to query the data. By making data from many different local sources discoverable and searchable, CityData encourages local app developers to build services using multiple data streams – for example, combining geospatial transport and house price data to make suggestions to a user who needs to find a place to live.

Living Labs Award Contact at OKFN: velichka.dimitrova [at] okfn.org

Translators needed!

February 10, 2012 in CKAN, Join us, OKF Projects, Open Data, Our Work, Releases

Do you speak another language apart from English? Have you got a little bit of spare time over the next week?

CKAN 1.6 is set to release in one week’s time and all the new features need translating. Can you help us complete it in time? If you can spend 15 minutes filling in the gaps using the Transifex website, then not only will community CKANs in your country benefit (e.g. Czech, Swedish, French etc), but so will the international CKANs run in your language! (e.g. thedatahub.org, datacatalogs.org, publicdata.eu)

These are the languages and how complete the translations are:

https://www.transifex.net/projects/p/ckan/resource/1-6/

Serbian 83%

Finnish 83%

Norwegian 83%

Portuguese 83%

Italian 83%

Catalan 83%

French 83%

Polish 82%

Czech 82%

German 80%

Spanish 76%

Swedish 74%

Hungarian 58%

Albanian 43%

Dutch 37%

Bulgarian 37%

Greek 27%

Slovenian 23%

It’s easy to do some translating!

First timers will need to set up their account first:

  1. Log in with a Transifex/Facebook/Twitter/Google account here.

  2. Choose a CKAN language team: https://www.transifex.net/projects/p/ckan/teams/

  3. Click “Join this team”

  4. Wait for me or another admin to approve you

Now to translate:

  1. https://www.transifex.net/projects/p/ckan/resource/1-6/

  2. Click on your language

  3. Press “Translate”.

Every day this week I’ll put the translations up on thedatahub.org for you to see the results. Please help make this open data catalogue readable by as many people as possible!

Open Knowledge Foundation’s CKAN Software to Power new European Commission Data Portal

January 31, 2012 in CKAN, News, OKF Projects, Open Data

[CKAN logo]

The European Commission is to make its data publicly and openly available through a new data portal, along the lines of those already used by national governments such as http://data.gov.uk/. Like http://data.gov.uk/ the new site will be based on the open-source CKAN Data Portal Software developed by the Open Knowledge Foundation.

The Foundation will also be one of the partners in the project to build the site; the project’s official press release is below. See also the announcement on the CKAN blog.


PRESS RELEASE – FOR IMMEDIATE RELEASE

TenForce, the Open Knowledge Foundation and InfAI to develop European Commission open data portal

Open data will encourage re-use, improving transparency, policy-making and growth

The European Commission (EC) has awarded a contract to create an open data portal website, where data produced by European Commission services will be freely available. Belgian company TenForce will lead the project to deliver the portal, supported by Leipzig University’s Institute for Applied Computer Science (InfAI), and UK-based non-profit the Open Knowledge Foundation.

Users will be able to search for information in a flexible range of ways, for example by subject area, country, and region, and to visualise the data or download it for re-use in research, campaigns or commercial applications. The EC and the contracted partners will run workshops and other outreach activities, to raise awareness of and interest in the data among companies, researchers, journalists and policy groups.

The site will be based on open source software components including Drupal and CKAN. CKAN is a powerful data portal software package written by the Open Knowledge Foundation; it is already used to catalogue freely-available data from a number of governments, both within and beyond the EU. As well as viewing or downloading the raw data, users will be able to view it by way of sophisticated graphic visualisations developed by InfAI. TenForce will be responsible for the overall management, the architecture of the portal, store deployment and taxonomy management and some of the integration work.

ENDS

NOTES FOR EDITORS

1. Background
On 12 December 2011 the European Commission presented an Open Data Strategy for Europe setting out clearer rules on making the best use of government-held information. The proposed Open Data Strategy will make it easier for business and citizens to find and re-use information held by public sector bodies in the Member States and by the Commission itself. Primarily, the Commission plans to update the 2003 Directive on the re-use of public sector information. The Commission has also updated its own re-use rules so as to make its data available in machine-readable format and to include data from research by the Joint Research Centre. In 2012 the Commission will launch a web portal making it easy for industry and citizens to search for Commission data. More information here and here.
2. TenForce
TenForce BVBA is a Belgian software company specialized in the design, development and delivery of practical solutions to complex problems. TenForce has years of international experience in knowledge management, and an in depth expertise in emerging technologies. Besides designing, marketing and supporting its flagship product – a web-based management environment for project and operational activities – it conducts several projects on a European scale focusing on modelling complex systems for publishing solutions. Contact: Bastiaan Deblieck, bastiaan.deblieck@tenforce.com , +32 16 31 48 60.
3. InfAI
InfAI is an institute of the University of Leipzig, one of the oldest (founded 1409) and largest (30,000 students) universities in Germany. InfAI hosts the world class Knowledge Engineering Research Group (http://aksw.org), which is establishing theoretical results and scalable implementations for the field of knowledge engineering. The group’s tools and services enjoy considerable popularity: the open-source Semantic Web framework OntoWiki, for example, is downloaded more than 500 times a month, and applied in cases ranging from creating biomedical ontologies to knowledge management for business. Contact: Sören Auer, auer@informatik.uni-leipzig.de
4. Open Knowledge Foundation
The Open Knowledge Foundation (OKF) is a not-for-profit organization founded in 2004, dedicated to promoting open knowledge in all its forms. It builds tools and communities with a network of international leaders in this field. Projects include CKAN, a data portal that powers the UK government’s http://data.gov.uk/, the pan-European http://publicdata.eu/ and several dozen other government and community data sites around the world; and OpenSpending, which maps government and corporate spending around the world. The Foundation runs forums, workshops and an annual conference drawing together representatives from across the knowledge society – from academics and public servants to entrepreneurs and web developers. Contact: Laura James, laura.james@okfn.org.

We’re hiring!

December 21, 2011 in CKAN, Join us, OKF

As we head into 2012, there’s lots going on at the OKFN and we’re looking for some more people to come help us build and scale the open data ecosystem.

In particular, we’re looking for a great project manager to deliver a portfolio of CKAN-related projects, and also an awesome front end web developer who will contribute to a range of OKFN projects.

Come join our team! Find out more at http://okfn.org/jobs/

Data = Seized, Sanitised and Sanity-checked. Open Data Day 2011

December 12, 2011 in CKAN, Open Spending

This post is by Mark Brough, Research Officer at Publish What You Fund, Lucy Chambers, Community Coordinator for OpenSpending, and Irina Bolychevsky, Product Owner for CKAN. It is cross-posted on the OpenSpending Blog and the CKAN blog and Mark Brough’s contribution is also featured on aidinfolabs.org.

Saturday, December 3rd was Open Data Day, and London took the challenge to throw a hackday to help data be opened, cleaned and shown off to the world…

Fuelled only by enthusiasm, caffeine and 5 packets of ready-made popcorn, the CKAN, OpenSpending and IATI teams, along with some new faces, joined forces to liberate as much data as they could…

OpenSpending + IATI + CKAN

As part of the IATI Open Data Day challenges, Mark Brough did some work to get the existing IATI Data into OpenSpending. David Read, from the CKAN team, and a new face to the data wrangling crew, Johannes, scraped data on aid donations from France and Austria that were locked-up in web apps in order to help fill in the gaps in the global aid data jigsaw puzzle.

These, along with many other datasets discovered on the day via tweets and emails have been added to the Open Data Day Group on theDataHub.org.

You can see the results of the IATI wrangling process on OpenSpending.org/iati. This following section is written by Mark.

1. Getting the data

Downloading the existing IATI data has already become quite a big task; with 19 publishers so far, the data currently amounts to over 750MB with 1169 packages. Fortunately this is made easier by the IATI Registry, which provides an API to access all existing datasets, and a simple script (links at end) can retrieve all of the data.
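
This is not the script referred to above, just a minimal sketch of the same idea: walking a CKAN-based registry and collecting the download URL of every file. It assumes the present-day CKAN action API (`package_list` and `package_show`), which may differ from the API version the registry exposed in 2011. The `fetch` parameter is a hypothetical hook so that the function can be tried without network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def list_resource_urls(registry_url, fetch=fetch_json):
    """Return the download URL of every resource in every package."""
    api = registry_url.rstrip("/") + "/api/3/action"
    # package_list returns the names of all packages in the registry
    names = fetch(f"{api}/package_list")["result"]
    urls = []
    for name in names:
        # package_show returns one package, including its resources
        package = fetch(f"{api}/package_show?id={quote(name)}")["result"]
        for resource in package.get("resources", []):
            if resource.get("url"):
                urls.append(resource["url"])
    return urls
```

Each URL returned can then be downloaded and saved as an XML file; with over a thousand packages, it is worth adding pauses and retries around the fetcher.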

2. Extracting the data

Extracting the data from the XML files is more complicated. Although IATI data uses a standard schema, there are a few cases where publishers have either used the markup incorrectly, or else interpreted the definitions slightly differently. These can be simple problems such as stating that an organisation is “implementing” rather than “Implementing”, or placing the date within the text of the tag and not the “iso-date” attribute of that tag, or more significant problems such as placing implementing organisations in the “accountable” organisation field.
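
The two simple problems mentioned here can be smoothed over at extraction time. The following is a minimal sketch, not the code used on the day: the element and attribute names (`participating-org`, `role`, `activity-date`, `iso-date`) follow my reading of the IATI activity schema and should be treated as assumptions, and the sample XML is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample showing both quirks: a lower-case role and a date
# placed in the tag text instead of the iso-date attribute.
SAMPLE = """
<iati-activity>
  <participating-org role="implementing">Example NGO</participating-org>
  <activity-date type="start-planned">2010-04-01</activity-date>
  <activity-date type="end-planned" iso-date="2012-03-31"/>
</iati-activity>
"""


def normalise_activity(xml_text):
    """Extract roles and dates, tolerating the two common quirks."""
    activity = ET.fromstring(xml_text.strip())
    # "implementing" and "Implementing" should be treated as the same role
    roles = [
        (org.get("role") or "").strip().capitalize()
        for org in activity.findall("participating-org")
    ]
    dates = {}
    for node in activity.findall("activity-date"):
        # Prefer the iso-date attribute, fall back to the tag text
        dates[node.get("type")] = node.get("iso-date") or (node.text or "").strip()
    return {"roles": roles, "dates": dates}


result = normalise_activity(SAMPLE)
print(result)
```

Misplaced organisations (implementing bodies in the “accountable” field) cannot be fixed this mechanically, since telling the two apart needs knowledge of the publisher.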

However, these problems are still fairly limited and follow fairly regular patterns, so they are not too hard to overcome. There are more significant problems when some donors have for example used three-letter (ISO-3) country codes, rather than two-letter (ISO-2) country codes. (This is considered below in “next steps”.)
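
For the country-code problem, one option on the user side is to map three-letter codes to two-letter codes before loading. A minimal sketch, assuming a lookup table: only four entries are shown here, and a real run would need the full ISO 3166-1 list.

```python
# Partial lookup table for illustration; extend with the full ISO 3166-1 list.
ISO3_TO_ISO2 = {
    "AFG": "AF",  # Afghanistan
    "KEN": "KE",  # Kenya
    "SSD": "SS",  # South Sudan
    "TZA": "TZ",  # Tanzania
}


def to_iso2(code):
    """Return the two-letter code for a two- or three-letter country code."""
    code = code.strip().upper()
    if len(code) == 2:
        return code  # assumed to be a two-letter code already
    if code in ISO3_TO_ISO2:
        return ISO3_TO_ISO2[code]
    raise ValueError(f"Unknown country code: {code!r}")


print(to_iso2("ken"))
print(to_iso2("KE"))
```

Raising an error on unknown codes, rather than passing them through, keeps a mistyped code from silently creating a second copy of a country in the results.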

3. Wrangling the data

OpenSpending is designed to show spending data, and has a powerful aggregation system to show large collections of transactions in a meaningful way. However, IATI data is organised by activities, with transactions nested within activities (projects), and – reflecting the business models of funders – activities sit within other activities (e.g., projects within programs), although they are not nested in the actual XML. Furthermore, one of the significant advantages of IATI compared to other aid data formats is that it permits multiple sectoral classifications, allowing you to assign a proportion of the value of an activity to each sector. So, you might have an activity that is 50% related to health and 50% to education.

To prepare the data for OpenSpending, each transaction inherits the properties of its activity (and, if that activity has a parent, that parent activity’s title and description). Then, the transaction is broken out into mini transactions, with the proportion of the activity assigned to each sector used to assign a proportion of the value of the transaction to each sector. So, from transactions, you get mini “sector-transactions”.
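The splitting step can be sketched as follows. This is a minimal illustration of the idea rather than the code used for the import. The dictionary keys and the function name `sector_transactions` are hypothetical, and treating an activity with no sector breakdown as 100% unclassified is an assumption.

```python
from decimal import Decimal


def sector_transactions(transaction, activity, parent=None):
    """Split one transaction into one row per sector of its activity."""
    # Assumption: no sector breakdown means 100% unclassified.
    sectors = activity.get("sectors") or [{"name": None, "percentage": 100}]
    value = Decimal(str(transaction["value"]))
    rows = []
    for sector in sectors:
        share = Decimal(str(sector["percentage"])) / Decimal(100)
        rows.append({
            # Properties inherited from the activity and, if present, its parent.
            "activity_title": activity["title"],
            "activity_description": activity.get("description"),
            "parent_title": parent["title"] if parent else None,
            "parent_description": parent.get("description") if parent else None,
            "sector": sector["name"],
            "currency": transaction["currency"],
            # The sector's share of the transaction value.
            "value": value * share,
        })
    return rows


activity = {
    "title": "Example activity",
    "sectors": [
        {"name": "health", "percentage": 50},
        {"name": "education", "percentage": 50},
    ],
}
rows = sector_transactions({"value": 1000, "currency": "USD"}, activity)
```

With the 50% health and 50% education example, a single transaction of 1000 becomes two sector-transactions of 500 each, so the total across sectors still matches the original value.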

This takes about 40 minutes to compile, and then one final step remains: to convert the currencies to a single currency. Currently, USD, EUR and GBP amounts are used in the IATI data. All data is converted to USD using the average for 2010 from the OECD’s Financial Indicators (MEI) dataset. (This is also considered below in “next steps”.)
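The conversion itself is a single multiplication per row. In the sketch below, the function name `to_usd` is hypothetical and the rates are placeholders chosen for easy arithmetic. They are NOT the OECD 2010 averages, which would need to be looked up in the source dataset.

```python
from decimal import Decimal

# PLACEHOLDER rates (USD per one unit of currency), not real OECD figures.
PLACEHOLDER_USD_PER_UNIT = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.25"),
    "GBP": Decimal("1.50"),
}


def to_usd(value, currency, usd_per_unit):
    """Convert an amount to USD using one fixed rate per currency."""
    try:
        rate = usd_per_unit[currency]
    except KeyError:
        raise ValueError(f"No rate available for currency {currency!r}")
    return Decimal(str(value)) * rate


converted = to_usd(100, "EUR", PLACEHOLDER_USD_PER_UNIT)
```

Using `Decimal` rather than floats avoids rounding drift when many small sector-transactions are later summed.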

4. Loading the data

OpenSpending’s new web-based loading interface makes it relatively easy to load data in, although you currently also have to write a model and views (links at end).

Results

The results can be viewed in the OpenSpending IATI dataset. You can explore the data by recipient country, sector and funding organisation, and drill down to see the data for an individual country.

Problems with the data

So far I’ve noticed the following problems:

  • “Unknown” recipient location is incorrectly marked as “South Sudan”
  • Some recipient countries are listed twice, because Spain has used ISO-3 rather than ISO-2 country codes.
  • Sweden is listed as “Ministry of Foreign Affairs” (this is how they have listed themselves as the Funding Organisation in the data)
  • Sweden’s implementing organisations have been lost as they placed them in the accountable organisation field.

Please let me know if you see anything else problematic, if you have any criticisms or feedback on the way the data has been presented, or if there are other ways you’d like to be able to explore the data, based on the available dimensions.

Next steps

As mentioned above, there are some problems with the data which should properly be dealt with at the level of the donor agency. But there are others that will probably have to be dealt with by users of the data:

  • Mapping between different sector vocabularies, so that you can see all “Health” projects, and not only the health projects according to a single vocabulary
  • Mapping between countries and regions, so that every project in a country has a related region
  • Correctly converting currencies using the “value-date” column to get a more precise (at least month-specific) conversion.
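The last of these steps, date-aware conversion, could be sketched as a rate lookup keyed on the month of the “value-date”, falling back to an annual average when no monthly rate is available. The function name `usd_rate`, the shape of the rate tables and the rates themselves are hypothetical placeholders, not real exchange rates.

```python
from decimal import Decimal


def usd_rate(currency, value_date, monthly_rates, annual_rates):
    """Pick the USD rate for the month of value_date, else the annual average."""
    if currency == "USD":
        return Decimal(1)
    month = value_date[:7]  # "YYYY-MM" from an ISO date such as "2010-06-15"
    year = value_date[:4]
    rate = monthly_rates.get((currency, month))
    if rate is None:
        rate = annual_rates[(currency, year)]
    return rate


# PLACEHOLDER figures for illustration only.
monthly = {("EUR", "2010-06"): Decimal("1.20")}
annual = {("EUR", "2010"): Decimal("1.25")}

june = usd_rate("EUR", "2010-06-15", monthly, annual)      # monthly rate found
december = usd_rate("EUR", "2010-12-01", monthly, annual)  # falls back to annual
```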

What else have you noticed with the data? Is there anything else that should be changed? Anything interesting?

You can contact Mark about this data via the OpenSpending mailing list.

Useful Links

Major new CKAN release: v1.5!

November 14, 2011 in CKAN, News, OKF Projects, Open Data, Technical

The following post is by David Read, on behalf of the CKAN team.

We’re proud to announce a major new release of CKAN!

Version 1.5 brings major improvements including:

  • Major user experience upgrades around dataset publication and access plus a new theme
  • Integrated structured and blob data storage, with associated data previewing and visualization
  • Extended catalog API providing ability to access every piece of the CKAN system
  • Documentation overhaul and extension including a new administrator and development manual at http://docs.ckan.org/
  • Easier installation and deployment, specifically via new Debian / Ubuntu packages of CKAN: installation and deployment can now take less than 5 minutes

CKAN, the Open Knowledge Foundation’s data hub and catalogue software, has now been deployed in over 20 countries around the world, providing Open Data hubs for governments and communities. Started five years ago, CKAN has gradually gained momentum, and the development team now numbers 6 full-time developers. CKAN was originally developed to power the community data site http://thedatahub.org/ (previously http://ckan.net), which can be freely used by anyone in the open data communities. The Foundation has also been helping governments and other public organisations make use of CKAN through the development of customisations and new features, as well as the provision of hosted solutions.

For more about this release, see: http://ckan.org/2011/11/09/ckan-1-5-release/

We’d also like to take this opportunity to thank the amazing on-line community around CKAN for their continued ideas, suggestions and support in the development of this open source data hub software.
