The 2015 Global Open Data Index is around the corner – these are the new datasets we are adding to it!
After two months, 82 ideas for datasets, 386 voters, thirteen civil society organisation consultations and very active discussions on the Index forum, we have finally arrived at a consensus on which datasets will be included in the 2015 Global Open Data Index (GODI).
This year, as part of our objective to ensure that the Global Open Data Index is more than a simple measurement tool, we started a discussion with the open data community and our partners in civil society to help us determine which datasets are of high social and democratic value and should be assessed in the 2015 Index. We believe that by making the choice of datasets a collaborative decision, we will be able to raise awareness of and start a conversation around the datasets required for the Index to truly become a civil society audit of the open data revolution. The process included a global survey, a civil society consultation and a forum discussion (read more in a previous blog post about the process).
The community had some wonderful suggestions, which made deciding on fifteen datasets no easy task. To narrow down the selection, we started by eliminating the datasets that were not suitable for global analysis. For example, some datasets are collected at the city level and therefore cannot easily be compared at a national level. Secondly, we looked to see whether there was a global standard that would allow us to easily compare between countries (such as UN requirements for countries). Finally, we tried to find a balance between financial datasets, environmental datasets, geographical datasets and datasets pertaining to the quality of public services. We consulted with experts from different fields and refined our definitions before finally choosing the following datasets:
- Government procurement data (past and present tenders) – This dataset is crucial for monitoring government contracts, be it to expose corruption or to ensure the efficient use of public funds. Furthermore, when combined with budget and spending data, contracting data helps to provide a full and coherent picture of public finance. We will be looking at both tenders and awards.
- Water quality – Water is life and it belongs to all of us. Since water is an important and basic building block of society, having access to data on drinking water may assist us not only in monitoring the safety of drinking water but also in helping to provide it everywhere.
- Weather forecast – Weather forecast data is not only one of the most commonly used datasets in mobile and web applications, it is also of fundamental importance for agriculture and disaster relief. Having both weather predictions and historical weather data helps not only to improve quality of life, but also to monitor climate change. As such, through the Index, we will measure whether governments openly publish both 5-day forecast data and historical figures.
- Land ownership – Land ownership data can help citizens understand their urban planning and development, as well as assist in legal disputes over land. In order to assess this category, we are using the national cadastre, a map showing the land registry.
- Health performance data – While this was one of the most popular datasets requested during the consultation, it was challenging to define what would be the best dataset(s) to assess health performance (see the forum discussion). We decided to use this category as an opportunity to test ideas about what to evaluate. After numerous discussions and debates, we decided that this year we would use the following as proxy indicators of health performance:
  - Location of public hospitals and clinics.
  - Data on infectious disease rates in a country.
In addition to adding the new datasets, we refined the definitions of some of the existing datasets, using our new dataset definition guidelines. These were written both to produce a more accurate measurement and to create more clarity about what we are looking for in each dataset. The guidelines suggest at least 3 key data characteristics for each dataset, define how often each dataset needs to be updated in order to be considered timely, and suggest the acceptable level of aggregation for each dataset. The following datasets were changed in order to meet the guidelines:
- Election results – Data should be reported at the polling station level so as to allow civil society to monitor election results better and uncover false reporting. In addition, we added indicators such as the number of registered voters, the number of invalid votes and the number of spoiled ballots.
- National map – In addition to the scale of 1:250,000, we added features such as markings of national roads, national borders, streams, rivers, lakes and mountains.
- Pollutant emissions – We defined the specific pollutants that should be included in the datasets.
- National Statistics – GDP, unemployment and population have been selected as the indicators that must be reported.
- Public Transport – We refined the definition so that it examines only national-level services (as opposed to city-level ones). We are also not looking for real-time data, but for timetables.
- Location datasets (previously Postcodes) – Postcode data is incredibly valuable for all kinds of business and civic activity; however, 60 countries in the world do not have a postcode system and, as such, this dataset has been problematic in the past. For these countries, we have suggested examining a different dataset, administrative boundaries. While they are not as specific as postcodes, administrative boundaries can help to enrich different datasets and create better geographical analysis.
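For readers who like to think in code, here is one way the three elements of the definition guidelines (key characteristics, timeliness and aggregation level) could be written down. This is only a rough sketch and not how the Index itself is built: the `DatasetDefinition` class, its field names and the `max_age_days` value are all hypothetical, and the 365-day threshold is a placeholder rather than a figure from the guidelines.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class DatasetDefinition:
    """Hypothetical container for one dataset definition (illustration only)."""

    name: str
    key_characteristics: Tuple[str, ...]  # the guidelines suggest at least 3
    max_age_days: int                     # assumed timeliness threshold, in days
    aggregation_level: str                # e.g. "polling station" or "national"

    def __post_init__(self) -> None:
        # Enforce the "at least 3 key data characteristics" rule.
        if len(self.key_characteristics) < 3:
            raise ValueError(
                f"{self.name}: at least 3 key characteristics expected"
            )

    def is_timely(self, last_updated: date, today: date) -> bool:
        # A dataset counts as timely if it was updated within the threshold.
        return (today - last_updated).days <= self.max_age_days


# Indicators taken from the National Statistics entry above;
# the threshold is a placeholder, not a value from the guidelines.
national_statistics = DatasetDefinition(
    name="National Statistics",
    key_characteristics=("GDP", "unemployment", "population"),
    max_age_days=365,
    aggregation_level="national",
)
```

Calling `national_statistics.is_timely(last_updated, today)` with two `datetime.date` values then tells you whether a given publication date would pass this illustrative timeliness check.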
Adding datasets and changing definitions has been part of the ongoing iterations and improvements that we have made to the Index this year. While it has been a challenge, we hope that these improvements will help to create a fairer and more accurate assessment of open data progress globally. Your feedback plays an essential role in shaping and improving the Index going forward, so please do share it with us.
The full descriptions of this year’s datasets can be found here.