Your input needed: final review of 2015 Global Open Data Index

We’re now in the final stretch for the 2015 Global Open Data Index, and will be publishing the results in the very near future! As a community driven measurement tool, this year we have incorporated feedback we’ve received over the past several years to make the Index more useful as an instrument for civil society — particularly around what data should be measured and what attributes are important for each dataset.

As a crowdsourced survey, we have taken extra steps to ensure the measurement instrument is more reliable. We are aware that no perfect measurement can be applied globally, but we aim to be as accurate as we possibly can. We have documented our processes from year to year, and inevitably not everything has been perfect, but by engaging in this process of experimentation and trial and error we hope the Global Open Data Index will continue to evolve as an innovative, grassroots, global tool for civil society to measure the state of open data.

The journey this year was long, but productive. Here is a recap of the steps we have taken in the long road to publishing the 2015 Index:

Global consultation on new datasets — We sought your opinions and ideas for new themes that are important for civil society which should be added to the Index. As a result of this initiative, we have added 4 new datasets to this year’s Index, including: Government procurement tenders, Water Quality, Land Ownership and Weather forecast.
Consultation on methodology —The Index team refined the definitions of the datasets based on feedback from open data advocates, researchers and communities from around the world. We have tightened the definitions of the datasets to allow for greater accuracy and comparability.
Submissions phase – The crowdsourced phase where submissions are made to the Index with the help of the great index community and our new local index coordinators.
Quality Assurance of the data — We added a preliminary stage of QA this year to conduct a systematic review of the license and machine readable questions — the two attributes that have given past submitters the most trouble.
Thematic review with experts — This year, instead of conducting reviews of complete submissions by country or regional reviewers, we deployed expert thematic reviewers. Thematic reviewers assessed the submissions of all entries for a given dataset, and made sure that we are comparing the right datasets to one another between all 120 places included in this year’s Index, and that they were compliant with the new definition we made for each dataset .

Now, we are in the final phase of assessing the submissions for this year’s Index. After conducting a lengthy review phase, we seek your help to understand if we have evaluated the submissions correctly before finalizing the Index and publishing this year’s scores. In the next two weeks, from today until November 6, we will open the Index again to your comments. We encourage everyone to comment on the Index, civil society and governments alike.

Before you comment on a submission, note that we allowed thematic reviewers to apply their own logic to their review based on their expertise and assessment of the entire body of submissions across all places. This logic was grounded in the published definitions for each dataset, but allowed for some subjective flexibility in order to maintain a consistent review and account for the challenges faced by submitters, particularly in the cases of the datasets that were added this year and those with substantial changes to their definitions. Please read this section carefully before commenting on submissions. Note two things:

After careful consideration, we’ve omitted two datasets from the final scoring of the 2015 Index — public transport and health performance. We omitted public transport because 45 countries do not have a national level public transport system, which accounts for 37% of the Index sample. This does not allow an equal comparison between places. We omitted health performance data because we asked for two different datasets, and could record only one dataset faithfully in the Index system and as such it was almost impossible to score any of these entries as a unified submission. In both cases we will review the data and make it available for further investigation, and will see how we can make adjustments and incorporate these important datasets into future indexes.
In some places, our reviewers could not complete their evaluation and needed more information. We would appreciate it if you could help provide more information on any of these submissions. Any entry that displays a number ‘1’ in an orange circle needs further attention.

Here is a summary of the reviewers’ approaches to evaluating submissions for each dataset included in the 2015 Index:

Government Budget

Reviewer: Mor Rubinstein

The stated description of the Government Budget dataset is as follows:

National government budget at a high level. This category is looking at budgets, or the planned government expenditure for the upcoming year, and not the actual expenditure. To satisfy this category, the following minimum criteria must be met:
Planned budget divided by government department and sub-department
Updated once a year.
The budget should include descriptions regarding the different budget sections.

Submissions that included data for both department AND sub-department/program were accepted. Submissions that included only department level data were not accepted. Additionally, budget speeches that did not include detailed data about the estimated expenditures for the coming year were not accepted as a submission. Only datasets from an official source (e.g. the Ministry of Finance or equivalent agency) were accepted.

Government Spending

Reviewer: Tryggvi Björgvinsson

The stated description of the Government Spending dataset is as follows:

Records of actual (past) national government spending at a detailed transactional level. A database of contracts awarded or similar will not be considered sufficient. This data category refers to detailed ongoing data on actual expenditure. Data submitted in this category should meet the following minimum criteria:
Individual record of transactions.
Date of the transactions
Government office which had the transaction
Name of vendor
Amount of the transaction
Updated on a monthly basis

Submissions that included aggregate data or simply procurement contracts (results of calls for tenders) were not accepted. In cases where aggregate data or procurement data was submitted or the submitter claimed that the data did not exist, an attempt was made to locate transactional data with a simple Google search and/or via IBP’s Open Budget Survey. If data was available for the previous year (or applicable recent budget cycle) the submission was adjusted accordingly and accepted.
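As a rough illustration of the minimum criteria above, the following Python sketch checks whether spending records carry the fields a transactional dataset needs. The field names (`date`, `office`, `vendor`, `amount`) are hypothetical and not part of the Index; real datasets will name these columns differently, and this check does not cover the monthly update requirement.

```python
# Hypothetical field names standing in for the minimum criteria listed above.
REQUIRED_FIELDS = ("date", "office", "vendor", "amount")


def missing_fields(record):
    """Return the required fields that are absent or empty in one transaction record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def is_transactional(records):
    """True if there is at least one record and every record has all required fields."""
    return bool(records) and all(not missing_fields(r) for r in records)
```

An aggregate table (for example, one total per ministry with no vendor or date) would fail this check, which mirrors why such submissions were not accepted.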

Election Results

Reviewer: Kamil Gregor

The stated description of the Election Results dataset is as follows:

This data category requires results by constituency / district for all major national electoral contests. To satisfy this category, the following minimum criteria must be met:
Result for all major electoral contests
Number of registered votes
Number of invalid votes
Number of spoiled ballots
All data should be reported at the level of the polling station

Submissions that did not show the data at polling station level were omitted and marked as ‘Data does not exist’, even if votes are not counted at polling station level as a matter of policy. The reason is that the polling station level is the most granular level and the one that allows election fraud to be monitored.

Company Register

Reviewer: Rebecca Sentance

The stated description of the Company Register dataset is as follows:

List of registered (limited liability) companies. The submissions in this data category do not need to include detailed financial data such as balance sheets. To satisfy this category, the following minimum criteria must be met:
Name of company
Unique identifier of the company
Company address
Updated at least once a month

Data was marked as unsure if it exists when the submitted dataset did not contain an address or a company ID. If the submission referenced a relevant government website that does not indicate the data exists, or if there is no evidence even of which government body would hold the data, the submission was changed to ‘data does not exist’. If it is clear that a governmental body collects company data, but there is no way of knowing what it consists of, where it is held, or how to access it, and no indication that it would fulfil our requirements, the submission was also marked as ‘data does not exist’.

Based on the definition, it was decided that a company register that is freely searchable by the public but requires entering a search term (a search application) did not count as free or publicly accessible. However, a company register that can be browsed page by page does present all of the data and is the type of dataset required for acceptance.

National Statistics

Reviewer: Zach Christensen

The stated description of the National Statistics dataset is as follows:

Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc). To satisfy this category, the following minimum criteria must be met:
GDP for the whole country updated at least quarterly
Unemployment statistics updated at least monthly
Population updated at least once a year

For each submission, the reviewer checked for national accounts, unemployment, and population data as required by the description. It was found that most countries don’t have these data for the last year and very few had quarterly GDP figures or monthly unemployment figures. Submissions were only marked as ‘data does not exist’ if they did not have any national statistics more recent than 2010.

Legislation

Reviewer: Kamil Gregor

The stated description of the Legislation dataset is as follows:

This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour, e.g. voting records, is available. To satisfy this category, the following minimum criteria must be met:
Content of the law / status
If applicable, all relevant amendments to the law
Date of last amendments
Data should be updated at least quarterly

Submissions were reviewed to ensure the data met the criteria. Regularity of updating was assessed based on the date of the most recently submitted data.

Pollutant Emissions

Reviewer: Yaron Michl

The stated description of the Pollutant Emissions dataset is as follows:

Aggregate data about the emission of air pollutants especially those potentially harmful to human health (although it is not a requirement to include information on greenhouse gas emissions). Aggregate means national-level or available for at least three major cities. In order to satisfy the minimum requirements for this category, data must be available for the following pollutants and meet the following minimum criteria:
Particulate matter (PM) Levels
Sulphur oxides (SOx)
Nitrogen oxides (NOx)
Volatile organic compounds (VOCs)
Carbon monoxide (CO)
Updated at least once a week.
Measured either at a national level by regions or at least in 3 big cities.

VOCs is a generic designation for many organic chemicals; therefore, when measuring VOCs, it is possible to measure any one of a number of compounds, such as benzene or MTBE. Measurement of volatile organic compounds (VOCs) was ultimately not considered part of the data requirements because of this discrepancy and because VOCs are rarely measured at a national level (see this link).

Carbon monoxide (CO) and nitrogen oxides (NOx) were also not considered a requirement because their main origin is usually transportation.

In addition, some countries publish air pollution by using the Air Quality Index, a formula that translates air quality data into numbers and colors to help citizens understand when to take action to protect their health. For submissions that relied on the Air Quality Index, the data was considered not to exist because it is not raw data.
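One way to read the reviewer's adjustments is that, of the five pollutants in the stated description, only particulate matter and sulphur oxides remained mandatory, and the measurements had to be raw data rather than an index. The sketch below encodes that reading in Python. The pollutant labels and the `raw_data` flag are illustrative assumptions, and the update frequency and geographic coverage criteria are deliberately left out.

```python
# Pollutants still required after VOCs, CO and NOx were dropped by the reviewer
# (this set is an interpretation of the text above, not an official list).
REQUIRED_AFTER_REVIEW = {"PM", "SOx"}


def meets_pollutant_requirement(published, raw_data=True):
    """Check a submission against the relaxed pollutant requirement.

    published: iterable of pollutant labels the source reports.
    raw_data:  False when only an Air Quality Index value is published.
    """
    return raw_data and REQUIRED_AFTER_REVIEW <= set(published)
```

Under this reading, a source publishing only PM and SOx measurements would pass, while a source publishing all five pollutants solely as an Air Quality Index would not.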

Government Procurement Tenders

Reviewer: Georg Neumann

The stated description of the Government Procurement Tenders dataset is as follows:

All tenders and awards of the national/federal government aggregated by office. Monitoring tenders can help new groups to participate in tenders and increase government compliance. Data submitted in this category must be aggregated by office, updated at least monthly & satisfy the following minimum criteria:
Tenders:
Tender name
Tender description
Tender status
Awards:
Award title
Award description
Value of the award
Supplier’s name

The quality of published information varied widely and was not evaluated here. As long as the minimum information was available, the data was said to exist for a given place.

Thresholds for publication of this information vary widely by country. For all EU countries, tenders above a specific amount, detailed here, need to be published. This allowed all EU submissions to qualify as publishing open procurement data even though some countries, such as Germany, do not publish award value for contracts below those thresholds, and others have closed systems to access specific information on contracts awarded.

In other countries not all sectors of government publish tenders and awards data. Submissions were evaluated to ensure that the main government tenders and contracts were made public, notwithstanding that data from certain ministries may have been missing.

Water Quality

Reviewer: Nisha Thompson

The stated description of the Water Quality dataset is as follows:

Data, measured at the water source, on the quality of water is essential for both the delivery of services and the prevention of diseases. In order to satisfy the minimum requirements for this category, data should be available on level of the following chemicals by water source and be updated at least weekly:
fecal coliform
arsenic
fluoride levels
nitrates
TDS (Total dissolved solids)

If a country treats water or distributes it, then there will be data regarding water quality because all water treatment requires quality checks. Even though water quality is a local responsibility in most countries, very few countries have a completely decentralized system. Usually there is a monitoring role by the central government, either by the Environmental Protection Agency, Ministry of the Environment or the Ministry of Public Health. If there is a monitoring role, the data does exist. If monitoring is completely decentralized, as in the UK, the submission was marked as ‘does not exist’ because there is no aggregation of the data. If data was not available daily or weekly, it was not considered timely.

In some cases, all the parameters were accounted for except TDS. Even though it is standard, some countries only collect conductivity, which can be used to calculate TDS. In this case, the submission was approved as is.
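To illustrate why conductivity was accepted in place of TDS, here is a minimal Python sketch of the common rule of thumb that estimates TDS as electrical conductivity multiplied by an empirical factor. The function name and the default factor of 0.65 are assumptions for illustration only; the appropriate factor depends on the water's composition and is usually quoted as somewhere between roughly 0.5 and 0.8.

```python
def estimate_tds_mg_per_l(conductivity_us_per_cm, factor=0.65):
    """Estimate total dissolved solids (mg/L) from conductivity (microsiemens/cm).

    The factor is an empirical, water-dependent constant; 0.65 is only an
    illustrative default, not a value taken from the Index.
    """
    if conductivity_us_per_cm < 0:
        raise ValueError("conductivity cannot be negative")
    return factor * conductivity_us_per_cm
```

For example, `estimate_tds_mg_per_l(500)` gives approximately 325 mg/L with the default factor.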

Land Ownership

Reviewer: Codrina Maria Ilie

The stated description of the Land Ownership dataset is as follows:

Cadaster showing land ownership data on a map and including all metadata on the land. Cadaster data submitted in this category must include the following characteristics:
Land borders
Land owner’s name
Land size
National level
Updated yearly

For various reasons, the land owner’s name attribute was widely unmet, and as such, lack of this data was not considered a factor in evaluating these submissions.

This dataset depends on well-kept historical records (which do not always exist), on legislation (which can fluctuate), on very expensive activities that a government must carry out to keep the data up to date, and on the complexity of the data itself (sometimes the data that makes up a national cadastre is registered in different registries or systems). For these reasons, a first-year indexing exercise must not be considered exhaustive.

Weather

Reviewers: Neal Bastek & Stephen Gates

The stated description of the Weather dataset is as follows:

5 days forecast of temperature, precipitation and wind as well as recorded data for temperature, wind and precipitation for the past year. In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria:
5 days forecast of temperature updated daily
5 days forecast of wind updated daily
5 days forecast of precipitation updated daily
Historical temperature data for the past year

Based on a general assessment of the submissions, a minimum threshold for claiming the data existed was set at forecast data for today + two days (three days) with a qualitative allowance made for arid regions substituting humidity data for precipitation data. The threshold for inclusion could also be met with four day forecasts that include temperature and precipitation data, and/or a generic statement using text or descriptive icons about conditions (e.g. windy, stormy, partly cloudy, sunny, fair, etc.).

Location

Reviewer: Codrina Maria Ilie

The stated description of the Location dataset is as follows:

A database of postcodes/zipcodes and the corresponding spatial locations in terms of a latitude and a longitude (or similar coordinates in an openly published national coordinate system). If a postcode/zipcode system does not exist in the country, please submit a dataset of administrative borders. Data submitted in this category must satisfy the following minimum conditions
Zipcodes
Address
Coordinate (latitude longitude)
national level
updated once a year
Administrative boundaries
Borders polygon
Name of polygon (city, neighborhood)
national level
updated once a year

In cases in which a country has not adopted a postcode system, the location dataset is considered to be administrative boundaries. The Universal Postal Union – Postal Addressing System was used to identify the structure of a postcode for a given place [http://www.upu.int/en/activities/addressing/postal-addressing-systems-in-member-countries.html]. This tool proved very useful when identifying countries that do not use a postcode system.

In situations where countries only had a postcode search service, either by postcode or address, data was said to not exist. If the postcodes were not geocoded, submissions did not meet the Index requirements due to the difficulty of geocoding such a dataset. On the other hand, if the postcode system was based only on the smallest administrative boundary and that boundary was officially available, the data was marked as ‘does exist’ for that submission, given how easily geocoded postcodes could then be obtained.

National Map

Reviewer: Gil Zaretzer

The stated description of the National Map dataset is as follows:

This data category requires a high level national map. To satisfy this category, the following minimum criteria must be met:
Scale of 1:250,000 (1 cm = 2.5km).
Markings of national roads
National borders
Marking of streams, rivers, lakes, mountains.
Updated at least once a year.
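The scale equivalence in the first criterion can be verified with a short calculation. The Python helper below (a hypothetical name, not part of the Index) converts a scale denominator into the ground distance represented by one centimetre on the map.

```python
CM_PER_KM = 100_000  # 100 cm per metre times 1,000 metres per kilometre


def km_per_map_cm(scale_denominator):
    """Ground distance in km represented by 1 cm on a map at scale 1:scale_denominator."""
    return scale_denominator / CM_PER_KM
```

At a scale of 1:250,000 this returns 2.5, matching the "1 cm = 2.5km" stated in the criterion.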

Only submissions from an official source, with original data, were considered. A link to Google Maps, which was often provided, does not satisfy the criteria for these submissions.

In cases where no link was provided in the submission, entries were marked as “unsure” if there was any indication that the data exists but is not available online, e.g. a national mapping service without a website.