The Global Open Data Index (GODI) is a tool to educate civil society and governments about open government data publication. We do so by presenting different kinds of information: overall scores and rankings for each place, scores for each data category per place, and comments by our submitters and reviewers.
Even though we try to make this assessment as coherent and transparent as possible, interpreting the results is not always straightforward. While open data has a very strict definition, scoring in any index involves discretion. In real life, data can’t be partly open: either the data fit the criteria, or they do not.
This blog post will help GODI users understand the following:
– What does the final score mean?
– How to interpret scores such as 0%, 40% or 70%?
– What does a score of 0% mean?
For a more thorough explanation on how to read the results, go to index.okfn.org/interpretation/
What does the score mean?
Our scoring (ranging from 0% open to 100% open) does not necessarily show a gradual improvement towards open data. In fact, we assess very different degrees of data openness, which is why any score below 100% only indicates that a dataset is partially open. These levels of openness include public data, access-controlled data, as well as data gaps (see the GODI methodology). To understand the differences, we highly recommend reading each score together with our openness icon bar (see image below).
For instance, a score of 70% can mean that we found access-controlled, machine-readable data that cannot be downloaded in bulk. **Any score below 100% means “no access”, “closed access” or “public access”**. Here we explain what each of them means, and how the data for each category look in practice.
Public Access Data
Data is publicly accessible if the public can see it online without any access controls. It does not imply that data can be downloaded, or that it is freely reusable. Often it means that data is presented in HTML on a website.
The image above shows a search interface of a company register. It allows targeted searches for individual companies but does not allow retrieving all data at once. Individual (non-bulk) search results are displayed in HTML format and can then be downloaded as PDF (not machine-readable). Therefore the score is 70%, visualised with the following openness icon bar in our ranking:
Access-Controlled Data
Data is access-controlled if a provider regulates who can access the data, when, and how. Access control includes:
* Data request forms and data sharing agreements (stipulating use cases),
* Ordering/purchasing data.
There are many reasons for establishing access-controlled data, including managing website traffic or maintaining control over how data is used. It is debatable whether some registration/authentication mechanisms reduce the openness of data (especially when registration is automated). Data request forms, on the other hand, are clearly not open data.
This image shows a data request form. The dataset is entirely hidden behind a “paywall”. This often prevents our research team from assessing the data at all. In this case, we could verify neither the format in which the data will be provided, nor whether the data are actually weather forecast data (the particular category we look at). Therefore this access-controlled data gained 0 points and counts as 0% open. By contrast, access-controlled data that we can assess often score very high, up to 85% (because we subtract 15 out of 100 points for access controls like registration requirements).
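The arithmetic behind these two cases can be sketched in a few lines. This is only an illustration, not GODI's actual scoring code: the 100-point scale and the 15-point deduction come from the paragraph above, while the function name `best_possible_score` and its parameters are hypothetical.

```python
MAX_POINTS = 100
# From the text above: 15 out of 100 points are subtracted for access controls.
ACCESS_CONTROL_PENALTY = 15


def best_possible_score(access_controlled: bool, assessable: bool = True) -> int:
    """Upper bound on a dataset's score (hypothetical helper, for illustration only)."""
    if not assessable:
        # Data that reviewers cannot inspect at all (e.g. behind a paywall) gets 0 points.
        return 0
    penalty = ACCESS_CONTROL_PENALTY if access_controlled else 0
    return MAX_POINTS - penalty


print(best_possible_score(access_controlled=True))                    # registration required
print(best_possible_score(access_controlled=True, assessable=False))  # hidden behind a paywall
```

The sketch only captures the ceiling for a dataset. The actual score also depends on the other criteria shown in the openness icon bar.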
How to read a score of 0%?
There are many reasons why datasets score 0%. We have tried to address these reasons in the reviewers’ or submitters’ comments as well. The main reasons are:
A data gap can mean that governments do not produce any data in a given category. Sometimes, when GODI shows a score of 0%, we are looking at a data gap. This is the case, for instance, for Western African countries that lack air quality monitoring systems, or for countries that have no established postcode system. Data gaps indicate that government information systems are not ready to produce open data, sometimes because resources are missing, and at times because it is not a priority of government.
Data gaps
Exists, but only for governmental use
Sometimes a government has the data but, for various reasons, chooses not to open it to the public at all.
Not granular enough
Since our criteria look for a particular level of granularity, we considered all datasets that didn’t reach this level as not granular, and therefore they were regarded as not available.
For example, Great Britain has published election results, but not at polling station level, which is a crucial level for detecting voter fraud. Therefore, while there is some data for UK elections, it is not at the right level and is considered non-existent.
Do not fit our criteria
We are looking for particular datasets in GODI. When they don’t have all the characteristics we are looking for, we consider them as not available.
For the full explanation on how to read the results see – index.okfn.org/interpretation/
Danny Lämmerhirt works on the politics of data, sociology of quantification, metrics and policy, data ethnography, collaborative data, data governance, as well as data activism. You can follow his work on Twitter at @danlammerhirt. He was research coordinator at Open Knowledge Foundation.