You are browsing the archive for OKI Projects.

Global Open Data Index Insights – Open Data in the Arab world

Open Knowledge International - April 20, 2016 in Global Open Data Index

This blog post was written by Riyadh Al Balushi from the Sultanate of Oman.

I recently co-authored with Sadeek Hasna a report that looks at the status of open data in the Arab World and the extent to which governments succeed or fail in making their data available to the public in a useful manner. We decided to use the results of the Global Open Data Index as the starting point of our research because the Index covered all the datasets that we chose to examine for almost all Arab countries. Choosing to use the Global Open Data Index as a basis for our paper saved us time and provided us with a systematic framework for evaluating how Arab countries are doing in the field of open data.

We chose to examine only four datasets, namely: the annual budget, legislation, election results, and company registration data. Our selection was driven by the fact that most Arab countries already have published data in this area and therefore there is content to look at and evaluate. Furthermore, most of the laws of the countries we examined make it a legal obligation on the government to release these datasets and therefore it was more likely for the government to make an effort to make this data public.

Our analysis uncovered that there are many good examples of government attempts at releasing data in an open manner in the Arab World. Examples include the website of the Ministry of Finance of the UAE, which releases the annual budget in Excel format; the legislation website of Qatar, which publishes the laws in text format and explicitly adopts a Creative Commons license for the website; the Elections Committee website of Egypt, which releases the elections data in Excel format; and the website of the Company Register of Bahrain, which does not make the data directly available for download but provides a very useful search engine to find all sorts of information about companies in Bahrain. We also found several civil society projects and business initiatives that take advantage of government data, such as Mwazna, a civil society project that uses the annual budget data in Egypt to communicate the financial standing of the government to the public in a visual way, and Al Mohammed Network, a business based on legislation data in the Arab World.

“Map of Arabic-speaking countries” by Illegitimate Barrister – Licensed under CC Attribution 3.0.

What was interesting is that even though many Arab countries now have national open data initiatives and dedicated open data portals, none of the successful open data examples in the Arab World is part of a national data portal; they are operated independently by the departments responsible for creating the data in question. While the establishment of these open data portals is a great sign of the growing interest in open data by Arab governments, in many circumstances these portals appear to be of very limited benefit, primarily because the data is usually out of date and incomplete. For example, the Omani open data portal provides population data up to the year 2007, while Saudi Arabia’s open data portal provides demographic data up to the year 2012. In some cases, the data is not properly labeled, and it is impossible for the user to figure out when the data was collected or published. An example of this would be the dataset for statistics of disabilities in the population on the Egyptian government open data page. The majority of the websites seem to have been created through a one-off initiative that was never later updated, probably in response to the global trend of improving e-government services. The websites are also very hard to navigate and are not user-friendly.

Another problem we noticed, which applies to the majority of government websites in the Arab World, is that very few of these websites license their data using an open license; instead, they almost always explicitly declare that they retain the copyright over their data. In many circumstances, this might not be in line with domestic copyright laws that exempt official documents, such as the annual budget and legislation, from copyright protection. Such practices confuse members of the public and give many the impression that they are not allowed to copy the data or use it without the permission of the government, even when that is not true.

Another big challenge for utilising government data is that many Arab government websites upload their documents as scanned PDF files that cannot be read or processed by computer software. For example, it is very common for the annual budget to be uploaded as a scanned PDF file when it would be more useful to the end user if it were uploaded in a machine-readable format such as Excel or CSV. Such formats can easily be used by journalists and researchers to analyse the data in more sophisticated ways and enable them to create charts that help present the data in a more meaningful manner.

Finally, none of the datasets examined above were available for download in bulk, and each document had to be downloaded individually. While this may be acceptable for typical users, those who need to do a comprehensive analysis of the data over an extensive period of time will not be able to do so efficiently. For example, if a user wants to analyse the change in the annual budget over a period of 20 years, he or she would have to download 20 individual files. A real open data portal should enable the user to download the whole dataset in bulk.

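
To make the point about machine-readable formats concrete, here is a minimal Python sketch of the kind of multi-year analysis that becomes possible once budgets are published as CSV. The column names (`category`, `amount`) and all the figures are made up for illustration; they are not taken from any actual government dataset.

```python
import csv
import io


def total_by_year(files_by_year):
    """Sum the 'amount' column of each year's CSV text and return {year: total}."""
    totals = {}
    for year, csv_text in files_by_year.items():
        reader = csv.DictReader(io.StringIO(csv_text))
        totals[year] = sum(float(row["amount"]) for row in reader)
    return totals


def year_on_year_change(totals):
    """Return {year: percent change in the total compared with the previous year}."""
    years = sorted(totals)
    changes = {}
    for previous, current in zip(years, years[1:]):
        changes[current] = (totals[current] - totals[previous]) / totals[previous] * 100
    return changes


# Hypothetical figures standing in for two years of published budget files.
sample_files = {
    2014: "category,amount\neducation,100\nhealth,50\n",
    2015: "category,amount\neducation,110\nhealth,70\n",
}

totals = total_by_year(sample_files)
changes = year_on_year_change(totals)
for year in sorted(changes):
    print(f"{year}: {changes[year]:+.1f}% compared with the previous year")
```

With a bulk download, the same loop would run over all 20 years at once. With scanned PDFs, every figure would first have to be retyped by hand.
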
In conclusion, even though many governments in the Arab World have launched initiatives to release and open their data to the public, for these initiatives to have a meaningful impact on government efficiency, business opportunities, and civil society participation, the core principles of open data must be followed. There is an improvement in the amount of data that governments in the Arab World release to the public, but more work needs to be done. For a detailed overview of the status of open data in the Arab World, you can read our report in full here.

Sloan Foundation Funds Frictionless Data Tooling and Engagement at Open Knowledge

Open Knowledge International - February 29, 2016 in Frictionless Data, News, open knowledge

We are excited to announce that Open Knowledge International has received $700,000 in funding from The Alfred P. Sloan Foundation over two years to work on a broad range of activities to enable better research and more effective civic tech through our Frictionless Data initiative. The funding will target standards work, tooling, and infrastructure around “data packages” as well as piloting and outreach activities to support researchers and civic technologists in addressing real problems encountered when working with data.

The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grant-making institution based in New York City. Established in 1934 by Alfred Pritchard Sloan Jr., then-President and Chief Executive Officer of the General Motors Corporation, the Foundation makes grants in support of original research and education in science, technology, engineering, mathematics and economic performance.  

“Analyzing and working with data is a significant (and growing) source of pain for researchers of all types”, says Josh Greenberg, Program Director at the Alfred P. Sloan Foundation. “We are excited to support Open Knowledge International in this critical area. This support will help data-intensive researchers to be more efficient and effective.”

What is being funded?

The funding will support three key streams of work around data packages: (a) the further development of the data package suite of standards, (b) the creation and enhancement of a suite of tools and integrations around these standards, and (c) broad outreach and engagement to educate researchers about the benefits of this approach.



The Data Package standard is a simple, lightweight specification for packaging all types of data, but we have a special emphasis on tabular (e.g. CSV) data. As the sources of useful data grow, effective data-driven research is becoming more and more critical. Such research often depends on cleaning and validating data, as well as combining such data from multiple sources, processes that are still frequently manual, tedious, and error-prone.  Data packages allow for the greater automation of these processes, thereby eliminating the “friction” involved.  
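
As a rough illustration of what a data package looks like in practice, the Python sketch below builds a small descriptor and uses it to check a CSV header. The dataset name, file path and field names are hypothetical. The descriptor keys (`name`, `resources`, `path`, `schema`, `fields`) reflect our reading of the Data Package specifications and should be checked against the current version of the standard. The header check is a toy stand-in for real validation tooling, not part of any official library.

```python
import csv
import io
import json

# A hypothetical descriptor, normally stored as datapackage.json next to the data.
descriptor = {
    "name": "annual-budget",
    "title": "Annual budget (illustrative example)",
    "resources": [
        {
            "name": "budget-2015",
            "path": "budget-2015.csv",
            "schema": {
                "fields": [
                    {"name": "category", "type": "string"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}


def header_matches_schema(csv_text, resource):
    """Check that the CSV header row lists exactly the fields named in the schema."""
    expected = [field["name"] for field in resource["schema"]["fields"]]
    header = next(csv.reader(io.StringIO(csv_text)))
    return header == expected


descriptor_json = json.dumps(descriptor, indent=2)

good_csv = "category,amount\neducation,100\n"
bad_csv = "amount,category\n100,education\n"
print(header_matches_schema(good_csv, descriptor["resources"][0]))
print(header_matches_schema(bad_csv, descriptor["resources"][0]))
```

Because the schema travels with the data, a check like this can run automatically instead of relying on someone opening the file and eyeballing the columns.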

Tooling and Integration

A key aspect of this work is that it aligns with researchers’ usual tools and will require few or no changes to existing data and data structures. To do this, we are seeking to build and support integrations with popular tools for research, for example, R, STATA, LibreOffice, etc. In addition, we are looking to define ways of seamlessly translating datasets to and from typical file formats used across various research communities such as HDF5, NetCDF, etc.

Community Outreach

While our core mission is to design a well-defined set of specifications and build a rich and vibrant ecosystem of tooling around them, none of this is possible without also building a broad awareness of data packages, where to use them and their utility, and a sustainable group of engaged users to support this. To make our work in this area as effective as possible, we are building partnerships with organizations in research, civic tech, and government.

Be a part of the Frictionless Data future

We are looking to discover much more about the needs of different research groups and to identify the problems they might currently have.  To do this, we are running targeted pilots to trial these tools and specifications on real data.

Are you a researcher looking for better tooling to manage your data?  

Do you work at or represent an organization working on issues related to research data like DataCite, DataONE, RDA, or CODATA and would like to work with us on complementary issues for which data packages are suited?

Are you a developer and have an idea for something we can build together?

Are you a student looking to learn more about data wrangling, managing research data, or open data in general?

If any of the above apply to you, email us. We’d love to hear from you. If you have any other questions or comments about this initiative, please visit the topic in our forum or use the hashtag #frictionlessdata.

Unlocking Election Results Data: Signs of Progress but Challenges Still Remain

Open Knowledge International - December 24, 2015 in Global Open Data Index

This blog post was written by the NDI election team: Michael McNulty and Benjamin Mindes.

How “open” are election results data around the world? Answering that question just became much easier. For the first time, the Global Open Data Index 2015 assessed election results data based on whether the results are made available at the polling station level. In previous years, the Index looked at whether election results were available at a higher (constituency/district) level, but not at the polling station level.

As a result, the 2015 Global Open Data Index provides the most useful global assessment to date on which countries are and are not making election results available in an open way. It also highlights specific open data principles that most countries are meeting, as well as principles that most countries are not meeting. This helps inform the reform agenda for open election data advocates in the months and years ahead.

Before we take a look at the findings and possible ways forward, let’s first consider why the Global Open Data Index’s shift from constituency/district level results to polling station results is important. This shift in criteria has shaken up the rankings this year, which has caused some discussion about why polling station-level results matter. Read on to find out!

Why are Polling Station-level Election Results Important?

Meets the open data principle of “granularity”

A commonly accepted open data principle is that data should be made available at the most granular, or “primary,” level: the level at which the source data is collected. (See the 8 Principles of Open Government Data principle on Primary; and the G8 Open Data Charter section on Quality and Quantity.) In the case of election results, the primary level refers to the location where voters cast their ballots: the polling station. (See the Open Election Data Initiative section on Granularity. Polling stations are sometimes called precincts, polling streams, or tables depending on the context.) So, if election results are not counted at the polling station level and/or are only made available in an aggregate form, such as only at the ward/constituency/district level, then that dataset is not truly open, since it does not meet the principle of granularity. (See the Open Election Data Initiative section on Election Results for more details.)

Promotes transparency and public confidence

Transparency means that each step is open to scrutiny and that there can be independent verification of the process. If results aren’t counted and made public at the polling station level, there is a clear lack of transparency, because there is no way to verify whether the higher-level tabulated results can be trusted. This makes election fraud easier to conceal and mistakes harder to catch, which can undermine public confidence in elections, distort the will of the voter, and, in a close election, even change the outcome.

For example, let’s imagine that a tabulation center is aggregating ballots from 10 polling stations. Independent observers at two of those polling stations reported several people voting multiple times, as well as officials stuffing ballot boxes. If polling station results were made available, observers could check whether the number of ballots cast exceeds the number of registered voters at those polling stations, which would support the observers’ findings of fraud. However, if polling station level results aren’t made available, the results from the two “problem” polling stations would be mixed in with the other eight polling stations. There would be no way to verify what the turnout was at the two problem polling stations, and, thus, no way to cross-check the observers’ findings with the official results.
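
The cross-check described above can be sketched in a few lines of Python. The station identifiers and all the numbers are invented for illustration, and a real analysis would also need to account for rules such as voters who are allowed to cast ballots away from their home station.

```python
def flag_suspicious_stations(stations):
    """Return the IDs of stations where more ballots were cast than voters registered."""
    return [
        station["station_id"]
        for station in stations
        if station["ballots_cast"] > station["registered_voters"]
    ]


# Hypothetical polling station-level results for one tabulation center.
stations = [
    {"station_id": "PS-01", "registered_voters": 500, "ballots_cast": 410},
    {"station_id": "PS-02", "registered_voters": 480, "ballots_cast": 515},
    {"station_id": "PS-03", "registered_voters": 520, "ballots_cast": 498},
]

flagged = flag_suspicious_stations(stations)

# The aggregate figures are all that is visible if only
# tabulation center totals are published.
total_registered = sum(s["registered_voters"] for s in stations)
total_ballots = sum(s["ballots_cast"] for s in stations)

print("Flagged stations:", flagged)
print("Aggregate ballots cast:", total_ballots, "of", total_registered, "registered")
```

In this made-up example the aggregate looks plausible, since total ballots stay below total registered voters, while the station-level data exposes the one station that warrants a closer look.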

Reduces tension

Election observers can combine their assessment of the election day process with results data to verify or dispel rumors at specific polling stations, but only if polling station-level results are made public.

Bolsters public engagement

When voters are able to check the results in their own community (at their polling station), it can help build confidence and increase their engagement and interest in elections. Also, civil society groups, political parties and candidates can use polling station-level turnout data to more precisely target their voter education and mobilisation campaigns during the next elections.

Aligns with an emerging global norm

Making results available at the polling station level is rapidly becoming a global norm. In most countries, political parties, election observers, the media, and voters have come to expect nothing less than for polling station level results to be posted publicly in a timely way and shared freely.

The 2015 Global Open Data Index shows how common this practice has become. Of the 122 countries studied, 71 (58%) provide election results (including results, registered voters, and number of invalid and spoiled ballots) at the polling station level. There are some significant differences across regions, however. Sub-Saharan Africa and Asia had the lowest percentages of countries that provide polling station-level results data (42% and 41% respectively). Eastern Europe and Latin America had the highest, at 71% each.

What Does the Index Tell Us about How to Open Up and Use Election Data?

Drawing on the 2015 Global Open Data Index findings and on open election data initiatives at the global, regional and national levels, we’ve highlighted some key priorities below.

1. Advocacy for making polling-station level results publicly available

While most countries make polling station-level results available, over 40% of the 122 countries researched in the Global Open Data Index still do not. At a regional level, Sub-Saharan Africa, Asia and the Middle East & North Africa have the furthest to go.

2. Ensuring election results data is truly open

Making election data available is a good first step, but it can’t be freely and easily used and redistributed by the public if it is not truly “open.” Election data is open when it is released in a manner that is granular, available for free online, complete and in bulk, analyzable (machine-readable), non-proprietary, non-discriminatory and available to anyone, license-free and permanent. Equally important, election data must be released in a timely way. For election results, this means near real-time publication of provisional polling station results, with frequent updates.

The Global Open Data Index assesses many of these criteria, and the 2015 findings help highlight which criteria are more and less commonly met across the globe. On the positive side, of the 71 countries that make polling-station level results available, nearly all of them provide the data in a digital (90%), online (89%), public (89%) and free (87%) manner. In addition, 92% of those 71 countries have up-to-date data.

However, there are some significant shortcomings across most countries. Only 46% of the 71 countries provided data that was analyzable (machine-readable). Similarly, only 46% of countries studied provided complete, bulk data sets. Western Europe (67%) had the highest percentage of countries providing complete, bulk data, while Middle East & North Africa and Sub-Saharan Africa (both 38%) had the lowest percentage of countries doing so.

3. Not just election results: Making other types of election data open

While election results often get the most attention, election data goes far beyond results. It involves information relating to all aspects of the electoral cycle, including the legal framework, decision-making processes, electoral boundaries, polling stations, campaign finance, voter registry, ballot qualification, procurement, and complaints and disputes resolution. All of these categories of data are essential to assessing the integrity of elections, and open data principles should be applied to all of them.

4. Moving from transparency to accountability

Opening election data helps make elections more transparent, but that’s just the beginning. To unlock the potential of election data, people need to have the knowledge and skills to analyze and use the data to promote greater inclusiveness and public engagement in the process, as well as to hold electoral actors, such as election management bodies and political parties, accountable. For example, with polling station data, citizen election observer groups around the world have used statistics to deploy observers to a random, representative sample of polling stations, giving them a comprehensive, accurate assessment of election day processes. With access to the voters list, many observer groups verify the accuracy of the list and highlight areas for improvement in the next elections.
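
As a minimal sketch of the sampling idea, the Python snippet below draws a simple random sample from a published list of polling stations. The station identifiers, the sample size and the seed are all hypothetical. Observer groups may well use more elaborate designs (for example, stratifying by region), so this illustrates the principle rather than describing any group’s actual method.

```python
import random


def sample_polling_stations(station_ids, sample_size, seed=None):
    """Draw a simple random sample of polling stations without replacement."""
    if sample_size > len(station_ids):
        raise ValueError("sample_size cannot exceed the number of polling stations")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    return rng.sample(station_ids, sample_size)


# Hypothetical list of 1,000 station IDs, as it might appear in an open dataset.
all_stations = [f"PS-{number:04d}" for number in range(1, 1001)]

observer_sample = sample_polling_stations(all_stations, 50, seed=2015)
print(len(observer_sample), "stations selected for observation")
```

None of this is possible unless the full list of polling stations is itself available as open data, which is why that list appears among the election data categories mentioned earlier.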

Despite the increasing availability of election data, in most countries, parties, the media and civil society do not yet have the capacity to take full advantage of the possibilities. The National Democratic Institute (NDI) is developing resources and tools to help equip electoral stakeholders, particularly citizen election observers, to use and analyze election data. We encourage more efforts like this so that the use of election data can reach its full potential.

For more on NDI’s Open Election Data Initiative, check out the website (available in English, Spanish and Arabic) and like us on Facebook.

Global Open Data Index 2015 – Taiwan Insight

Open Knowledge International - December 16, 2015 in Global Open Data Index

This insight was written by TH Schee, OK Taiwan ambassador.

Taiwan has surprisingly topped the Global Open Data Index 2015, and the result raises questions about how it was achieved that call for further examination. Even though Taiwan has been very active and is recognised as one of the hotspots of open data, little is known outside the island about the actual landscape. Take a look at techPresident, Nieman Lab, and the Science & Technology Law Institute of Taiwan for more context.

To give some background to the seemingly odd result, context is needed to better understand how the Index has shaped Taiwan’s overall effort and awareness of it since 2013, and possibly even more so in the long run.

According to “Freedom of the Press 2015”, Taiwan is considered among the top in Asia Pacific, along with Japan, Australia and New Zealand. Its extremely vigorous, diverse and free press environment has served as a catalyst for many communities, not just the journalistic world but also the public and private sectors that make up the broader group of “reusers” of public sector information, letting them engage in ways that enthusiasts in neighboring countries and economies shy away from for safety reasons. To put it in simple terms, you are free to interpret data, check its integrity, report on it, or even use it to hold your government accountable in litigation.

The country claims the world’s highest penetration of Facebook users relative to overall population. This has contributed to fast, and to some degree even vicious, feedback loops in public discourse about any dataset released from the dozens of data portals, which has greatly enhanced the visibility of the #GODI15 agenda on the island.

Taiwan in #GODI15

From the government perspective, another major contributing factor has been the establishment of a formalised mechanism for public consultation, in the form of dedicated committees in all ministries. More than 30 were established in the first half of 2015, with seats rotating on nominal terms of one to two years; the majority of members come from the government, plus a select few from civil society, academia and the private sector. This has served very well to raise awareness of Open Knowledge and the #GODI15 inside the government, and serious efforts to study the Index in detail began as early as 2013. This has proved somewhat controversial given the final outcome, but we are seeing how the Index has formally affected the government’s perception and assessment of its own mandates and initiatives in Taiwan. The discourse around #GODI15 is on the public record in meeting minutes.

The third contributing factor is slightly uncomfortable, because the government has supported some very disputable mandates, including the possible release of personal data in open formats from the National Healthcare Insurance Program without prior agreement from the insured. This has caused major concern among several human rights groups, and civil society is still waiting for a court verdict because a class action has been filed against the government. The case has raised a whole new understanding, among transparency groups and the congress, of the issues that open data initiatives might bring forth, and has created a much broader community base around provocative but valuable issues of a kind that is generally hard to foster from top-down, technology-driven initiatives.

The upcoming Presidential election is set to take place in less than 40 days from now, and it is widely agreed that the agenda on open data and policies will be carried on under the new government. The best thing so far has not been the ranking but a true dialogue among local and even regional stakeholders. The #GODI15 has served as a fresh start for Taiwan; without it, sincere and reasoned debates would not even have surfaced.

Global Open Data Index 2015 – Uruguay Insight

Open Knowledge International - December 9, 2015 in Global Open Data Index, Ideas and musings, open knowledge

This post was written by Daniel Carranza from DataUY

Uruguay has made headlines in the news lately, mostly due to our unconventional former president José “Pepe” Mujica and initiatives such as legalized abortion, a regulated marijuana market and egalitarian marriage. It’s not the first time that our small country has brought up innovative ideas, as with divorce by mere will of the wife or the 8-hour workday at the beginning of the 20th century. But what most people don’t see behind the “maverick” headlines is the steady, but usually slow, processes that follow. Our country is not, and probably never was, in a rush. And Open Data doesn’t escape that contradictory logic that reigns over everything here: that tension between innovation and resistance to change.

This year’s seventh position in the Global Open Data Index tells only part of that story.

Open Data initiatives had a relatively early start in Uruguay on the government side, but amazingly, demand actually came after that. Government policy and initiatives, such as its Open Data Portal, have been praised and recognized internationally, but we’re still working on a firm legal framework that supports all those initiatives for the long term. In civil society, we’ve been lucky enough to launch a couple of surprisingly successful projects (including Rampita and PreciosUY), but we struggle with having only a handful of organizations (grouped at the Red de Gobierno Abierto) actively involved in Open Data and Open Government. We need to “open” the open data space (pun absolutely intended).

The challenge is then to keep moving forward: to rid ourselves of our conservative instincts and keep pushing until Open Data is the norm, not an effort, and requires less of the ecosystem’s energy. Reaching the Index’s top 10 should help us tip the scales just enough to keep things moving, but the risk of conformity is there, waiting for the slightest distraction.

The best news is that most of the actors involved in open data are working together on this, and it gives us the hope that the only way is forward. Another trait of our country is closeness; you can share a seat on a bus with a Senator or call some big shot in government and actually get an answer. The same goes for middle managers, many of whom became our friends and who are crucial players in delivering on the promise of Open Data. You see this among the relationships that build the Open Data ecosystem as well. This gives us incredible opportunities for dialogue, collaboration and, most importantly, co-creation, from a roundtable dialogue to actually drafting legislation. Now it’s time to be calm, but bold, so we can keep advancing at our own pace, to our own mellow rhythm, but firmly moving forward.

Global Open Data Index 2015 – United Kingdom Insight

Open Knowledge International - December 9, 2015 in Global Open Data Index, open knowledge

This post was written by Owen Boswarva

For a third year running the United Kingdom has come out at or near the top of the Global Open Data Index. Unlike many of the countries that did well in previous years, the UK’s overall standing has not been greatly affected by the addition of five new categories. This demonstrates the broad scope of the UK’s open data programme. Practitioners within UK government who work to develop and release open datasets have much of which to be proud.

However the UK’s role as an open data leader also carries the risk of overconfidence. Policymakers can easily be tempted to rest on their laurels. If we look in more detail at this year’s submissions we can find plenty of learning points and areas for further development. There are also some signs the UK open data agenda may be losing momentum.

The biggest gap this year is in election results data, with the Electoral Commission dataset disqualified because it only reports down to constituency level. The criteria have changed from previous years, so this decision may seem a little harsh. But globally most electoral fraud takes place at the polling station. The UK is a mature democracy and should set an example by publishing more granular data.

There is a similar weakness in UK public data on water quality, which is available only at a high level in annual reports from regulators. Environmental data in general has been a mixed bag in 2015. Ordnance Survey, which maps most of the UK, published the first detailed open map of the river network; and the environment department Defra announced an ambition to release 8,000 open datasets. However there is a noticeable absence of open bulk data for historical weather observations and air pollution measurements.

UK progress on open data is also held back by the status of land ownership data. Ownership records and land boundaries are maintained by Land Registry and other government agencies. But despite (or perhaps because of) the considerable public interest in understanding how property wealth is distributed in the UK, this invaluable data is accessible only on commercial terms.

In other categories we can see deteriorations in the quality of UK open data.

National Archives is struggling to maintain its much-admired dataset. The latest version of Contracts Finder, an open search facility for public sector procurement contracts, no longer offers bulk downloads. Government digital strategy is turning steadily towards APIs and away from support for analytic re-use of public data.

Can the UK sustain its record of achievement in open data policy? Most of the central funding streams that supported open data release in recent years came to an end in 2015. A number of user engagement groups and key initiatives have either been wound up or left to drift. Urban and local open data hubs are thriving, but political devolution and lack of centralised collection are creating regional disparities in the availability of open data. Truly national datasets, those that help us understand the UK as a nation, are becoming harder to find.

UK open data policy may play well on the international stage, but at home there is still plenty of work for campaigners to do.

The Global Open Data Index 2015 is live – what is your country status?

Open Knowledge International - December 9, 2015 in Global Open Data Index, open knowledge

We are excited to announce that we have published the third annual Global Open Data Index. This year’s Index showed impressive gains from non-OECD countries with Taiwan topping the Index and Colombia and Uruguay breaking into the top ten at four and seven respectively. Overall, the Index evaluated 122 places and 1586 datasets and determined that only 9%, or 156 datasets, were both technically and legally open.

The Index ranks countries based on the availability and accessibility of data in thirteen key categories, including government spending, election results, procurement, and pollution levels. Over the summer, we held a public consultation, which saw contributions from individuals within the open data community as well as from key civil society organisations across an array of sectors. As a result of this consultation, we expanded the 2015 Index to include public procurement data, water quality data, land ownership data and weather data; we also decided to remove transport timetables due to the difficulties faced when comparing transport system data globally.

Open Knowledge International began to systematically track the release of open data by national governments in 2013 with the objective of measuring whether governments were releasing the key datasets of high social and democratic value as open data. That enables us to better understand the current state of play and in turn work with civil society actors to address the gaps in data release. Over the course of the last three years, the Global Open Data Index has become more than just a benchmark: we noticed that governments began to use the Index as a reference to inform their open data priorities, and civil society actors began to use the Index as an advocacy tool to encourage governments to improve their performance in releasing key datasets.

That said, indices such as the Global Open Data Index are not without their challenges. The Index measures the technical and legal openness of datasets deemed to be of critical democratic and social value – it does not measure the openness of a given government. It should be clear that the release of a few key datasets is not a sufficient measure of the openness of a government. The blurring of lines between open data and open government is nothing new and has been hotly debated by civil society groups and transparency organisations since the sharp rise in popularity of open data policies over the last decade.

While the goal of the Index has never been to measure the openness of governments, we have been working in collaboration with others to make the Index more than just a benchmark of data release. This year, by collaborating with topical experts across an array of sectors, we were able to improve our dataset category definitions to ensure that we are measuring data that civil society groups require rather than simply the data that governments happen to be collecting.

Next year we will be doubling down on this effort to work in collaboration with topical experts to go beyond a “baseline” of reference datasets which are widely held to be important, to tracking the release of datasets deemed critical by the civil society groups working in a given field. This effort is both experimental and ambitious. Measuring open data is not trivial. We are keenly aware of the balance that needs to be struck between international comparability and local context, and we will continue to work to get this balance right. Join us on the Index forum to take part in these discussions.

Treasures from the Public Domain in New Essays Book

Adam Green - November 12, 2015 in Featured, Public Domain, Public Domain Review


Open Knowledge project The Public Domain Review is very proud to announce the launch of its second book of selected essays! For nearly five years now we’ve been diligently trawling the rich waters of the public domain, bringing to the surface all sorts of goodness from various openly licensed archives of historical material: from the Library of Congress to the Rijksmuseum, from Wikimedia Commons to the wonderful Internet Archive. We’ve also been showcasing, each fortnight, new writing on a selection of these public domain works, and this new book picks out our very best offerings from 2014.

All manner of oft-overlooked histories are explored in the book. We learn of the strange skeletal tableaux of Frederik Ruysch, pay a visit to Humphry Davy high on laughing gas, and peruse the pages of the first ever picture book for children (which includes the excellent table of Latin animal sounds pictured below). There’s also fireworks in art, petty pirates on trial, brainwashing machines, truth-revealing diseases, synesthetic auras, Byronic vampires, and Charles Darwin’s photograph collection of asylum patients. Together the fifteen illustrated essays chart a wonderfully curious course through the last five hundred years of history — from sea serpents of the 16th-century deep to early-20th-century Ouija literature — taking us on a journey through some of the darker, stranger, and altogether more intriguing corners of the past.

Order by 18th November to benefit from a special reduced price and delivery in time for Christmas

If you are wanting to get the book in time for Christmas (and we do think it’d make an excellent gift for that history-loving relative or friend!), then please make sure to order before midnight on Wednesday 18th November. Orders placed before this date will also benefit from a special reduced price!

Please visit the dedicated page on The Public Domain Review site to learn more and also buy the book!

Double page spread (full bleed!), showing a magnificent 18th-century print of a fireworks display at the Hague – from our essay on how artists have responded to the challenge of depicting fireworks through the ages.

Join the School of Data team: Technical Trainer wanted

Open Knowledge International - November 9, 2015 in Featured, Jobs, School of Data


The mission of Open Knowledge International is to open up all essential public interest information and see it utilized to create insight that drives change. To this end we work to create a global movement for open knowledge, supporting a network of leaders and local groups around the world; we facilitate coordination and knowledge sharing within the movement; we build collaboration with other change-making organisations both within our space and outside; and, finally, we prototype and provide a home for pioneering products.

A decade after its foundation, Open Knowledge International is ready for its next phase of development. We started as an organisation that led the quest for the opening up of existing data sets – and in today’s world most of the big data portals run on CKAN, an open source software product developed first by us.

Today, it is not only about opening up of data; it is making sure that this data is usable, useful and – most importantly – used, to improve people’s lives. Our current projects (School of Data, OpenSpending, OpenTrials, and many more) all aim towards giving people access to data, the knowledge to understand it, and the power to use it in our everyday lives.

The School of Data is growing in size and scope, and to support this project – alongside our partners – we are looking for an enthusiastic Technical Trainer (flexible location, part time).

School of Data is a network of data literacy practitioners, both organisations and individuals, implementing training and other data literacy activities in their respective countries and regions. Members of the School of Data work to empower civil society organizations (CSOs), journalists, governments and citizens with the skills they need to use data effectively in their efforts to create better, more equitable and more sustainable societies. Over the past four years, School of Data has succeeded in developing and sustaining a thriving and active network of data literacy practitioners in partnership with our implementing partners across Europe, Latin America, Asia and Africa.

Our local implementing partners are Social TIC, Code for Africa, Metamorphosis, and several Open Knowledge chapters around the world. Together, we have produced dozens of lessons and hands-on tutorials on how to work with data published online, benefitting thousands of people around the world. Over 4500 people have attended our tailored training events, and our network has mentored dozens of organisations to become tech savvy and data driven. Our methodologies and approach for delivering hands-on data training and data literacy skills – such as the data expedition – have now been replicated in various formats by organisations around the world.

One of our flagship initiatives, the School of Data Fellowship Programme, was first piloted in 2013 and has now successfully supported 26 fellows in 25 countries to provide long-term data support to CSOs in their communities. School of Data coordination team members are also consistently invited to give support locally to fellows in their projects and organisations that want to become more data-savvy.

In order to give fellows a solid point of reference in terms of content development and training resources, and also to have a point person to give capacity building support for our members and partners around the world, School of Data is now hiring an outstanding trainer/consultant who’s familiar with all the steps of the Data Pipeline and School of Data’s innovative training methodology to be the all-things-content-and-training for the School of Data network.


The hired professional will have three main objectives:

  • Technical Trainer & Data Wrangler: represent School of Data in training activities around the world, either supporting local members through our Training Dispatch or delivering the training themselves;
  • Data Pipeline & Training Consultant: give support for members and fellows regarding training (planning, agenda, content) and curriculum development using School of Data’s Data Pipeline;
  • Curriculum development: work closely with the Programme Manager & Coordination team to steer School of Data’s curriculum development, updating and refreshing our resources as novel techniques and tools arise.

Terms of Reference

  • Attend regular (weekly) planning calls with School of Data Coordination Team;
  • Work with current and future School of Data funders and partners in data-literacy related activities in an assortment of areas: Extractive Industries, Natural Disaster, Health, Transportation, Elections, etc;
  • Be available to organise and run in-person data-literacy training events around the world, sometimes at short notice (agenda, content planning, identifying data sources, etc);
  • Provide reports of training events and support given to members and partners of School of Data Network;
  • Work closely with all School of Data Fellows around the world to aid them in their content development and training events planning & delivery;
  • Write for the School of Data blog about curriculum and training events;
  • Take ownership of the development of curriculum for School of Data and support training events of the School of Data network;
  • Work with Fellows and other School of Data Members to design and develop their skillshare curriculum;
  • Coordinate support for the Fellows when they do their trainings;
  • Mentor Fellows including monthly point person calls, providing feedback on blog posts and curriculum & general troubleshooting;
  • The position reports to School of Data’s Programme Manager and will work closely with other members of the project delivery team;
  • This part-time role is paid by the hour. You will be compensated with a market salary, in line with the parameters of a non-profit-organisation;
  • We offer employment contracts for residents of the UK with valid permits, and services contracts to overseas residents


  • A lightweight monthly report of performed activities with Fellows and members of the network;
  • A final narrative report at the end of the first period (6 months) summarising performed activities;
  • Map the current School of Data curriculum to diagnose potential areas of improvement and to update;
  • Plan and suggest a curriculum development & training delivery toolkit for Fellows and members of the network


  • Be self-motivated and autonomous;
  • Fluency in written and spoken English (Spanish & French are a plus);
  • Reliable internet connection;
  • Outstanding presentation and communication skills;
  • Proven experience running and planning training events;
  • Proven experience developing curriculum around data-related topics;
  • Experience working remotely with workmates in multiple timezones is a plus;
  • Experience in project management;
  • Major in Journalism, Computer Science, or related field is a plus

We strive for diversity in our team and encourage applicants from the Global South and from minorities.


Six months to one year: from November 2015 (as soon as possible) to April 2016, with the possibility to extend until October 2016 and beyond, at 10-12 days per month (8 hours/day).

Application Process

Interested? Then send us a motivational letter and a one page CV via

Please indicate your current country of residence, as well as your salary expectations (in GBP) and your earliest availability.

Early application is encouraged, as we are looking to fill the positions as soon as possible. These vacancies will close when we find a suitable candidate.

Interviews will be conducted on a rolling basis and may be requested on short notice.

If you have any questions, please direct them to jobs [at]

Your input needed: final review of 2015 Global Open Data Index

Mor Rubinstein - October 28, 2015 in Global Open Data Index, open knowledge

We’re now in the final stretch for the 2015 Global Open Data Index, and will be publishing the results in the very near future! As a community driven measurement tool, this year we have incorporated feedback we’ve received over the past several years to make the Index more useful as an instrument for civil society — particularly around what data should be measured and what attributes are important for each dataset.

As a crowdsourced survey, we have taken extra steps to ensure the measurement instrument is more reliable. We are aware that there is no perfect measurement that can be applied globally, but we aim to be as accurate as we possibly can. We have documented our processes year to year and inevitably not everything has been perfect, but by engaging in this process of experimentation, trial and error we hope the Global Open Data Index will continue to evolve as an innovative, grassroots, global tool for civil society to measure the state of open data.

The journey this year was long, but productive. Here is a recap of the steps we have taken in the long road to publishing the 2015 Index:

  1. Global consultation on new datasets — We sought your opinions and ideas for new themes that are important for civil society which should be added to the Index. As a result of this initiative, we have added 4 new datasets to this year’s Index, including: Government procurement tenders, Water Quality, Land Ownership and Weather forecast.
  2. Consultation on methodology —The Index team refined the definitions of the datasets based on feedback from open data advocates, researchers and communities from around the world. We have tightened the definitions of the datasets to allow for greater accuracy and comparability.
  3. Submissions phase – The crowdsourced phase where submissions are made to the Index with the help of the great index community and our new local index coordinators.
  4. Quality Assurance of the data — We added a preliminary stage of QA this year to conduct a systematic review of the license and machine readable questions — the two attributes that have given past submitters the most trouble.
  5. Thematic review with experts — This year, instead of conducting reviews of complete submissions by country or regional reviewers, we deployed expert thematic reviewers. Thematic reviewers assessed the submissions of all entries for a given dataset, and made sure that we are comparing the right datasets to one another across all 120 places included in this year’s Index, and that they were compliant with the new definition we wrote for each dataset.

Now, we are in the final phase of assessing the submissions for this year’s Index. After conducting a lengthy review phase, we seek your help to understand if we have evaluated the submissions correctly before finalizing the Index and publishing this year’s scores. In the next two weeks, from today until November 6, we will open the Index again to your comments. We encourage everyone to comment on the Index, civil society and governments alike.

Before you comment on a submission, note that we allowed thematic reviewers to apply their own logic to their review based on their expertise and assessment of the entire body of submissions across all places. This logic was grounded in the published definitions for each dataset, but allowed for some subjective flexibility in order to maintain a consistent review and account for the challenges faced by submitters, particularly in the cases of the datasets that were added this year and those with substantial changes to their definitions. Please read this section carefully before commenting on submissions. Note two things:

  1. After careful consideration, we’ve omitted two datasets from the final scoring of the 2015 Index: public transport and health performance. We omitted public transport because 45 countries do not have a national level public transport system, which accounts for 37% of the Index sample. This does not allow an equal comparison between places. We omitted health performance data because we asked for two different datasets but could record only one dataset faithfully in the Index system, and as such it was almost impossible to score any of these entries as a unified submission. In both cases we will review the data and make it available for further investigation, and will see how we can make adjustments and incorporate these important datasets into future indexes.
  2. In some places, our reviewers could not complete their evaluation and needed more information. We would appreciate it if you could help provide more information on any of these submissions. Any entry that displays a number ‘1’ in an orange circle on it needs further attention.

Here is a summary of the reviewers’ approaches to evaluating submissions for each dataset included in the 2015 Index:

Government Budget

Reviewer: Mor Rubinstein

The stated description of the Government Budget dataset is as follows:

National government budget at a high level. This category is looking at budgets, or the planned government expenditure for the upcoming year, and not the actual expenditure. To satisfy this category, the following minimum criteria must be met:

  • Planned budget divided by government department and sub-department
  • Updated once a year
  • The budget should include descriptions regarding the different budget sections

Submissions that included data for both department AND sub-department/program were accepted. Submissions that included only department level data were not accepted. Additionally, budget speeches that did not include detailed data about the estimated expenditures for the coming year were not accepted as a submission. Only datasets from an official source (e.g. The Ministry of Finance or equivalent agency) were accepted.
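The acceptance rules above can be read as a simple checklist. The following is a minimal sketch of that checklist in Python; the `BudgetSubmission` class and all of its field names are hypothetical and are not part of the Index survey tooling.

```python
from dataclasses import dataclass


@dataclass
class BudgetSubmission:
    # Hypothetical fields, invented for illustration only.
    official_source: bool          # e.g. Ministry of Finance or equivalent agency
    has_department_level: bool     # planned spending broken down by department
    has_subdepartment_level: bool  # further broken down by sub-department/program
    is_speech_only: bool           # budget speech without detailed estimates


def budget_accepted(s: BudgetSubmission) -> bool:
    """Apply the review rules described above to one submission."""
    if not s.official_source:
        return False  # only official sources were accepted
    if s.is_speech_only:
        return False  # speeches without detailed estimates were rejected
    # Department-level data alone is not enough; both levels are required.
    return s.has_department_level and s.has_subdepartment_level
```

Under these assumptions, a submission with department-level figures but no sub-department breakdown is rejected, as the reviewer describes.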

Government Spending

Reviewer: Tryggvi Björgvinsson

The stated description of the Government Spending dataset is as follows:

Records of actual (past) national government spending at a detailed transactional level; a database of contracts awarded or similar will not be considered sufficient. This data category refers to detailed ongoing data on actual expenditure. Data submitted in this category should meet the following minimum criteria:

  • Individual record of transactions
  • Date of the transactions
  • Government office which had the transaction
  • Name of vendor
  • Amount of the transaction
  • Updated on a monthly basis

Submissions that included aggregate data or simply procurement contracts (results of calls for tenders) were not accepted. In cases where aggregate data or procurement data was submitted or the submitter claimed that the data did not exist, an attempt was made to locate transactional data with a simple Google search and/or via IBP’s Open Budget Survey. If data was available for the previous year (or applicable recent budget cycle) the submission was adjusted accordingly and accepted.

Election Results

Reviewer: Kamil Gregor

The stated description of the Election Results dataset is as follows:

This data category requires results by constituency / district for all major national electoral contests. To satisfy this category, the following minimum criteria must be met:

  • Result for all major electoral contests
  • Number of registered votes
  • Number of invalid votes
  • Number of spoiled ballots
  • All data should be reported at the level of the polling station

Submissions that did not show the data at polling station level were omitted and marked as ‘Data does not exist’, even if votes are not counted at polling station level as a matter of policy. The reason for this is that the polling station level is the most granular level, and it is the one that makes it possible to monitor election fraud.

Company Register

Reviewer: Rebecca Sentance

The stated description of the Company Register dataset is as follows:

List of registered (limited liability) companies. The submissions in this data category do not need to include detailed financial data such as balance sheets. To satisfy this category, the following minimum criteria must be met:

  • Name of company
  • Unique identifier of the company
  • Company address
  • Updated at least once a month

Data was marked as unsure if it exists when the submitted dataset did not contain an address or a company ID. If the submission referenced a relevant government website that does not indicate the data exists, or if there is no evidence even of which government body would hold the data, the submission was changed to ‘data does not exist’. If it is clear that a governmental body collects company data, but there is no way of knowing what it consists of, where it is held, or how to access it, and no indication that it would fulfil our requirements, the submission was also marked as ‘data does not exist’.

Based on the definition, it was decided that a company register that is freely searchable by the public but requires entering a search term (search applications) did not count as free or publicly accessible. However, a company register that can be browsed through page-by-page does present all of the data and is the type of dataset required for acceptance.

National Statistics

Reviewer: Zach Christensen

The stated description of the National Statistics dataset is as follows:

Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc). To satisfy this category, the following minimum criteria must be met:

  • GDP for the whole country updated at least quarterly
  • Unemployment statistics updated at least monthly
  • Population updated at least once a year

For each submission, the reviewer checked for national accounts, unemployment, and population data as required by the description. It was found that most countries don’t have these data for the last year and very few had quarterly GDP figures or monthly unemployment figures. Submissions were only marked as ‘data does not exist’ if they did not have any national statistics more recent than 2010.
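The cut-off the reviewer applied is more lenient than the stated criteria, which a short sketch makes explicit. The function name and its parameter below are hypothetical; only the 2010 cut-off comes from the text.

```python
from typing import Optional

CUTOFF_YEAR = 2010  # from the review note above


def national_statistics_exists(latest_year: Optional[int]) -> bool:
    """Return True unless the newest statistics found are from 2010 or earlier.

    `latest_year` is the most recent year for which any national statistics
    were found, or None if nothing was found at all (hypothetical input).
    """
    if latest_year is None:
        return False
    return latest_year > CUTOFF_YEAR
```

Note that this only decides whether the data is marked as existing; it says nothing about whether the quarterly GDP or monthly unemployment requirements were met.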


Legislation

Reviewer: Kamil Gregor

The stated description of the Legislation dataset is as follows:

This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour, e.g. voting records, is available. To satisfy this category, the following minimum criteria must be met:

  • Content of the law / status
  • If applicable, all relevant amendments to the law
  • Date of last amendments
  • Data should be updated at least quarterly

Submissions were reviewed to ensure the data met the criteria. Regularity of updating was assessed based on the date of the most recently submitted data.

Pollutant Emissions

Reviewer: Yaron Michl

The stated description of the Pollutant Emissions dataset is as follows:

Aggregate data about the emission of air pollutants, especially those potentially harmful to human health (although it is not a requirement to include information on greenhouse gas emissions). Aggregate means national-level or available for at least three major cities. In order to satisfy the minimum requirements for this category, data must be available for the following pollutants and meet the following minimum criteria:

  • Particulate matter (PM) levels
  • Sulphur oxides (SOx)
  • Nitrogen oxides (NOx)
  • Volatile organic compounds (VOCs)
  • Carbon monoxide (CO)
  • Updated at least once a week
  • Measured either at a national level by regions or at least in 3 big cities

VOCs is a generic designation for many organic chemicals; therefore, when measuring VOCs it is possible to measure any one of a number of compounds such as Benzene or MTBE. Measurement of volatile organic compounds (VOCs) was ultimately not considered as part of the data requirements because of this discrepancy and the fact that it is rarely measured on a national level (see this link).

Carbon monoxide (CO) and nitrogen oxides (NOx) were also not considered as a requirement because their main origin is usually transportation.

In addition, some countries publish air pollution by using the Air Quality Index, a formula that translates air quality data into numbers and colors to help citizens understand when to take action to protect their health. For submissions that relied on the Air Quality Index, the data was considered not to exist because it is not raw data.
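Taken together, the reviewer's adjustments reduce the stated list of five pollutants to two. The sketch below works that out and applies it to a submission; the function and its parameters are hypothetical names chosen for illustration.

```python
from typing import Iterable

# Pollutants named in the stated dataset description.
STATED_POLLUTANTS = {"PM", "SOx", "NOx", "VOCs", "CO"}
# Pollutants the reviewer did not treat as requirements.
WAIVED_POLLUTANTS = {"VOCs", "CO", "NOx"}
# What remains as the effective requirement.
EFFECTIVE_POLLUTANTS = STATED_POLLUTANTS - WAIVED_POLLUTANTS


def emissions_data_exists(raw_pollutants: Iterable[str], aqi_only: bool) -> bool:
    """Return True if raw measurements cover the effective requirement.

    `raw_pollutants` lists pollutants with raw published measurements;
    `aqi_only` marks submissions that publish only an Air Quality Index.
    """
    if aqi_only:
        return False  # an index is derived, not raw, data
    return EFFECTIVE_POLLUTANTS <= set(raw_pollutants)
```

On this reading, a place publishing raw PM and SOx measurements qualifies even without VOC, CO or NOx figures, while an AQI-only publisher does not.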

Government Procurement Tenders

Reviewer: Georg Neumann

The stated description of the Government Procurement Tenders dataset is as follows:

All tenders and awards of the national/federal government aggregated by office. Monitoring tenders can help new groups to participate in tenders and increase government compliance. Data submitted in this category must be aggregated by office, updated at least monthly & satisfy the following minimum criteria:

  • Tenders: tender name, tender description, tender status
  • Awards: award title, award description, value of the award, supplier’s name

Quality of published information varied strongly and was not evaluated here. As long as the minimum information was available the data was said to exist for a given place.

Thresholds for publication of this information vary strongly by country. For all EU countries, tenders above a specific amount, detailed here, need to be published. This allowed for all EU submissions to qualify as publishing open procurement data even though some countries, such as Germany, do not publish award value for contracts below those thresholds, and others have closed systems to access specific information on contracts awarded.

In other countries not all sectors of government publish tenders and awards data. Submissions were evaluated to ensure that the main government tenders and contracts were made public, notwithstanding that data from certain ministries may have been missing.

Water Quality

Reviewer: Nisha Thompson

The stated description of the Water Quality dataset is as follows:

Data, measured at the water source, on the quality of water is essential for both the delivery of services and the prevention of diseases. In order to satisfy the minimum requirements for this category, data should be available on the level of the following chemicals by water source and be updated at least weekly:

  • Fecal coliform
  • Arsenic
  • Fluoride levels
  • Nitrates
  • TDS (total dissolved solids)

If a country treats water or distributes it, then there will be data regarding water quality because all water treatment requires quality checks. Even though water quality is a local responsibility in most countries, very few countries have a completely decentralized system. Usually there is a monitoring role for the central government, played by the Environmental Protection Agency, the Ministry of the Environment or the Ministry of Public Health. If there is a central monitoring role, the data does exist. If monitoring is completely decentralized, as in the UK, the submission was marked as ‘does not exist’ because there is no aggregation of the data. If data was not available daily or weekly, it wasn’t considered timely.

In some cases, all the parameters were accounted for except TDS. Even though it is standard, some countries only collect conductivity, which can be used to calculate TDS. In this case, the submission was approved as is.
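The parameter check, including the conductivity allowance, can be sketched as below. This is an illustration under assumptions: the function names and the lower-cased parameter labels are hypothetical, and timeliness is kept as a separate check because the text treats it separately.

```python
from typing import Iterable

# Required parameters from the stated description (labels are hypothetical).
REQUIRED_PARAMETERS = {"fecal coliform", "arsenic", "fluoride", "nitrates", "tds"}


def parameters_complete(reported: Iterable[str]) -> bool:
    """Check the required parameters, accepting conductivity in place of TDS."""
    found = {name.strip().lower() for name in reported}
    if "conductivity" in found:
        found.add("tds")  # conductivity can be used to calculate TDS
    return REQUIRED_PARAMETERS <= found


def is_timely(update_interval_days: int) -> bool:
    """Data counted as timely only if available daily or weekly."""
    return update_interval_days <= 7
```

For example, a submission reporting fecal coliform, arsenic, fluoride, nitrates and conductivity passes the parameter check, but would still fail `is_timely` if it were published monthly.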

Land Ownership

Reviewer: Codrina Maria Ilie

The stated description of the Land Ownership dataset is as follows:

Cadaster showing land ownership data on a map and including all metadata on the land. Cadaster data submitted in this category must include the following characteristics:

  • Land borders
  • Land owner’s name
  • Land size
  • National level
  • Updated yearly

For various reasons, the land owner’s name attribute was widely unmet and as such, lack of this data was not considered a factor in evaluating these submissions. This dataset depends on well-kept historic records (not always the case), on legislation (which can fluctuate), on very expensive activities that a government must implement in order to keep data up to date, and on the complexity of the data itself (sometimes the data that makes up a national cadastre is registered in different registries or systems). For these reasons, a first year indexing exercise must not be considered exhaustive.


Weather

Reviewers: Neal Bastek & Stephen Gates

The stated description of the Weather dataset is as follows:

5 days forecast of temperature, precipitation and wind as well as recorded data for temperature, wind and precipitation for the past year. In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria:

  • 5 days forecast of temperature updated daily
  • 5 days forecast of wind updated daily
  • 5 days forecast of precipitation updated daily
  • Historical temperature data for the past year

Based on a general assessment of the submissions, a minimum threshold for claiming the data existed was set at forecast data for today + two days (three days) with a qualitative allowance made for arid regions substituting humidity data for precipitation data. The threshold for inclusion could also be met with four day forecasts that include temperature and precipitation data, and/or a generic statement using text or descriptive icons about conditions (e.g. windy, stormy, partly cloudy, sunny, fair, etc.).


Location

Reviewer: Codrina Maria Ilie

The stated description of the Location dataset is as follows:

A database of postcodes/zipcodes and the corresponding spatial locations in terms of a latitude and a longitude (or similar coordinates in an openly published national coordinate system). If a postcode/zipcode system does not exist in the country, please submit a dataset of administrative borders. Data submitted in this category must satisfy the following minimum conditions:

  • Zipcodes: address, coordinates (latitude, longitude), national level, updated once a year
  • Administrative boundaries: border polygons, name of polygon (city, neighborhood), national level, updated once a year

In cases in which a country has not adopted a postcode system, the location dataset is considered to be administrative boundaries. The Universal Postal Union – Postal Addressing System was used to identify the structure of a postcode for a given place. This tool proved very useful when identifying countries that do not use a postcode system.

In situations where countries only had a postcode search service, either by postcode or address, data was said to not exist. If the postcodes were not geocoded, submissions did not meet the Index requirements due to the difficulty of geocoding such a dataset. On the other hand, if the postcode system took into account just the smallest administrative boundary and if that boundary was officially available, the data was marked as ‘does exist’ for that submission, given how easy it is to obtain geocoded postcodes in that case.

National Map

Reviewer: Gil Zaretzer

The stated description of the National Map dataset is as follows:

This data category requires a high level national map. To satisfy this category, the following minimum criteria must be met:

  • Scale of 1:250,000 (1 cm = 2.5 km)
  • Markings of national roads
  • National borders
  • Marking of streams, rivers, lakes, mountains
  • Updated at least once a year

Only submissions from an official source, with original data, were considered. A link to Google Maps, which was often provided, does not satisfy the criteria for these submissions.

Where no link was provided in the submission, entries were marked as “unsure” if there was any indication that the data exists but was not available online, e.g. a national mapping service without a website.
