What Do We Mean By Small Data

April 26, 2013 in Featured, Ideas and musings, Labs, Open Data, Small Data

Earlier this week we published the first in a series of posts on small data: “Forget Big Data, Small Data is the Real Revolution”. In this, the second in the series, we discuss small data in more detail, providing a rough definition and drawing parallels with the history of computers and software.

What do we mean by “small data”? Let’s define it crudely as:

“Small data is the amount of data you can conveniently store and process on a single machine, and in particular, a high-end laptop or server”

Why a laptop? What’s interesting (and new) right now is the democratisation of data and the associated possibility of a large-scale, distributed community of data wranglers working collaboratively. What matters here, then, is crudely the amount of data that an average data geek can handle on their own machine: their own laptop.

A key point is that the dramatic advances in computing, storage and bandwidth have far bigger implications for “small data” than for “big data”. Recent advances have expanded the realm of small data, the kind of data that an individual can handle on their own hardware, far more, in relative terms, than they have expanded the realm of “big data”. Suddenly, working with significant datasets, containing tens of thousands, hundreds of thousands or even millions of rows, can be a mass-participation activity.
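To make the claim concrete, here is a back-of-envelope estimate. It assumes every value is stored as an 8-byte number and ignores all storage, indexing and runtime overhead; both are simplifying assumptions for illustration, not figures from this post, and the helper name `estimated_size_mb` is hypothetical.

```python
def estimated_size_mb(rows: int, columns: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size, in megabytes, of a table of fixed-width numeric values.

    Assumes every value takes `bytes_per_value` bytes and ignores all overhead.
    """
    return rows * columns * bytes_per_value / 1_000_000


# A million rows with 20 numeric columns each.
print(estimated_size_mb(1_000_000, 20))  # 160.0
```

Around 160 MB fits comfortably in the memory of a modern laptop. Real-world overhead (text columns, indexes, the tools themselves) can increase this considerably, but a dataset of this kind remains a single-machine job.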

(As should be clear from the above definition, and from any recent history of computing, small (and big) are relative terms that change as technology advances: for example, in 1994 a terabyte of storage cost several hundred thousand dollars; today it’s under a hundred. This also means today’s big is tomorrow’s small.)

Our situation today is similar to that of microcomputers in the late 70s and early 80s, or of the Internet in the 90s. When microcomputers first arrived, they seemed puny in comparison to the “big” computing and “big” software then around, and there was, strictly, nothing they could do that existing computing could not. However, they were revolutionary in one fundamental way: they made computing a mass-participation activity. Similarly, the Internet was not new in the 1990s (it had been around in various forms for several decades), but it was at that point that it became available at mass scale to the average developer (and ultimately the average citizen). In both cases “big” kept on advancing too, be it supercomputers or high-end connectivity, but the revolution came from “small”.

This (small) data revolution is just beginning. The tools and infrastructure to enable effective collaboration and rapid scaling for small data are in their infancy, and the communities with the capacities and skills to use small data are in their early stages. Want to get involved in the small data revolution? Sign up now.

This is the second in a series of posts about the power of Small Data – follow the Open Knowledge Foundation blog, Twitter or Facebook to learn more and join the debate at #SmallData on Twitter.

Building the foundation for an Open Data Directory

April 24, 2013 in External, Open Data

Open (Government) Data as it is understood nowadays can still be considered a new concept. It began to gain traction worldwide with the Obama memo in early 2009 and the launch of data.gov a few months later. Following the successful lead of the US and UK governments, we have seen Open Data flourish all over the world over the last three years. About three hundred Open Data catalogues have been identified so far.

But still, it’s not always clear how to deliver good solutions, and many questions remain unanswered. To build sustainable Open Data initiatives in a wide range of countries, a broader view of the challenges is needed. New and existing initiatives will benefit from shared knowledge and will also produce a range of resources that should be published freely and openly for others to reuse.

As the Open Data movement grows worldwide, the number of available resources is also increasing. The scarcity of only 3-4 years ago is ending, but resources are appearing in disparate places and formats, and are sometimes difficult to find and share. There is a pressing need to compile and document existing resources that are verified, trustworthy, comparable and searchable.

The Open Data Directory

Following discussions with many in the Open Data community, an initial analysis of its own project needs and preliminary research on existing public resources, the Web Foundation believes that the community at large would benefit from a central entry point to Open Data related resources hosted by a neutral source: the Open Data Directory (ODD).

The ODD will help to build a clear evidence base for the benefits of Open Data by holding a wide range of resource types, such as use cases, case studies, stories and anecdotes, methodologies, strategies, business cases, papers, reports, articles, blog posts, training materials, slide sets, software tools, applications and visualisations. The directory will not focus on compiling a vast number of references; instead it will give priority to high-quality references endorsed by the Open Data community.

As a first step towards the ODD, we are making the Use Cases and Requirements Draft public in order to get comments from the wider community, not only on the content of the document itself but also on the overall idea of the ODD. We’ve published it as a Google Document with comments turned on. This is a tool for you, the Open Data community, so suggestions, feedback and comments are very welcome. The deadline for submitting comments is April 29th, 2013.

Frictionless Data: making it radically easier to get stuff done with data

April 24, 2013 in Featured, Ideas and musings, Labs, Open Data, Open Standards, Small Data, Technical

Frictionless Data is now in alpha at http://data.okfn.org/ – and we’d like you to get involved.

Our mission is to make it radically easier to make data used and useful; our immediate goal is to make it as simple as possible to get the data you want into the tool of your choice.

This isn’t about building a big datastore or a data management system; it’s simply about saving people from repeating all the same tasks of discovering a dataset, getting it into a format they can use and cleaning it up, all before they can do anything useful with it! If you’ve ever spent the first half of a hackday just tidying up tabular data and getting it ready to use, Frictionless Data is for you.

Our work is based on a few key principles:

  • Narrow focus — improve one small part of the data chain, standards and tools are limited in scope and size
  • Build for the web – use formats that are web “native” (JSON) and work naturally with HTTP (plain text; CSV is streamable, etc.)
  • Distributed not centralised — designed for a distributed ecosystem (no centralized, single point of failure or dependence)
  • Work with existing tools — don’t expect people to come to you, make this work with their tools and their workflows (almost everyone in the world can open a CSV file, every language can handle CSV and JSON)
  • Simplicity (but sufficiency) — use the simplest formats possible and do the minimum in terms of metadata but be sufficient in terms of schemas and structure for tools to be effective

We believe that making it easy to get and use data and especially open data is central to creating a more connected digital data ecosystem and accelerating the creation of social and commercial value. This project is about reducing friction in getting, using and connecting data, making it radically easier to get data you need into the tool of your choice. Frictionless Data distills much of our learning over the last 7 years into some specific standards and infrastructure.

What’s the Problem?

Today, when you decide to cook, the ingredients are readily available at local supermarkets or even already in your kitchen. You don’t need to travel to a farm, collect eggs, mill the corn, cure the bacon, etc., as you once would have done! Instead, thanks to standard systems of measurement, packaging, shipping (e.g. containerization) and payment, ingredients can get from the farm direct to your local shop or even your door.

But with data we’re still largely stuck at this early stage: every time you want to do an analysis or build an app you have to set off around the internet to dig up data, extract it, clean it and prepare it before you can even get it into your tool and begin your work proper.

What do we need to do for working with data to be like cooking today, where you get to spend your time making the cake (creating insights), not preparing and collecting the ingredients (digging up and cleaning data)?

The answer: radical improvements in the “logistics” of data, associated with specialisation and standardisation. By analogy with food, we need standard systems of “measurement”, packaging, and transport so that it’s easy to get data from its original source into the application where you can start working with it.

[Figure: the Frictionless Data idea]

What’s Frictionless Data going to do?

We start with an advantage: unlike with physical goods, transporting digital information from one computer to another is very cheap! This means the focus can be on standardizing and simplifying the process of getting data from one application to another (or from one form to another). We propose work in three related areas:

  • Key simple standards. For example, a standardized “packaging” of data that makes it easy to transport and use (think of the “containerization” revolution in shipping)
  • Simple tooling and integration – you should be able to get data in these standard formats into or out of Excel, R, Hadoop or whatever tool you use
  • Bootstrapping the system with essential data – we need to get the ball rolling

[Figure: Frictionless Data components diagram]

What’s Frictionless Data today?

1. Data

We have some exemplar datasets which are useful for a lot of people – these are:

  • High Quality & Reliable

    • We have sourced, normalized and quality checked a set of key reference datasets such as country codes, currencies, GDP and population.
  • Standard Form & Bulk Access

    • All the datasets are provided in a standardized form and can be accessed in bulk as CSV together with a simple JSON schema.
  • Versioned & Packaged

    • All data is in data packages and is versioned using git, so all changes are visible and data can be collaboratively maintained.
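As an illustration of what “CSV together with a simple JSON schema” gives a data consumer, the sketch below streams rows from CSV text and casts each value using the type declared for its column. The schema follows the general shape of a JSON Table Schema (a list of `fields`, each with a `name` and `type`), but the column names, the dummy values and the small `CASTERS` mapping are assumptions made for this example; a real consumer would need to handle more types and missing values.

```python
import csv
import io

# Placeholder CSV content standing in for a bulk download (values are dummies).
CSV_TEXT = "Country Code,Year,Value\nAAA,2010,100\nBBB,2010,250\n"

# JSON Table Schema-style description of the columns above.
SCHEMA = {
    "fields": [
        {"name": "Country Code", "type": "string"},
        {"name": "Year", "type": "integer"},
        {"name": "Value", "type": "number"},
    ]
}

# Conversion functions for the handful of schema types this sketch supports.
CASTERS = {"string": str, "integer": int, "number": float}


def load_typed_rows(csv_text, schema):
    """Yield one dict per CSV row, with each value cast to its declared type."""
    casters = {f["name"]: CASTERS[f["type"]] for f in schema["fields"]}
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {name: casters[name](value) for name, value in row.items()}


rows = list(load_typed_rows(CSV_TEXT, SCHEMA))
print(rows[0])  # {'Country Code': 'AAA', 'Year': 2010, 'Value': 100.0}
```

Because `load_typed_rows` is a generator, rows are read and converted one at a time, which is what makes plain CSV convenient to stream.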

2. Standards

We have two simple data package formats, described as ultra-lightweight, RFC-style specifications. They build heavily on prior work. Simplicity and practicality were guiding design criteria.

[Figure: Frictionless Data package standard diagram]

Data package: minimal wrapping, agnostic about the data it’s “packaging”, designed for extension. This flexibility is useful, as it can be used as a transport for pretty much any kind of data, but it also limits integration and tooling. Read the full Data Package specification.

Simple data format (SDF): focuses on tabular data only and extends data package (data in simple data format is a data package) by requiring data to be “good” CSVs and the provision of a simple JSON-based schema to describe them (“JSON Table Schema”). Read the full Simple Data Format specification.
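A minimal sketch of what a Simple Data Format package might look like: a `datapackage.json`-style descriptor listing one CSV resource with its JSON Table Schema, plus a check that the CSV header matches the schema. The keys used here (`name`, `resources`, `path`, `schema`, `fields`) reflect the general structure of the specifications, but the package name, file path and data are hypothetical, and the exact required keys may differ between spec versions, so the specifications themselves remain the authority.

```python
import csv
import io
import json

# Hypothetical CSV content for a tiny two-column dataset.
CSV_TEXT = "Name,Code\nAlpha,AA\nBeta,BB\n"

# A datapackage.json-style descriptor (package name and path are hypothetical).
descriptor = {
    "name": "example-codes",
    "resources": [
        {
            "path": "data/codes.csv",
            "schema": {
                # One field entry per CSV column, in column order.
                "fields": [
                    {"name": "Name", "type": "string"},
                    {"name": "Code", "type": "string"},
                ]
            },
        }
    ],
}


def header_matches_schema(csv_text: str, resource: dict) -> bool:
    """Return True if the CSV header lists exactly the schema's fields, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [field["name"] for field in resource["schema"]["fields"]]
    return header == expected


# Serialise the descriptor as it might be written to datapackage.json.
print(json.dumps(descriptor, indent=2))
print(header_matches_schema(CSV_TEXT, descriptor["resources"][0]))  # True
```

Keeping the descriptor as plain JSON alongside a plain CSV file means any language with a JSON parser and a CSV reader can consume the package without special tooling.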

3. Tools

It’s early days for Frictionless Data, so we’re still working on this bit! But there’s a need for validators, schema generators, and all kinds of integration. You can help out: see below for details or check out the issues on GitHub.
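To give a feel for what a schema generator does, here is a rough sketch of the idea; it is not an existing Frictionless Data tool. It guesses a JSON Table Schema-style type for each CSV column by trying progressively looser types. The function names are hypothetical, only three types are considered, and it assumes a well-formed CSV with a header row and at least one data row.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of 'integer', 'number' or 'string' that fits all values."""
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue  # some value did not fit; try the next, looser type
    return "string"


def infer_schema(csv_text):
    """Build a JSON Table Schema-style dict from a CSV header and its data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = list(zip(*reader))  # transpose data rows into per-column tuples
    return {
        "fields": [
            {"name": name, "type": infer_type(column)}
            for name, column in zip(header, columns)
        ]
    }


print(infer_schema("name,year,share\nAAA,2010,0.5\nBBB,2011,1.25\n"))
```

For the sample input this infers `string`, `integer` and `number` for the three columns. A production generator would also need to recognise dates, booleans and empty cells.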

Doesn’t this already exist?

People have been working on data for a while, so doesn’t something like this already exist? The crude answer is yes and no. People, including folks here at the Open Knowledge Foundation, have been working on this for quite some time, and there are already some parts of the solution out there. Furthermore, many of these ideas are borrowed directly from similar work in software. For example, the Data Packages spec (first version in 2007!) builds heavily on packaging projects and specifications like Debian and CommonJS.

Key distinguishing features of Frictionless Data:

  • Ultra-simplicity – we want to keep things as simple as they possibly can be. This includes formats (JSON and CSV) and a focus on end-user tool integration, so people can just get the data they want into the tool they want and move on to the real task
  • Web orientation – we want an approach that fits naturally with the web
  • Focus on integration with existing tools
  • Distributed and not tied to a given tool or project – this is not about creating a central data marketplace or similar setup. It’s about creating a basic framework that would enable anyone to publish and use datasets more easily and without going through a central broker.

Many of these are shared with (and derive from) other approaches but as a whole we believe this provides an especially powerful setup.

Get Involved

This is a community-run project coordinated by the Open Knowledge Foundation as part of Open Knowledge Foundation Labs. Please get involved:

  • Spread the word! Frictionless Data is a key part of the real data revolution – follow the debate on #SmallData and share our posts so more people can get involved

Opening up the wisdom of crowds for science

April 22, 2013 in Featured, News, Open Data, Open Science, Our Work, PyBossa, Releases

We are excited to announce the official launch of Crowdcrafting.org, an open source software platform, powered by our PyBossa technology, for developing and sharing projects that rely on the help of thousands of online volunteers.

At a workshop on Citizen Cyberscience held this week at the University of Geneva, a novel open source software platform called Crowdcrafting was officially launched. This platform, which has already attracted thousands of participants during several months of testing, enables the rapid development of online citizen science applications by both amateur and professional scientists.

Applications already running on Crowdcrafting range from classifying images of magnetic molecules to analyzing tweets about natural disasters. During the testing phase, some 50 new applications have been created, with over 50 more under development. The Crowdcrafting platform is hosted by the University of Geneva, and is a joint initiative between the Open Knowledge Foundation and the Citizen Cyberscience Centre, a Geneva-based partnership co-founded by the University of Geneva. The Sloan Foundation has recently awarded a grant to this joint initiative for the further development of the Crowdcrafting platform.

Crowdcrafting fills a valuable niche in the broad spectrum of online citizen science. There are already many citizen science projects that use online volunteers to achieve breakthrough results, in fields as diverse as proteomics and astronomy. These projects often involve hundreds of thousands of dedicated volunteers over many years. The objective of Crowdcrafting is to make it quick and easy for professional scientists as well as amateurs to design and launch their own online citizen science projects. This enables even relatively small projects to get started, which may require the effort of just a hundred volunteers for only a few weeks. Such initiatives may be small on the scale of most online social networks, but they still correspond to many man-years of scientific effort achieved in a short time and at low cost.

“By emphasizing openness and simplicity, Crowdcrafting is lowering the threshold in investment and expertise needed to develop online citizen science projects”, says Guillemette Bolens, Deputy Rector for Research at the University of Geneva. “As a result, dozens of projects are under development, many of them in the digital humanities and data journalism, some of them created by university students, others still by people outside of academia.”

An example occurred after the tropical storm that wreaked havoc in the Philippines late last year. A volunteer initiative called Digital Humanitarian Network used Crowdcrafting to launch a project called Philippines Typhoon. This enabled online volunteers to classify thousands of tweets about the impact of the storm, in order to more rapidly filter information that could be vital to first responders. “We are excited about how Crowdcrafting is assisting the digital volunteer community worldwide in responding to natural disasters,” says Francesco Pisano, Director of Research at UNITAR.

“Crowdcrafting is also enabling the general public to contribute in a direct way to fundamental science,” says Gabriel Aeppli, Director of the London Centre for Nanotechnology (LCN), a joint venture between UCL and Imperial College. A case in point is the project Feynman’s Flowers, set up by researchers at LCN. In this project, volunteers use Crowdcrafting to measure the orientation of magnetic molecules on a crystalline surface. This is part of a fundamental research effort aimed at creating novel nanoscale storage systems for the emerging field of quantum computing.

Commenting on the underlying technology, Rufus Pollock, founder of the Open Knowledge Foundation, said, “Crowdcrafting is powered by the open-source PyBossa software, developed by ourselves in collaboration with the Citizen Cyberscience Centre. Its aim is to make it quick and easy to do “crowdsourcing for good” – getting volunteers to help out with tasks such as image classification, transcription and geocoding in relation to scientific and humanitarian projects”. The Shuttleworth Foundation and the Open Society Foundations funded much of the early development work for this technology.

Francois Grey, coordinator of the Citizen Cyberscience Centre, says, “Our goal now, with support from the Sloan Foundation, is to integrate other apps for data collection, processing and storage, to make Crowdcrafting an open-source ecosystem for building a new generation of browser-based citizen science projects.”

For further information about Crowdcrafting, see Crowdcrafting.org.

Forget Big Data, Small Data is the Real Revolution

April 22, 2013 in Featured, Ideas and musings, Labs, Open Data, Small Data

There is a lot of talk about “big data” at the moment. For example, this is Big Data Week, which will see events about big data in dozens of cities around the world. But the discussions around big data miss a much bigger and more important picture: the real opportunity is not big data, but small data. Not centralized “big iron”, but decentralized data wrangling. Not “one ring to rule them all” but “small pieces loosely joined”.

Big data smacks of the centralization fads we’ve seen in each computing era. The thought that ‘hey there’s more data than we can process!’ (something which is no doubt always true year-on-year since computing began) is dressed up as the latest trend with associated technology must-haves.

Meanwhile we risk overlooking the much more important story here, the real revolution, which is the mass democratisation of the means of access, storage and processing of data. This story isn’t about large organisations running parallel software on tens of thousands of servers, but about more people than ever being able to collaborate effectively around a distributed ecosystem of information, an ecosystem of small data.

Just as we now find it ludicrous to talk of “big software” – as if size in itself were a measure of value – we should, and will one day, find it equally odd to talk of “big data”. Size in itself doesn’t matter – what matters is having the data, of whatever size, that helps us solve a problem or address the question we have.

For many problems and questions, small data in itself is enough. The data on my household energy use, the times of local buses, government spending – these are all small data. Everything processed in Excel is small data. When Hans Rosling shows us how to understand our world through population change or literacy he’s doing it with small data.

And when we want to scale up the way to do that is through componentized small data: by creating and integrating small data “packages” not building big data monoliths, by partitioning problems in a way that works across people and organizations, not through creating massive centralized silos.

This next decade belongs to distributed models not centralized ones, to collaboration not control, and to small data not big data.

Want to create the real data revolution? Come join our community creating the tools and materials to make it happen.

This is the first in a series of posts about the power of Small Data – follow the Open Knowledge Foundation blog, Twitter or Facebook to learn more and join the debate at #SmallData on Twitter.

Reinhart-Rogoff Revisited: Why we need open data in economics

April 22, 2013 in Open Data, Open Economics, WG Economics

 

This blog post is cross-posted from the Open Economics Blog.

Another economics scandal made the news last week. Harvard Kennedy School professor Carmen Reinhart and Harvard University professor Kenneth Rogoff argued in their 2010 NBER paper that economic growth slows down when the debt/GDP ratio exceeds the threshold of 90 percent of GDP. These results were also published in one of the most prestigious economics journals – the American Economic Review (AER) – and had a powerful resonance in a period of serious economic and public policy turmoil when governments around the world slashed spending in order to decrease the public deficit and stimulate economic growth.

Yet they were proven wrong. Thomas Herndon, Michael Ash and Robert Pollin from the University of Massachusetts (UMass) tried to replicate Reinhart and Rogoff’s results and criticised them on three grounds:

  • Coding errors: due to a spreadsheet error, five countries were excluded entirely from the sample, resulting in significant errors in the average real GDP growth across several debt/GDP categories
  • Selective exclusion of available data and data gaps: Reinhart and Rogoff exclude Australia (1946-1950), New Zealand (1946-1949) and Canada (1946-1950). This exclusion is alone responsible for a significant reduction of the estimated real GDP growth in the highest public debt/GDP category
  • Unconventional weighting of summary statistics: the authors do not discuss their decision to weight equally by country rather than by country-year, which could be arbitrary and ignores the issue of serial correlation.
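The weighting criticism is easiest to see with numbers. The sketch below uses made-up growth figures (not Reinhart and Rogoff’s data) for a single debt category, in which one hypothetical country contributes four country-years and another contributes only one. Weighting equally by country lets that single year count as much as all four years of the other country combined; weighting by country-year counts each year once.

```python
from statistics import mean

# Made-up real GDP growth rates (percent) for the country-years in one debt category.
growth = {
    "Country A": [2.5, 2.7, 2.4, 2.6],  # four country-years
    "Country B": [-7.6],                # a single country-year
}


def weight_by_country(data: dict[str, list[float]]) -> float:
    """Average within each country first, then give every country equal weight."""
    return mean(mean(years) for years in data.values())


def weight_by_country_year(data: dict[str, list[float]]) -> float:
    """Pool all country-years and give each one equal weight."""
    return mean(value for years in data.values() for value in years)


print(round(weight_by_country(growth), 3))       # -2.525
print(round(weight_by_country_year(growth), 3))  # 0.52
```

In this toy case the two schemes even disagree on the sign of average growth, which is why an undiscussed choice between them matters.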

The implication of these results is that countries with high levels of public debt experience only “modestly diminished” average GDP growth rates and, as the UMass authors show, there is a wide range of GDP growth performances at every level of public debt among the twenty advanced economies in Reinhart and Rogoff’s survey. Even if the negative trend is still visible in the UMass researchers’ results, the data fit the trend very poorly: “low debt and poor growth, and high debt and strong growth, are both reasonably common outcomes.”

Source: Herndon, T., Ash, M. & Pollin, R., “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,” Political Economy Research Institute, University of Massachusetts Amherst, Working Paper Series, April 2013.

What makes it even more compelling news is that it is all a tale from the state of Massachusetts: distinguished Harvard professors (#1 university in the US) challenged by empiricists from the lesser-known UMass (#97 university in the US). Moreover, despite the excellent AER data availability policy, which acts as a role model for other journals in economics, the AER failed to enforce it and make the data and code of Reinhart and Rogoff available to other researchers.

Coding errors happen, yet the greater research misconduct was not allowing other researchers to review and replicate the results by making the data openly available. If the data and code had been made available upon publication in 2010, it might not have taken three years to prove these results wrong, results which may have influenced the direction of public policy around the world towards stricter austerity measures. Sharing research data makes it possible to replicate and discuss findings, enabling scrutiny of research results as well as improvement and validation of research methods through more scientific enquiry and debate.

Get in Touch

The Open Economics Working Group advocates the release of datasets and code along with published academic articles, and provides practical assistance to researchers who would like to do so. Get in touch if you would like to learn more by writing to us at economics [at] okfn.org, and sign up to our mailing list.

References

Herndon, T., Ash, M. & Pollin, R., “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff,” Political Economy Research Institute, University of Massachusetts Amherst, Working Paper Series, April 2013. The paper and its data and code are available online.

The new PSI Directive – as good as it seems?

April 19, 2013 in Access to Information, External, Open Data

A closer look at the new PSI Directive by Ton Zijlstra and Katleen Janssen

[Image: European People’s Party, CC-BY-2.0, via Wikimedia Commons]

On 10 April, the European Commission’s Vice-President Neelie Kroes, responsible for the Digital Agenda for Europe, announced that the European Union (EU) Member States have approved a text for the new PSI Directive. The PSI Directive governs the re-use of public sector information, otherwise known as Open Government Data.

In this post we take a closer look at the progress claimed in the EC press release, and compare the new text with the current PSI Directive. We base this comparison on the text (not officially published) resulting from the final trialogue of 25 March, which was apparently accepted by the Member States last week.

The final step now, after this acceptance by Member States, is the adoption of the same text by the European Parliament, which took part in the trialogue and is therefore likely to agree. The vote in the ITRE Committee is planned for 25 April, and the plenary Parliament vote for 11 June. Member States will then have 24 months to transpose the new directive into national law, which means it should be in force across the EU towards the end of 2015.

The Open Data yardstick

The existing PSI Directive was adopted in 2003, well before the emergence of the Open Data movement, and written with mostly ‘traditional’ and existing re-users of government information in mind. Within the wider Open Data community this new PSI Directive will largely be judged by a) how well it moves towards embracing Open Data as the norm, in the sense of the Open Definition, and b) to what extent it makes this mandatory for EU Member States.

This means that the points of interest here are scope and access rights, redress options where those rights are denied, charging and licensing practices, and standards and formats. We go through these one by one:

Access rights and scope

  • The new PSI Directive brings museums, libraries and archives within its scope; however a range of exceptions and less strict rules apply to these data holders;
  • The Directive builds, as before, on existing national legislation concerning freedom of information and privacy and data protection. This means it only looks at re-use in the context of what is already legally public, and it does not make pro-active publishing mandatory in any way;
  • The general principle for re-use has been revised. Where the old directive describes cases where re-use has been allowed (making it dependent on that approval and thus leaving the choice to the Member States or the public bodies), the new directive says all documents within scope (i.e. legally public) shall be re-usable for commercial or non-commercial purposes. This is the source of the statement by Commissioner Neelie Kroes that a “genuine right to re-use public information, not present in the original 2003 Directive” has been created. For documents of museums, libraries, and archives the old rule applies: re-use needs to be allowed first (except for cultural resources that are opened up after exclusive agreements for their digitisation have ended – see below).

Asking for documents to re-use, and redress mechanisms if denied

  • The way in which citizens can ask to be provided with documents for re-use, or the way government bodies can respond, has not changed;
  • The redress mechanisms available to citizens have been specified in slightly more detail. The new directive specifies that one of the ways of redress should be through an “impartial review body with appropriate expertise”, “swift” and with binding authority, “such as the national competition authority, the national access to documents authority or the national judicial authority”. Although more specific than before, this falls short of the dedicated, speedy and independent redress procedure that had been hoped for.

Charging practices

  • When charges apply, they shall be limited to the “marginal costs of reproduction, provision and dissemination”, which is left open to interpretation. Marginal costing is an important principle, as in the case of digital material it would normally mean no charges apply;
  • The PSI Directive leaves room for exceptions to the stated norm of marginal costing, for public sector bodies that are required to generate revenue and for specifically excepted documents. Firstly, the exceptions rely once more on the concept of the public task, which raised so much discussion under the previous version of the directive; secondly, a distinction is made between institutions that have to generate revenue to cover a substantial part of all their costs and those that may generally be fully funded by the State (except for particular datasets whose collection, production, reproduction and dissemination has to be covered for a substantial part by revenue). Could this be a way to cover economic or even commercial activities, by defining them as a ‘public task’, thereby avoiding the non-discrimination rules requiring equal treatment of possible competitors?
  • The exceptions remain bound to an upper limit, that of the old PSI directive for the exceptions relating to institutions having to generate revenue. For cultural institutions, the upper limit of the total income includes the costs of collection, production, preservation and rights clearance, reproduction and dissemination, together with a reasonable return on investment;
  • How costs are structured and determined, and used to motivate standard charges, needs to be pre-established and published. In the case of the mentioned exceptions, charges and criteria applied need to be pre-established and published, with the calculation used being made transparent on request (as was the general rule before);
  • This requirement for standard charges to be fully transparent up front, meaning before any request for re-use is submitted, might prove to have an interesting impact: public sector bodies are unlikely to establish marginal costs and the underlying calculations for every dataset they hold, yet charges that have not been pre-established, motivated and published can no longer be applied.

Licensing

  • The new PSI Directive contains no changes concerning licensing, so no explicit move towards open licenses;
  • Where Member States attach conditions to re-use, a standard license should be available, and public sector bodies should be encouraged to use it;
  • Conditions to re-use should not unnecessarily restrict re-use, nor restrict competition;
  • The Commission is asked to assist the Member States by creating guidelines, particularly relating to licensing.

Non-discrimination and Exclusive agreements

  • The existing rules ensuring non-discrimination in how conditions for re-use are applied, including for commercial activities by the public sector itself, are continued;
  • As before, exclusive arrangements are not allowed, except to ensure services in the public interest or for digitisation projects by museums, libraries and archives. For the former, reviews are mandated every 3 years; for the latter, reviews are mandated after 10 years and then every 7 years. However, only the duration of such arrangements is to be reviewed, not their existence. In return for the exclusivity, the public body must receive a free copy of the cultural resources, which must be available for re-use when the exclusive agreement ends. Here, cultural institutions no longer have a choice whether to allow re-use, but it may be several years before the resource actually becomes available.

Formats and standards

  • Open standards and machine readable formats should be used for both documents and their metadata, where easily possible, but otherwise any pre-existing format and language is acceptable.

In summary, the new PSI Directive does not seem to take the bold steps the open data movement has been clamoring for over the past five years. At the same time, real progress has been made. Member States with a constructive approach will feel encouraged to do more. Also, the effort needed to make charges transparent may dissuade public sector bodies from charging at all. But the new PSI Directive will not serve as a tool for citizens seeking openness by default and by design. Even with the new redress mechanisms, getting your rights acknowledged and acted upon will remain as long and arduous a path as before.

It will be interesting to see the European Parliament, as the representative body, debate this in plenary.

About the authors

Katleen Jansen is a postdoctoral researcher in information law at the Interdisciplinary Centre for Law and ICT of KU Leuven, coordinator of the LAPSI 2.0 thematic network (www.lapsi-project.eu) and board member of OKFN Belgium. She specialises in re-use of PSI, open data, access to information and spatial data infrastructures. She is currently working on open data licensing for the Flemish Region.

Ton Zijlstra has been involved in open government data since 2008. He is working for local, national and international public sector bodies to help them ‘do open data well’, both as an activist and consultant. Ton wrote the first plans for the Dutch national data portal, did a stint as project lead for the European Commission at http://epsiplatform.eu, and is now partner at The Green Land, a Netherlands based open data consultancy. He is a regular keynote speaker on open data, open government, and disruptive change / complexity across Europe.

Open data highlights from European Data Forum 2013 in Dublin

April 16, 2013 in Events, Featured, LOD2, Open Data

 

Europe’s data league convened in Dublin last week – Open Data increasingly taking the stage

Over 500 data professionals gathered last week at the European Data Forum conference in Dublin. This is the annual meeting place for industry, research, policy makers and community initiatives to discuss the challenges and opportunities of Big Data in Europe. One of the main sentiments throughout the event was a profound interest in openly licensed data and in developments in the field of linked data.

The Open Knowledge Foundation was represented by Sander van der Waal and myself. We took part on behalf of the LOD2 project (an EU-funded project on Linked Open Data) and the Apps for Europe project (supporting apps competitions around Europe), as well as to stimulate open data discussions in general. These discussions found increasingly fertile ground: there was broad interest not only in linking data, but also in making it legally and technically open.

Open Data on the political agenda

The official program opened with a brief video message from EU Vice-President Neelie Kroes, and Irish Minister for Justice, Equality and Defence Alan Shatter was among the first speakers to address the need to embrace linked data, rightly calling it the new digital frontier. He seemed to hint that open technical standards and open licensing should be the norm, emphasizing the need to change EU data protection regulation to get the maximum gain from the massive opportunities offered by linking vast datasets (commonly referred to as Big Data). This notion was supported by Roberto Viola, Deputy Director-General of the European Commission’s Directorate General for Communications Networks, Content and Technology, whose subsequent presentation highlighted, among other things, how open data is the optimal way to improve public health systems.

Representatives from the European Commission’s DG Connect (Directorate General for Communications Networks, Content and Technology), Malte Beyer-Katzenberger and Francesco Barbato, continued this thought by presenting a concept called the EU Data Value Chain, part of DG Connect’s effort to ensure that digital technologies help deliver the growth the EU needs. The initiative is working towards a European data ecosystem in accordance with the EU’s Open Data Policy, which covers, among other things, open government data, public sector information (PSI) and Open Access. The motivation is to pursue untapped business opportunities, ensure better governance and citizen empowerment (through transparency), address societal challenges and accelerate scientific progress. To that end, the European Commission has been pushing Member States to open up data since the adoption of the PSI Directive in 2003.

Malte Beyer-Katzenberger also presented the EU Open Data Portal later in the conference program, which we at the Open Knowledge Foundation helped develop. The portal is part of the European open data infrastructure: it aggregates metadata from sources across the EU and acts as a single access point, helping users identify what data exists without having to know who holds it. At the same time, Beyer-Katzenberger noted, it also acts as a driver for re-use policies inside the organization.

Open data as an innovation strategy for industry

The first day of the event also saw the announcement of the winner of the European Data Innovator Award, which went to Michael Gorriz, CIO of car manufacturer Daimler, for his linked knowledge systems in Mercedes cars. Gorriz explained how data is connecting customers and enterprises more directly, calling it an emerging new economy of crowdsourcing and interaction, and highlighted the enormous business potential of linked open data. Specifically, he stressed the importance of getting data and information out of technical and legal “silos” (referring to proprietary data) in order to create value. This requires overcoming not only the technical challenge but also the cultural one of adapting to doing business and driving innovation through linked and open data. In making this argument, Gorriz referred specifically to Sir Tim Berners-Lee’s principles for linked open data and the need to leverage standards such as RDF and SPARQL instead of developing proprietary technologies. As a key point, he urged other business leaders to step into the new economy by building trust and reducing the fear of data transparency, and to dare to use linked open data to drive cultural change in their enterprises.

In the field of energy, Florian Bauer from REEEP (the Renewable Energy and Energy Efficiency Partnership) gave a presentation advocating open data as a way to help the uptake of clean, sustainable energy in society in general. Drawing on experience from the more than 180 clean energy projects in 58 countries that REEEP has supported, Bauer pointed out that the power of open data lies in letting energy companies avoid duplicating work through joint access to data, so that they can concentrate resources on their own expertise and keep maintenance to a minimum. Additionally, open data can help lower CO2 emissions by making use of data that already exists. However, Bauer explained that we are only at the start of this road: data portals need to be connected through open standards and made interoperable, and the energy sector needs to publish more data, in raw, machine-readable formats and under licenses that allow re-use.

Another major industry representative, Knut Sebastian Tungland, Chief Engineer of IT at Statoil (responsible for technology strategies and professional practices), spoke on the second day of the conference. He began with the main point he felt he would take away from the conference: that Statoil needs to act on open data in general, which he feels is not something they have contributed much to so far. In the same breath, he acknowledged the difficulty of doing so and invited others to help them leverage these ideas and figure out how to share their data.

Open Knowledge Foundation projects enabling innovation

The European Data Forum also featured a presentation by the Open Knowledge Foundation (given by Sander van der Waal and me) about the publicdata.eu project, which has been developed as part of the LOD2 project (focusing on Linked Open Data). The publicdata.eu portal, which runs on the CKAN open source data management system, provides access to open, freely reusable datasets from local, regional and national public bodies across Europe.

The publicdata.eu portal has recently been updated with a new set of social features and visualization capabilities, inviting citizens to examine, discuss and share the datasets, thereby making it easier to find relevant data for science, journalism and research in general, as well as for business and app development.

It was highly motivating to see open data being more and more widely acknowledged as a driver of innovation and growth. The Open Knowledge Foundation has been pushing for more openly licensed data for years, and we look forward to working with anyone to further stimulate innovation and wider uptake of openly licensed data and content.

Sustainable energy policy demands sustainable open data

April 8, 2013 in Featured, Open Data, WG Sustainability

What kinds of energy are we producing, and what kinds are we consuming? How much comes from renewable sources? What is our energy dependency on other countries? Energy policy is today at the heart of every country’s agenda, but can citizens discuss it fairly? Do even policymakers have enough reliable information to implement the new energy transition programs required to secure energy supplies and achieve CO2 reduction targets?

Europe aims to reach a low-carbon economy through energy transition policies. The objective is that by 2050 the EU should cut its emissions to 80% below 1990 levels through domestic reductions alone. The EU’s strategy also discusses how the main sectors responsible for Europe’s emissions (i.e. power generation, industry, transport, buildings and agriculture) could make the transition most cost-effectively. As part of its energy transition policy, Germany has committed to closing all its nuclear power plants by 2022. More recently, France launched a national debate on energy policy, with the aim of cutting its carbon emissions by a factor of 4 or 5 by 2050 and, in the meantime, reducing the share of nuclear in the electricity mix to 50% by 2025.

But how do we get there? As we discuss energy policies, much data is still missing – not only for the general public, but also for policy makers and energy players. To deliver a sustainable energy policy, we need a sustainable and smart open data approach.

Here are some of the data on energy transition that we could start to open:

CO2 microdata

The well-known statistician Hans Rosling has launched a call for the release of CO2 microdata. There is at least one source of CO2 microdata in Europe that we could demand be opened: the EU Emissions Trading System, which was launched in 2005 to fight climate change and covers more than 10 000 factories, power stations and other CO2-emitting installations. Although this microdata is collected at installation level, we only have access to CO2 emissions data per sector or per country; this needs to change.

Market players

Our energy future depends on the market. But what do we know about the energy market and its players? In a recent interview with the newspaper Le Monde, Christophe de Margerie, head of the oil group Total, declared: “We need to put all the data on the table: energy demand, and available resources together with their cost, environmental impact and feasibility”. He was right to ask for those data, but he forgot to mention that we also need data about the energy players themselves. Which players produce what types of energy in Europe? How much tax do they pay, and in which jurisdictions? How much do they invest in sustainable energy? How much CO2 and other pollutants do they produce? Private energy companies need to release their own data in order to be accountable. Projects like OpenCorporates can help us find datasets on the private energy sector, but there are still jurisdictions, such as France, where you cannot access corporate data for free.

Risk assessments

As we debate the future of energy, risk assessments of energy sources are key information we need. Once you have data on energy stocks, reserves and economic efficiency, you also need solid, peer-reviewed scientific data on the risks associated with those energy sources. Debates on nuclear energy, renewable energy or shale gas all need risk assessment data.

Smart Grid data

Smart grid technologies promise to better manage the production and distribution of electricity through better use of data. Smart grid efficiency relies in part on consumer behaviour and third-party innovation. This can only be achieved by releasing data captured from the smart grid system directly to consumers (smart disclosure) and, in anonymised form, to other stakeholders (open data).

Help us to identify data on energy transition

These are just a few examples, to show the importance of sustainable open data to sustainable energy policy – but there are many more. You can help us to identify them by telling us what kind of data we would need to tackle energy transition and sustainability challenges.


Open Transition Energie


As part of the national debate on energy policy in France, which is due to end with a new Energy Policy Framework proposal by the end of the year, the French OKF local group launched Open Transition Energy, a simple website to share, explore and visualize open data and other open resources related to energy transition, together with a dedicated group on the French datahub nosdonnees.fr.

“We are entering an era of open science” says EU Vice President Neelie Kroes at launch of new global Research Data Alliance

March 21, 2013 in Open Access, Open Data, Open Science, Policy, WG Open Data in Science

Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, gave a talk earlier this week renewing the EU’s strong, principled support for open science.

Speaking at the launch of a new global Research Data Alliance, she said that we are entering a new “era of open science”, which will be “good for citizens, good for scientists and good for society”.

She explicitly highlighted the transformative potential of open access, open data, open software and open educational resources, mentioning the EU’s policy requiring open access to all publications and data resulting from EU-funded research.

She also alluded to the EU’s work encouraging national funding bodies to adopt a similar approach to publicly funded research, and to recent policy developments in the US and Australia.

The Research Data Alliance says it “aims to accelerate and facilitate research data sharing and exchange” and currently lists a number of working areas such as metadata harmonisation and legal interoperability.

While there does not yet appear to be an explicit focus on open data per se, we hope that the new organisation will take a principled, ‘open by default’ approach to data sharing, in line with the Panton Principles, and commensurate with Commissioner Kroes’s speech.

As always, our Open Science Working Group will continue to monitor and engage with relevant initiatives and policy developments in this area as they unfold. If you’d like to help us, you can join our open-science discussion list.


