The European Commission proposed a new PSI Directive, which describes when and how publicly held data can be re-used by anyone (aka open government data). The proposal contains several highly interesting elements: it extends the scope to public undertakings (utilities and transport mostly) and research data, limits the ways in which government can charge for data, introduces a high value data list which must be freely and openly available, mandates APIs, and makes de facto exclusive arrangements transparent. It also calls for delegated powers for the EC to change practical details of the Directive in future, which opens interesting possibilities. In the coming months (or years) it remains to be seen what the Member States and the European Parliament will do to weaken or strengthen this proposal.
Changes in the PSI Directive announced
On 25 April, the European Commission announced new measures to stimulate the European data economy, said to be building on the GDPR, as well as detailing the European framework for the free flow of non-personal data. The EC announced new guidelines for the sharing of scientific data, and for how businesses exchange data. It announced an action plan that increases safeguards on personal data related to health care and seeks to stimulate European cooperation on using this data. The EC also proposes to change the PSI Directive which governs the re-use of public sector information, commonly known as Open Government Data. In previous months the PSI Directive was evaluated (see an evaluation report here, in which my colleague Marc and I were involved).
This post takes a closer look at what the EC proposes for the PSI Directive. (I did the same thing when the last version was published in 2013.) This is of course a first proposal from the EC, and it may change significantly as a result of discussions with Member States and the European Parliament before it is finalised and enters into law. Taking a look at the proposed new directive is of interest to see what’s new, what is missing from an open data perspective, and where debate with Member States is most likely. The bullets indicate the more interesting changes.
The Open Data yardstick
The original PSI Directive was adopted in 2003 and a revised version implemented in 2015. Where the original PSI Directive stems from well before the emergence of the open data movement, and was written with mostly ‘traditional’ and existing re-users of government information in mind, the 2015 revision already adopted some elements bringing it closer to the Open Definition. With this new proposal, the yardstick is again how much it increases openness, whether it sets minimum requirements that align with the Open Definition, and how much of it will be mandatory for Member States. So scope and access rights, redress, charging and licensing, and standards and formats are important. There are also some general context elements that stand out from the proposal.
A floor for the data-based society
In the recital for the proposal, what jumps out is a small change in wording concerning the necessity of the PSI Directive. Where it used to say “information and knowledge”, it now says “the evolution towards a data-based society influences the life of every citizen”. Towards the end of the proposal it describes the Directive as a means to improve the proper functioning of the European data economy, where it used to read ‘content industry’. The proposed directive lists minimum requirements for governments to provide data in ways that enable citizens and economic activity, but suggests Member States can and should do more, and not just stick with the floor this proposal puts in place.
Novel elements: delegated acts, public undertakings, dynamic data, high value data
There are a few novel elements spread out through the proposal that are of interest, because they seem intended to make the PSI Directive more flexible with an eye to the future.
- The EC proposal adds the ability to create delegated acts. This would allow practical changes without the need to revise the PSI Directive and have it transposed into national law by each Member State. While this delegated power cannot be used to change the principles in the directive, it can be used to tweak it. Concerning charging, scope, licenses and formats this would provide the EC with more elbow room than the existing ability to merely provide guidance. The article is added to be able to maintain a list of ‘high value datasets’, see below.
- Public undertakings are defined and mentioned in parallel to public sector bodies in each provision. Public undertakings are all those that are (in)directly owned by government bodies, significantly financed by them or controlled by them through regulation or decision making powers. It used to say only public sector, basically allowing governments to withdraw data from the scope of the Directive by putting them at a distance in a private entity under government control. While the scope is enlarged to include public undertakings in specific sectors only, the rest of the proposal refers to public undertakings in general. This is significant, I think, given the delegated powers the EC also seeks.
- Dynamic and real-time data is brought firmly in scope of the Directive. There have been court cases where data provision was refused on the grounds that the data did not exist when the request was made. That will no longer be possible with this proposal.
- The EC wants to make a list of ‘high value datasets’ for which more things are mandatory (machine readable, API, free of charge, open standard license). It will create the list through the mentioned delegated powers. In my experience deciding on high value datasets is problematic (What value, how high? To whom?) and reinforces a supply-side perspective over a demand-driven approach. The Commission defines high value as “being associated with important socio-economic benefits” due to their suitability for creating services, and “the number of potential beneficiaries” of those services based on these datasets.
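The four requirements listed for high value datasets lend themselves to a simple checklist. The following is only a minimal sketch of how a data holder might screen its own datasets against them. The class, field names and example dataset are hypothetical illustrations; the proposal itself defines no such schema.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical description of a dataset; field names are illustrative only."""
    name: str
    machine_readable: bool
    has_api: bool
    free_of_charge: bool
    open_standard_license: bool


# The four requirements the post lists for high value datasets,
# mapped from (hypothetical) field name to a readable label.
REQUIREMENTS = {
    "machine_readable": "machine readable",
    "has_api": "available through an API",
    "free_of_charge": "free of charge",
    "open_standard_license": "open standard license",
}


def unmet_requirements(dataset: DatasetProfile) -> list[str]:
    """Return the labels of the requirements the dataset does not yet meet."""
    return [
        label
        for field_name, label in REQUIREMENTS.items()
        if not getattr(dataset, field_name)
    ]


# Made-up example: a register that is technically well published but still charged for.
register = DatasetProfile(
    name="example business register",
    machine_readable=True,
    has_api=True,
    free_of_charge=False,
    open_standard_license=False,
)
print(unmet_requirements(register))
```

A dataset that returns an empty list would meet all four conditions as summarised here. Whether it counts as 'high value' in the first place is a separate question that the Commission would settle through delegated acts.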
Access rights and scope
- Public undertakings in specific sectors are declared within scope. These sectors are water, gas/heat, electricity, ports and airports, postal services, water transport and air transport. These public undertakings are only within scope in the sense that requests for re-use can be submitted to them. They are under no obligation to release data.
- Research data from publicly funded research that are already made available, e.g. through institutional repositories, are within scope. Member States shall adopt national policies to make more research data available.
- A previous scope extension (museums, archives, libraries and university libraries) is maintained. For educational institutions a clarification is added that it only concerns tertiary education.
- The proposed directive builds as before on existing access regimes, and only deals with the re-use of accessible data. This maintains existing differences between Member States concerning right to information.
- Public sector bodies, although they retain any database rights they may have, cannot use those database rights to prevent or limit re-use.
Asking for documents to re-use, and redress mechanisms if denied
- The way in which citizens can ask for data, and the way government bodies can respond, has not changed.
- The redress mechanisms haven’t changed, and public undertakings, educational institutes, research organisations and research funding organisations do not need to provide one.
- The proposal now explicitly mentions free of charge data provision as the first option. Fees are otherwise limited to at most ‘marginal costs’.
- The marginal costs are redefined to include the costs of anonymizing data and protecting commercially confidential material. The full definition now reads “marginal costs incurred for their reproduction, provision and dissemination and where applicable anonymisation of personal data and measures to protect commercially confidential information.” While this likely helps in making more data available, in contrast to a blanket refusal, it also looks like externalising onto the re-user the costs of what is essentially badly implemented internal data governance. Data holders should already be able to do this quickly and effectively for internal reporting and democratic control. Marginal costing is an important principle, as in the case of digital material it would normally mean no charges apply, but this addition seems to open up the definition to much wider interpretation.
- The ‘marginal costs at most’ principle only applies to the public sector. Public undertakings and museums, archives etc. are exempted.
- As before public sector bodies that are required (by law) to generate revenue to cover the costs of their public task performance are excepted from the marginal costs principle. However a previous exception for other public sector bodies having requirements to charge for the re-use of specific documents is deleted.
- The total revenue from allowed charges may not exceed the total actual cost of producing and disseminating the data plus a reasonable return on investment. This is unchanged, but the ‘reasonable return on investment’ is now defined as at most 5 percentage points above the ECB fixed interest rate.
- Re-use of research data and the high value datasets must be free of charge. In practice various datasets that are currently charged for are also likely high value datasets (cadastral records and business registers, for instance). Here the views of Member States are most likely to clash with those of the EC.
- The proposal contains no explicit move towards open licenses, and retains the existing rules that standard licenses should be available, and that those should not unnecessarily restrict re-use, nor restrict competition. The only addition is that Member States shall encourage not only public sector bodies but all data holders to use such standard licenses.
- High value datasets must have a license compatible with open standard licenses.
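The revenue ceiling described in the charging bullets above can be made concrete with a little arithmetic. The sketch below rests on one explicit assumption: that the ‘reasonable return on investment’ is applied as a percentage on top of the total actual cost. The post does not say what base the return is calculated on, so treat this as one possible reading rather than the Directive's method. The function name and the numbers are made up for illustration.

```python
def max_allowed_revenue(
    total_cost: float,
    ecb_fixed_rate_pct: float,
    margin_pct_points: float = 5.0,
) -> float:
    """Upper bound on total charging revenue under the stated assumption.

    total_cost: total actual cost of producing and disseminating the data.
    ecb_fixed_rate_pct: ECB fixed interest rate, in percent (e.g. 0.0 or 1.5).
    margin_pct_points: maximum margin above that rate, in percentage points
        (5 according to the proposal as summarised in the post).

    ASSUMPTION: the return on investment is calculated on total_cost.
    """
    max_roi_rate = (ecb_fixed_rate_pct + margin_pct_points) / 100.0
    return total_cost * (1.0 + max_roi_rate)


# Illustrative figures only: EUR 200,000 in costs and an ECB fixed rate of 0%.
ceiling = max_allowed_revenue(total_cost=200_000.0, ecb_fixed_rate_pct=0.0)
print(f"Revenue ceiling: EUR {ceiling:,.2f}")
```

Under this reading, a body with EUR 200,000 in costs and an ECB rate of 0% could collect at most 5% more than its costs in total charges. This only matters for bodies exempted from the marginal cost principle; for everyone else the ceiling is the marginal cost itself.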
Non-discrimination and Exclusive agreements
- Non-discrimination rules in how conditions for re-use are applied, including for commercial activities by the public sector itself, are continued.
- Exclusive arrangements are not allowed for public undertakings, as before for the public sector, with the same existing exceptions.
- Where new exclusive rights are granted, the arrangements now need to be made public at least two months before coming into force, and the final terms of the arrangement need to be transparent and public as well.
- Importantly, any agreement or practical arrangement with third parties that in practice restricts the availability of data for re-use by anyone other than those third parties must also be published two months in advance, and the final terms also made transparent and public. This concerns data sharing agreements and other collaborations where a few third parties have de facto exclusive access to data. With all the developments around smart cities, where companies e.g. have access to sensor data that others don’t, this is a very welcome step.
Formats and standards
- Public undertakings will need to adhere to the same rules as the public sector already does: open standards and machine readable formats should be used for both documents and their metadata, where easily possible, but otherwise any pre-existing format and language is acceptable.
- Both public sector bodies and public undertakings should provide APIs to dynamic data, either in real time or, if that is too costly, within a timeframe that does not unduly impair the re-use potential.
- High value datasets must be machine readable and available through an API.
Let’s see how the EC takes this proposal forward, and what the reactions of the Member States and the European Parliament will be.