Data Catalog Schema and Protocol – Draft Specification

Open Data is an idea that continues to gain momentum, and one sign of this is the growing number of data catalogs around the world. This is great for many reasons, but it also brings its own problems, especially around interoperability and standardization. The lack of standard schemas and interfaces is something we’ve experienced in our work on projects that pull together dataset information from many different data catalogs around Europe.

Last year we convened an international data catalogs meeting in Edinburgh. Since then we at the Open Knowledge Foundation, in collaboration and consultation with the W3C’s DCAT team, have been working on a draft specification for a data catalog schema (format) and protocol for accessing and syncing data catalogs. A first draft of this standard is now ready and we’re putting out a request for comments:

Contribute

Roughly, the specification consists of two parts:

A schema (in essence DCAT) specifying a serialization of Dataset information,
A protocol / API for getting this information from a compliant data catalog site.
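
To make the first part more concrete, here is a minimal sketch in Python of what a serialized dataset record could look like. The key names are borrowed from DCAT and Dublin Core terms (dct:title, dct:description, dcat:keyword, dcat:distribution, dcat:accessURL), but the JSON layout, the helper functions and the example values are assumptions for illustration only and are not taken from the draft spec.

```python
import json


def make_dataset(title, description, access_url, keywords=()):
    """Build a hypothetical dataset record keyed by DCAT-style term names.

    The layout is an assumption; the draft spec may serialize these
    properties differently.
    """
    return {
        "title": title,                  # dct:title
        "description": description,      # dct:description
        "keyword": list(keywords),       # dcat:keyword
        "distribution": [                # dcat:distribution
            {"accessURL": access_url},   # dcat:accessURL
        ],
    }


def serialize(dataset):
    """Serialize a record to JSON with a stable key order."""
    return json.dumps(dataset, sort_keys=True, indent=2)


def parse(text):
    """Parse a JSON string back into a record."""
    return json.loads(text)


# Placeholder values; example.org is not a real catalog.
example = make_dataset(
    "Example dataset",
    "A placeholder record for illustration.",
    "http://example.org/data.csv",
    keywords=["example"],
)
```

A compliant catalog would presumably expose records like this through the protocol described in the second part; a client could call parse on the response body and compare records by their keys when syncing.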

We emphasize that this is a first draft, and that it is intentionally fairly rough as an invitation to contribute. You can do this in several ways:

Join the discussion on the relevant mailing list.
The spec is stored in a Git repository on GitHub – comment on the issue tracker there.
Fork and patch the spec’s Git repository.