We are delighted to announce that Open Knowledge has received funding from Google to work on tool integration for Data Packages as part of our broader work on Frictionless Data to support the open data community.
What are Data Packages?
The funding will support a growing set of tooling around Data Packages. Data Packages provide functionality for data similar to “packaging” in software and “containerization” in shipping: a simple wrapper and basic structure for the transportation of data that significantly reduces the “friction” and challenges associated with data sharing and integration.
Data Packages also support better automation in data processing and do so without imposing major changes on the underlying data being packaged. As an example, "comprehensive country codes" is a Data Package that joins standardized country information from various sources into a single CSV file. The Data Package format, at its simplest level, allows its creator to provide information describing the fields, license, and maintainer of the dataset, all in a machine-readable format.
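To make this concrete, here is a rough sketch of what such a descriptor might look like when built in Python and serialized to JSON. The dataset name, file path, maintainer, and field names are hypothetical, and the key names reflect our reading of the Data Package specification, so check them against the current spec before relying on them.

```python
import json

# Illustrative descriptor only: the name, path, maintainer and fields below
# are hypothetical and not copied from the real country codes package.
descriptor = {
    "name": "example-country-codes",
    "title": "Example country codes",
    "license": "ODC-PDDL-1.0",
    "maintainers": [{"name": "Jane Doe"}],
    "resources": [
        {
            "path": "data/country-codes.csv",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "iso_alpha2", "type": "string"},
                    {"name": "iso_numeric", "type": "integer"},
                ]
            },
        }
    ],
}


def field_names(descriptor, resource_index=0):
    """Return the column names declared for one resource of the package."""
    schema = descriptor["resources"][resource_index]["schema"]
    return [field["name"] for field in schema["fields"]]


# The serialized form is what would normally be saved as datapackage.json.
datapackage_json = json.dumps(descriptor, indent=2)
```

Because the descriptor is plain JSON, any tool can read the field list, license, and maintainer without opening the CSV itself.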
In addition to the basic Data Package format, which supports any data structure, there are more specialised Data Package formats: Tabular Data Package for tabular data, based on CSV, and Geo Data Package for geodata, based on GeoJSON. You can also extend Data Package with your own schemas to create topic-specific Data Packages, such as Fiscal Data Package for public financial data.
What will be funded?
CKAN is an open source data management system that is used by many governments and civic organizations to streamline publishing, sharing, finding and using data. This project implements a CKAN extension so that all CKAN datasets are automatically available as Data Packages through the CKAN API. In addition, the extension ensures that the CKAN API natively accepts Tabular Data Package metadata and preserves this information on round-tripping.
This project also creates support for importing and exporting Tabular Data Packages to and from BigQuery, Google’s web service for querying massive datasets. This involves scripting and a small online service to map Tabular Data Package schemas to BigQuery data definitions. Because Tabular Data Packages already use CSV as the data format, this work focuses on the transformation of data definitions.
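As a rough idea of what transforming data definitions could involve, the sketch below converts a Tabular Data Package schema into a list of BigQuery-style field definitions (`name`, `type`, `mode`). The type mapping and the function name `to_bigquery_schema` are our own assumptions for illustration, not the funded implementation.

```python
# Assumed mapping from Tabular Data Package field types to BigQuery types.
# This table is illustrative and deliberately incomplete.
BIGQUERY_TYPES = {
    "string": "STRING",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def to_bigquery_schema(table_schema):
    """Translate a schema's fields into BigQuery-style field definitions."""
    bq_fields = []
    for field in table_schema["fields"]:
        # Fields without a declared type are treated as strings.
        source_type = field.get("type", "string")
        required = field.get("constraints", {}).get("required", False)
        bq_fields.append(
            {
                "name": field["name"],
                # Unknown types fall back to STRING rather than failing.
                "type": BIGQUERY_TYPES.get(source_type, "STRING"),
                "mode": "REQUIRED" if required else "NULLABLE",
            }
        )
    return bq_fields


# Hypothetical schema used only to exercise the function.
example_schema = {
    "fields": [
        {"name": "name", "type": "string", "constraints": {"required": True}},
        {"name": "population", "type": "integer"},
        {"name": "area_km2", "type": "number"},
    ]
}

bq_schema = to_bigquery_schema(example_schema)
```

Falling back to `STRING` for unrecognised types keeps the import from failing, at the cost of losing type information for those columns.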
General SQL Integration
Finally, the funding also covers general SQL integration for key open source databases such as PostgreSQL and MySQL / MariaDB. This will allow Data Packages to be used natively in an even wider variety of software that depends on these databases, beyond the tools listed above.
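One way to picture SQL integration is to generate a `CREATE TABLE` statement from a Tabular Data Package schema. The sketch below is an assumption-laden illustration: the type mapping and the helper `create_table_sql` are hypothetical, and it uses SQLite from Python's standard library purely so it runs anywhere, standing in for PostgreSQL or MySQL / MariaDB.

```python
import sqlite3

# Assumed mapping from Tabular Data Package field types to generic SQL types.
SQL_TYPES = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "NUMERIC",
    "boolean": "BOOLEAN",
    "date": "DATE",
}


def create_table_sql(table_name, table_schema):
    """Build a CREATE TABLE statement from a schema's field list.

    Identifiers are wrapped in double quotes but not otherwise escaped,
    so this is only suitable for trusted, simple names.
    """
    columns = []
    for field in table_schema["fields"]:
        sql_type = SQL_TYPES.get(field.get("type", "string"), "TEXT")
        columns.append('"{}" {}'.format(field["name"], sql_type))
    return 'CREATE TABLE "{}" ({})'.format(table_name, ", ".join(columns))


# Hypothetical schema and table name for illustration.
schema = {
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "iso_numeric", "type": "integer"},
    ]
}
ddl = create_table_sql("country_codes", schema)

# SQLite is a stand-in here; a real integration would target the
# database named in the text through its own driver.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
conn.execute('INSERT INTO "country_codes" VALUES (?, ?)', ("France", 250))
rows = conn.execute('SELECT * FROM "country_codes"').fetchall()
```

Type names and identifier quoting differ between database engines, so a real integration would need per-database handling rather than this single mapping.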
If you have any questions, comments or would like more information, please visit this topic in our OKFN Discuss forum.