Datapkg 0.5 has been released! This is the first release deemed suitable for public consumption (though we are still in alpha)! This announcement therefore serves as both an introduction and a release announcement.
From the docs:
datapkg is a user tool for distributing, discovering and installing data (and content) ‘packages’.
datapkg is a simple way to ‘package’ data building on existing packaging tools developed for code (e.g. Debian apt, PyPI, CRAN, Gems, CPAN). datapkg is designed to integrate closely with the CKAN (Comprehensive Knowledge Archive Network).
In terms of the big picture, datapkg is the “apt-get/aptitude/dpkg” part of the vision for a ‘Debian of Data’ (i.e. scalable, distributed, open data infrastructures; for more, see this post or these recent slides).
Datapkg is a key part of making data sharing automatable. As an end-user tool it allows automated (command-line or scripted) discovery, installation and sharing of data “packages” either standalone or via interaction with a registry like CKAN.
Trying It Out
If you’re interested in giving it a spin here are installation instructions. Once you’ve got it running you can then do things like (see the manual for more):
Search for a package in an Index e.g. on CKAN.net::
    # let's search for iso country/language codes data (iso 3166 ...)
    $ datapkg search ckan:// iso
    ...
    iso-3166-2-data -- Linked ISO 3166-2 Data
    ...
Get some information about one of them (in this case 2-digit ISO country codes in RDF)::
    $ datapkg info ckan://iso-3166-2-data
    ....
    ....
Let’s install it (to the current directory)::
    $ datapkg install ckan://iso-3166-2-data .
This will download the Package ‘iso-3166-2-data’ together with its “Resources” and unpack it into a directory named ‘iso-3166-2-data’.
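Since datapkg is meant to be scriptable, the three steps above can also be driven from a script. The following is a minimal Python sketch, not part of datapkg itself: it assumes only that the `datapkg` executable is on your PATH and uses the exact subcommands shown above. The helper names (`search_cmd`, `info_cmd`, `install_cmd`, `run`, `walkthrough`) are made up for illustration.

```python
import subprocess

# Assumption: the datapkg command-line tool is installed and on PATH.
DATAPKG = "datapkg"


def search_cmd(index_spec, query):
    """Build the argument list for `datapkg search <index> <query>`."""
    return [DATAPKG, "search", index_spec, query]


def info_cmd(package_spec):
    """Build the argument list for `datapkg info <package>`."""
    return [DATAPKG, "info", package_spec]


def install_cmd(package_spec, destination="."):
    """Build the argument list for `datapkg install <package> <destination>`."""
    return [DATAPKG, "install", package_spec, destination]


def run(cmd):
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def walkthrough():
    """Repeat the search / info / install steps from the text above."""
    print(run(search_cmd("ckan://", "iso")))
    print(run(info_cmd("ckan://iso-3166-2-data")))
    run(install_cmd("ckan://iso-3166-2-data", "."))
```

Calling `walkthrough()` would run the same three commands in order. Building the argument lists separately from running them makes it easy to log or inspect a command before it touches the network.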
datapkg is intended to be a generic tool for data packaging. As such, we want it to deal with as many “distribution” formats and as many different registries as possible. We’ve therefore designed datapkg to be extensible so that it can easily be adapted to talk with other systems. What kinds of plugins might one write?
- A plugin to discover data “packages” from RDFa information in web-pages, especially those in Government data catalogues (suggested by Ed Summers)
- A plugin to connect to Ensembl: http://www.ensembl.org/
- A plugin to extract download URLs or SPARQL endpoints from VoID descriptions (suggested by Richard Cyganiak)
We’re looking for more such suggestions as well as for people who’d like to implement plugins. If you’re interested please get in touch: http://www.okfn.org/contact/
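To give a flavour of what the core of such a plugin might do, here is a rough Python sketch of the VoID idea from the list above. It is not datapkg's plugin API (which is not described here) and it is not a real RDF parser: it simply scans Turtle text with a regular expression for the `void:dataDump` and `void:sparqlEndpoint` properties from the VoID vocabulary. The function name `extract_void_links` and the example.org URLs are hypothetical.

```python
import re

# Matches `void:dataDump <url>` or `void:sparqlEndpoint <url>` in Turtle text.
# Assumption: the document uses the conventional `void:` prefix and writes
# each URL in angle brackets directly after the property name.
_VOID_LINK = re.compile(r"void:(dataDump|sparqlEndpoint)\s+<([^>]+)>")


def extract_void_links(turtle_text):
    """Return download URLs and SPARQL endpoints found in a VoID description.

    This is a regex-based approximation; a robust plugin would use a proper
    RDF parser so that other prefixes and syntaxes are handled too.
    """
    links = {"dataDump": [], "sparqlEndpoint": []}
    for prop, url in _VOID_LINK.findall(turtle_text):
        links[prop].append(url)
    return links


# A small, made-up VoID description used purely as sample input.
EXAMPLE = """
@prefix void: <http://rdfs.org/ns/void#> .
<http://example.org/dataset> a void:Dataset ;
    void:dataDump <http://example.org/dump.nt.gz> ;
    void:sparqlEndpoint <http://example.org/sparql> .
"""

links = extract_void_links(EXAMPLE)
```

A real plugin would then hand the discovered download URLs to datapkg as the resources of a package. Note that this sketch only picks up the first URL when several objects are listed after one property with commas.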
8 thoughts on “Introducing Datapkg: A Tool for Distributing, Discovering and Installing Data “Packages””
What is your plan for dealing with versions of data and whether someone has made local changes to the data in a package? Will you do any checking like a version control system does before overwriting packaged data?
Is there an issue tracker to provide feedback and bug reports? Great concept!
@Mike: datapkg tickets are managed as part of the ckan trac (though datapkg is obviously independent of ckan). You can find that at http://knowledgeforge.net/ckan/trac. When posting on datapkg make sure to assign it to the datapkg component.
The links http://knowledgeforge.net/ckan/trac and http://knowledgeforge.net/ckan/doc/datapkg/install.html are currently redirecting to the ckan.org home page. Might be good to link to https://bitbucket.org/okfn/datapkg/overview to save a bit of searching.
The links to http://knowledgeforge.net/ckan/doc/datapkg/install.html and http://knowledgeforge.net/ckan/trac seem to redirect to the home page on ckan.org. Perhaps it’s good to redirect to https://bitbucket.org/okfn/datapkg/overview instead?
Yes, what Rolf said! Thanks for the link, Rolf :)
@Dan and Rolf: thanks for the heads-up; the links are now fixed. I’ve also started a wiki page for Datapkg on the CKAN wiki: http://wiki.ckan.net/Datapkg