Data sharing has come a long way over the years. With open source tools, improvements and new features are never far off. Serah Rono looks at the improvements made to the open source data management system CKAN over the course of the ROUTE-TO-PA project.
In the present day, 5MB worth of data would probably be a decent photo, a three-minute song, or a spreadsheet. Nothing worth writing home about, let alone splashing across front pages of mainstream media. This was not the case in 1956 though – in September of that year, IBM made the news by creating a 5MB hard drive. It was so big, a crane was used to lift it onto a plane. Two years later, in 1958, the World Data Centre was established to allow users open access to scientific data. Over the years, data storage and sharing options have evolved to be more portable, secure, and with the blossoming of the Internet, virtual, too.
One such virtual data sharing platform, CKAN, has been up and running for ten years now. CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
It is no wonder then that ROUTE-TO-PA, a Horizon2020 project pushing for transparency in public administrations across the EU, chose CKAN as a foundation for its Transparency Enhancing Toolset (TET). As one of ROUTE-TO-PA’s tools, the Transparency Enhancing Toolset provides data publishers with a platform on which they can open up data in their custody to the general public.
So, what improvements have been made to the CKAN codebase to create the Transparency Enhancing Toolset? Below is a brief list:
1. Content management system support
Integrating CKAN with a content management system lets publishers easily publish content related to datasets, as well as updates about the portal. The TET WordPress plugin integrates seamlessly with TET-enabled CKAN, giving publishers rich content publishing features and an elegantly organized entry point to the data portal.
2. PivotTable
The CKAN platform has limited data analysis capabilities, even though these are essential for working with data. ROUTE-TO-PA added a PivotTable feature that allows users to view, summarize and visualize data. From the data explorer in this example, users can easily create pivot tables and even run SQL queries. See source code here.
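To make the idea of a pivot table concrete, here is a minimal sketch in plain Python. It is not the TET PivotTable code; the `pivot_table` helper and the sample records (areas, years, bin counts) are made up for illustration. It groups rows by one field, spreads a second field across columns, and sums a third.

```python
from collections import defaultdict


def pivot_table(rows, index, columns, values):
    """Sum `values` for every (index, columns) pair found in `rows`.

    Hypothetical helper for illustration only, not part of CKAN or TET.
    """
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += float(row[values])
    # Convert the nested defaultdicts into plain dicts for easy inspection.
    return {key: dict(cells) for key, cells in table.items()}


# Made-up sample records, loosely modelled on a public bins dataset.
records = [
    {"area": "North", "year": "2015", "bins": 120},
    {"area": "North", "year": "2016", "bins": 135},
    {"area": "South", "year": "2015", "bins": 80},
    {"area": "North", "year": "2015", "bins": 10},
]

summary = pivot_table(records, index="area", columns="year", values="bins")
for area, cells in sorted(summary.items()):
    print(area, cells)
```

Each row of the result is one area and each column is one year, which is the same summarising step a pivot table performs in a spreadsheet.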
3. OpenID
ROUTE-TO-PA created an OpenID plugin for CKAN that enables OpenID authentication on CKAN. See source code here.
4. Recommendation for related datasets
With this feature, the application recommends related datasets for a user to look at, based on the current selection and other contextual information. The feature guides users towards potentially useful and relevant datasets. See an example in this search result for datasets on bins in Dublin, Ireland.
5. Combine Datasets Feature
This feature allows users to combine related datasets in their search results within TET into one ‘wholesome’ dataset. Along with the Refine Results feature, the Combine Datasets feature is found in the top right corner of the search results page, as in this example. Please note that only datasets with the same structure can be combined at this point. Once combined, the resulting dataset can be downloaded for use.
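The "same structure" rule is easy to picture in code. The sketch below is an assumption about what such a check might look like, not TET's actual implementation. The `combine_datasets` function and the sample CSV text are hypothetical. It accepts datasets only when their header rows match exactly, then appends their rows into one dataset.

```python
import csv
import io


def combine_datasets(csv_texts):
    """Concatenate CSV datasets that share an identical header row.

    Hypothetical helper for illustration only, not part of CKAN or TET.
    """
    combined_header = None
    combined_rows = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            continue  # skip empty datasets
        if combined_header is None:
            combined_header = header
        elif header != combined_header:
            # Datasets with a different structure cannot be combined.
            raise ValueError(f"Structure mismatch: {header} != {combined_header}")
        combined_rows.extend(reader)
    return combined_header, combined_rows


# Made-up sample datasets with the same columns.
first = "street,bins\nMain St,4\nHigh St,2\n"
second = "street,bins\nPark Rd,3\n"

header, rows = combine_datasets([first, second])
print(header)
print(rows)
```

Passing a dataset whose columns differ, for example `"road,count\nPark Rd,3\n"`, raises a `ValueError` instead of producing a misaligned result.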
6. Personalized search and recommendations
The personalized search feature gives logged-in users search results tailored to the details provided in their profile. In addition, logged-in users receive personalized recommendations based on those profile details.
7. Metadata quality check/validation
Extra validations have been added to the dataset entry form to prevent data entry errors and ensure consistency.
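As a rough illustration of what a metadata check can involve, here is a small sketch in plain Python. The rules are assumptions chosen for the example (the required fields `title`, `description` and `license`, and an `issued` date in YYYY-MM-DD form); the validations TET actually applies are not listed here and may differ.

```python
import re

# Hypothetical rules for illustration only; not TET's actual validation set.
REQUIRED_FIELDS = ("title", "description", "license")
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_metadata(metadata):
    """Return a dict mapping field names to error messages (empty if valid)."""
    errors = {}
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            errors[field] = "Missing value"
    issued = metadata.get("issued")
    if issued and not DATE_PATTERN.match(issued):
        errors["issued"] = "Expected YYYY-MM-DD"
    return errors


# Made-up form submission with a blank description and an inconsistent date.
submission = {"title": "Bins", "description": " ", "issued": "12/05/2016"}
problems = validate_metadata(submission)
print(problems)
```

Returning all problems at once, rather than stopping at the first, lets an entry form show every field that needs attention in a single pass.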
You can find, borrow from and contribute to the CKAN and TET code repositories on GitHub, join CKAN’s global user group, or email serah.rono@okfn.org with any and all of your questions. Viva el open source!
Developer Advocate, Open Knowledge Foundation