LOD2 plenary, Vienna, 21-23 March 2012

March 23, 2012 in CKAN, Events, Linked Open Data, LOD2, OKF Austria

I am in Vienna, along with my colleague Ira, for a plenary meeting of the assorted partners of the LOD2 project. LOD2 is an EU-funded research project on Linked Open Data, the vision of an interlinked web of data known to many from Tim Berners-Lee’s TED talk. The meeting runs for 3 days, in which there will be discussions about the various work packages, but I have been given the task of blogging about the opening introductory session on Wednesday afternoon. (Full disclosure: I have received a handsome LOD2 mug as advance payment for my efforts.) The Open Knowledge Foundation is one of the partners, because the pan-European CKAN data portal publicdata.eu is part of the project. But being personally a relative newcomer, I was looking forward to finding out in this introductory session what the project is really all about.

[IMG: Delegates at LOD2 plenary]
Delegates at the LOD2 plenary

Sören Auer, the project co-ordinator, kicked off, giving an overview of the overview. He described the lifecycle of Linked Data, from extraction (from other structured or unstructured data) through to linking in to existing data, enrichment (perhaps by adding more structure), to the point where it can be explored for interesting patterns. For each stage in the lifecycle, there are tools being developed by the project – many are already released. Collectively these tools, which are all Open Source, form the LOD2 ‘stack’. Sören also mentioned some recent milestones, including a Serbian CKAN portal holding a lot of data in RDF, the native format for Linked Data; and a planned new data-oriented conference, the European Data Forum.

The tools: Work Packages 2-6

WP2: Optimising the store

Peter Boncz of CWI spoke about Work Package 2. (What happened to WP1, you ask? It was a prototype which finished earlier in the project.) WP2 concerns Virtuoso, the database part of the LOD2 stack. The challenge with RDF is to make a database that runs efficiently with huge quantities of data, as the potential for rich interlinking means the data is not neatly segmented into tables as in a normal database. A lot of progress has already been made, and he hopes that Virtuoso 7 will be released soon. It will be structured to enable better compression (speeding up processing by reducing I/O), and use adaptive caching to try to minimise the number of queries that need to be done more than once.
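To make the contrast with a table-based database concrete, here is a minimal Python sketch of the triple model. It is not how Virtuoso stores data internally; the `ex:` identifiers and the `match` helper are invented purely for illustration. Every fact is a (subject, predicate, object) triple, and any entity may carry any property, so there is no fixed set of columns to lay the data out in.

```python
# Toy in-memory triple store; illustrative only, not Virtuoso's design.
# All identifiers (ex:vienna, ex:meetsIn, ...) are made up for this example.
triples = {
    ("ex:vienna", "rdf:type", "ex:City"),
    ("ex:vienna", "ex:label", "Vienna"),
    ("ex:vienna", "ex:capitalOf", "ex:austria"),
    ("ex:austria", "rdf:type", "ex:Country"),
    ("ex:lod2", "rdf:type", "ex:Project"),
    ("ex:lod2", "ex:meetsIn", "ex:vienna"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about Vienna, whatever the property.
about_vienna = match(triples, s="ex:vienna")
# Everything that points at Vienna, whatever kind of thing the subject is.
links_to_vienna = match(triples, o="ex:vienna")
```

A real store answers such patterns from indexes rather than by scanning every triple, which is where the compression and caching work comes in.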

WP3: Getting the data

Jens Lehmann of AKSW at the University of Leipzig was next, talking about WP3 on ‘extraction, enrichment and repair’: the creation of Linked Data from existing structured or unstructured sources, its enrichment with suitable taxonomies to describe it, and detecting inconsistencies or other problems with its structure. If that sounds like a wide-ranging package, it is: as Jens told me later over dinner (not entirely seriously), ‘anything that doesn’t fit in one of the other packages gets stuffed into WP3’! There are currently over 20 tools playing a role in this stage, including Natural Language Processing techniques for extracting data from free text.

WP4: Creating links

Next up was Robert Isele of the Freie Universität Berlin. WP4 aims to enrich RDF data by adding links to other data sources, as well as linking data together by identifying duplicate entities within or between datasets. Automatic tools suggest links that a user can confirm or reject. WP4 also includes work to create an RDF-enabled version of the open source data cleaning tool Google Refine.
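The suggest-then-confirm workflow can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the WP4 tooling: it compares entity labels with the standard library's `difflib.SequenceMatcher`, and the URIs, labels, threshold and the `suggest_links` helper are all invented for the example.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(label.lower().split())


def suggest_links(source, target, threshold=0.8):
    """Pair up entities whose labels look alike.

    source and target map a (hypothetical) entity URI to its label.
    Returns (source_uri, target_uri, score) tuples, best match first.
    """
    suggestions = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(
                None, normalise(s_label), normalise(t_label)
            ).ratio()
            if score >= threshold:
                suggestions.append((s_uri, t_uri, round(score, 2)))
    return sorted(suggestions, key=lambda item: item[2], reverse=True)


# Made-up example data: two datasets describing overlapping things.
dataset_a = {"a:1": "Vienna", "a:2": "University of Leipzig"}
dataset_b = {"b:7": "vienna ", "b:8": "Universität Leipzig", "b:9": "Berlin"}

candidates = suggest_links(dataset_a, dataset_b)
# A person would then confirm or reject each candidate before a link is stored.
```

Comparing every pair of labels is fine for a toy example but does not scale to large datasets, and label similarity alone will produce false matches; hence the human confirmation step.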

WP5: User interfaces

Sean Policarpio of DERI reported on WP5 on browsing, visualisation and authoring interfaces. He demonstrated geospatial data on a map, filtered with a structured (faceted) search – combining the power of Linked Data with a mapping search like Google Maps. Associated with this, they have produced a ‘semantic authoring’ tool, allowing the user to add or edit Linked Data via the map. Their next tasks are to implement ‘social semantic networking’ – for example, notifications based on semantic content – and mobile interfaces for their semantic tools.
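For readers unfamiliar with faceted search, the idea can be sketched as follows. This is a minimal Python illustration, not the WP5 interface: the `Place` class, the facet names and the coordinates are all invented. A facet is just a property you can filter on, and the map view adds a bounding box as one more filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    kind: str      # facet, e.g. "tavern" or "hotel"
    district: str  # facet
    lat: float
    lon: float


def facet_filter(places, bbox=None, **facets):
    """Keep places inside bbox that match every requested facet value.

    bbox is (min_lat, min_lon, max_lat, max_lon); None means no spatial limit.
    """
    results = []
    for place in places:
        if bbox is not None:
            min_lat, min_lon, max_lat, max_lon = bbox
            if not (min_lat <= place.lat <= max_lat
                    and min_lon <= place.lon <= max_lon):
                continue
        if all(getattr(place, key) == value for key, value in facets.items()):
            results.append(place)
    return results


def facet_counts(places, facet):
    """Count how many places fall under each value of one facet."""
    counts = {}
    for place in places:
        value = getattr(place, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts


# Invented sample data with rough, made-up coordinates.
places = [
    Place("Tavern A", "tavern", "north", 48.26, 16.36),
    Place("Tavern B", "tavern", "centre", 48.21, 16.37),
    Place("Hotel C", "hotel", "centre", 48.20, 16.37),
]
central_bbox = (48.19, 16.35, 48.22, 16.39)
central_taverns = facet_filter(places, bbox=central_bbox, kind="tavern")
kinds = facet_counts(places, "kind")
```

The counts are what a faceted interface typically shows next to each filter option, so the user can see how many results a click would leave.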

WP6: Integrating the tools

Finally, the engaging and very Belgian Bert van Nuffelen of TenForce spoke about WP6, which aims to make the various disparate tools in the LOD2 stack play nicely together. They have worked on making the stack tools easier for users to install, on a shared interface, and on shared authorisation using WebID. They have also recently released an intermediate version of the stack (version 1.1) with new and upgraded tools and better documentation.

By now it was 3 o’clock and, against all expectations, the meeting was ahead of schedule. So we had a relatively luxurious half-hour break for tea. Your correspondent and another relative newcomer, Jan from TenForce, took the opportunity to get some fresh air and a feel for the Viennese genius loci. Or should that be Ortsgeist?

The use cases

WP7: Publishing

We had heard about the tools that had been, and are being, developed to manipulate Linked Data. But how will they be used? Refreshed by tea we returned to the meeting to hear about the three Work Packages concerned with use cases. Perhaps the most exciting talk of the afternoon came from Christian Dirschl of WP7 and Wolters Kluwer Germany (WKD). WKD is a legal and accountancy publisher who are already adapting and using the LOD2 stack tools to enhance their publishing business. Christian told us that ‘semantic technologies enable publishing media to create added value’, and WKD’s first release of news and media datasets created using Linked Data tools is on course for publication in April. By December they will release an interlinked version of the datasets, including links to DBpedia and further optimised tools.

WP8: Enterprise

Amar-Djalil Mezaour of Exalead presented the ‘enterprise’ use case WP8, an application to human resources with the aim of matching job vacancies to applicants. Some early work trying to model CVs had met criticism on the grounds, among others, that the EU reviewers had doubts about the volume of data freely available. WP8 has refocused its attention on job vacancies rather than CVs, for which there is plenty of data and better RDF support. They hope to release the results later this year, with vacancies ‘dashboards’ and analytics, faceted by sector, region, salary, etc, using Linked Data, and enriched with mashups with other sites such as social networks.

WP9: Government data

After a long wait in the wings, it was time for the OKF’s own Ira Bolychevsky to take centre stage at last. WP9 aims to explore the applications to making government data available and maximising its use. Its main visible output is publicdata.eu, which republishes open data from government portals throughout the European Union. publicdata.eu has recently been upgraded and repaired: it now runs the latest version of CKAN, introducing features such as data previews (like this) and – live on the DataHub and coming soon to publicdata.eu – a data API for structured data. Two subjects we hope to discuss more later in the plenary are closer integration with the LOD2 stack, and metadata standards.

[IMG: Ira Bolychevsky at LOD2 plenary]
Ira presenting WP9

Jindřich Mynarz briefly mentioned the new Czech CKAN portal. They have developed a detailed methodology as well as a ‘Quick Start guide’ for publishers, both of which they promise to make available in English soon (hurrah!)

Finally Vojtech Svatek of UEP gave a quick overview of WP9a, which aims to use Linked Data technology in the field of public procurement, with ontologies for public sector contracts – providing matchmaking and analytics not dissimilar from those in WP8.

A jug of wine, a loaf of bread

Perhaps the reader has read enough of Work Packages for now. Anticipating your satiety, the organisers had decided to defer the presentations from WP10-12 until Friday. In their place an outsider to the LOD2 project, Allan Hanbury, gave a lightning talk on a slightly related EU project, Khresmoi, which aims to provide useful searching tools for large medical databases.

Thus concluded the day’s business, and we all dispersed to our various hotels. The OKF contingent, along with TenForce, are staying in one just a couple of roads away. Crossing a road is hazardous in Vienna, because there are sometimes cars parked in what seems to be the middle of the road. You keep half-expecting some lights to change and the cars to zoom off. In fact they are parked between the road and the tramlines, along which long and elderly trams snake through the city.

In the evening, everyone from the day’s meetings reconvened and were whisked away on one such tram to an outlying district of the city, for an evening at a (more or less) traditional Austrian Heurige, an untranslatable type of wine tavern. A true Heurige, Helmut from the Semantic Web Company explains to me as we hurtle along, is run by a vineyard, and gives people an opportunity to sample its new year’s crop of wine. (‘Heurige’ in Austrian German literally means ‘this year’.) It will have a licence to open for only 2 or 3 weeks a year, and when open will hang out a spray of branches and a lamp to signify the fact.

There is still some wine grown in Vienna, I am told, but most of the Viennese Heurigen are open all year round and are really just restaurants. But they recreate the atmosphere of the real thing. Patrons are served wine and a mixed plate of traditional local foods, which, for readers not familiar with Austrian cuisine, mainly consist of various kinds of sausage, potato and cabbage. They are delicious, and so is the Apfelstrudel that comes along later. The only thing I cannot recommend in Vienna is the tea. When will these foreigners learn that it must be made with boiling hot water?

To follow blogs from the LOD2 plenary, see the blog parade from the project blog.

Open Data Search: finding useful datasets, worldwide

March 16, 2011 in CKAN, LOD2, Open Government Data, Technical, WG Open Government Data

The following post is from Friedrich Lindenberg, who is a developer at the Open Knowledge Foundation working on CKAN, PublicData.eu and Open Spending.

Recently, there has hardly been a week in which there hasn’t been an announcement of a new local, regional or national open data initiative – including ever more extensive catalogues of data that is being opened up (CKAN alone now runs in 20 or more places). While this is great news for those of us interested in re-using the data, it also means it becomes increasingly hard to keep a good overview of what kind of data are available for which places. To get a better overview we’ve now started a meta search engine for open data, opendatasearch.org.

opendatasearch.org is a global version of the prototype publicdata.eu site we announced in January: it’s an aggregator for datasets, providing a simple and unified search interface to all of the catalogues contained. At the moment, this includes all known instances of the CKAN software, the Sunlight Foundation’s National Data Catalog (and with it a large number of US-based data sources), the World Bank data catalogue, Sweden’s DCat-enabled OpenGov.se and Nexedi’s Data Publica portal. We’ve also put up search.ckan.net which provides access to the combined index of all CKANs only.

Behind the scenes, opendatasearch.org is a web spider with a twist: all collected data is converted to DCat, DERI/W3C’s RDF-based ontology for dataset descriptions. While this convention is still in early development, it’s interesting to see how well different kinds of catalogues can be expressed in it already (the harvested data can be found here). By harvesting a growing set of existing dataset descriptions, we hope to gather a comprehensive picture of the dataset properties that are widely used and that should be represented in a common format. Our goal with this is to establish some degree of interoperability between different data catalogues, leading into a federated catalogue architecture for Europe and perhaps beyond.
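As a rough illustration of what such a conversion involves, the sketch below maps one harvested record to N-Triples using a handful of DCat and Dublin Core terms. It is not the opendatasearch.org harvester: the record layout, the `to_dcat` helper and the example.org URLs are assumptions made for this example, and it uses the W3C namespace `http://www.w3.org/ns/dcat#`, which may not be the namespace the harvested data uses.

```python
# Namespaces; the DCat namespace here is the W3C one, which is an assumption
# about what a converter would emit, not a statement about the actual harvester.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialise a string as an N-Triples literal with minimal escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_dcat(record):
    """Map one harvested catalogue record (a plain dict) to N-Triples lines."""
    subject = f"<{record['url']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DCAT}Dataset> ."]
    if record.get("title"):
        lines.append(f"{subject} <{DCT}title> {literal(record['title'])} .")
    if record.get("license_url"):
        lines.append(f"{subject} <{DCT}license> <{record['license_url']}> .")
    for tag in record.get("tags", []):
        lines.append(f"{subject} <{DCAT}keyword> {literal(tag)} .")
    return lines


# A made-up record in a hypothetical harvested format.
record = {
    "url": "http://example.org/dataset/budget-2011",
    "title": 'City budget "2011"',
    "license_url": "http://example.org/licence/open",
    "tags": ["budget", "finance"],
}
dcat_lines = to_dcat(record)
```

Each source catalogue would need its own mapping from native fields to these terms; the fields that cannot be mapped are exactly what tells us where the common format falls short.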

These standardization concerns aside, we want to make opendatasearch.org useful on its own. For the immediate future this means adding support for more filter options, including licenses (and their compliance with open data principles), languages used in metadata and the data itself, and geographic scopes of the collected information. This, of course, is an open source development effort and we’d be glad to welcome those interested in contributing comments, catalogue data or functionality on the ckan-discuss mailing list!

Notes from EU meeting on “pan-European open data portal”

December 13, 2010 in CKAN, Events, External, LOD2, OKF, OKF Projects, Open Data, Open Government Data, Policy, WG EU Open Data, WG Open Government Data, Working Groups

A report from an EU meeting on the “goals and requirements for a pan-European data portal” is now online:

I was invited on behalf of the Open Knowledge Foundation to discuss our work on the CKAN project, both as part of data.gov.uk and as part of the LOD2 project, which will bring together open data from local, regional and national public bodies across Europe.

From the introduction:

> On the 3rd of November 2010 the European Commission organised in Luxembourg a technical workshop on the goals and requirements for a possible pan-European data portal. Experts with practical experience in their respective countries were invited to share their experiences and ideas.

> The experts consider that such a portal would add value to existing regional and national initiatives by improving transparency on issues of EU-wide interest, providing evidence for better policy making, improving the efficiency of data-dependent administrative and business processes and stimulating economic development through EU-wide reuse of data.

> Several issues of legal, technical and socio-political nature must be addressed for such a portal to function effectively, among them the need for high level political support, the systematic adoption of reuse-friendly data licences, the promotion of established data standards for maximal interoperability and the organic involvement of European software developers and data-literate citizens.

> A pan-European portal should be able to expand rapidly in breadth (thus fostering the interest of the public with large numbers of relevant datasets) while at the same time also showing the value of deeper data integration, starting from a core set of statistical, financial, geospatial data of high quality. Agile prototyping and development models are recommended, given the extremely fast pace at which data initiatives are developing in Europe.

> A small working group should be created to drive the issue forward and meet regularly to identify more precisely technical requirements. The group should connect with other open data stakeholder groups established at the national or European level and contribute to the definition of European datasets, government open data conferences and software development competitions, with first results visible and publicised by mid-2011.

The report identifies several reasons for developing a pan-European data portal:

> A) For European citizens

> * Single point of access on European information
> * Enabling services for citizens that live at country borders and/or work abroad
> * knowledge of successful open government data initiatives in some Member States can drive further initiatives in other Member States

> B) For administrations

> * Improvement of interoperability across processes thanks to greater availability of data
> * Improved comparability of EU 27 information and data
> * Reduction in administrative costs
> * Avoiding / cutting existing costs of re-publication of official information
> * More efficiency in servicing Freedom of Information requests
> * Involvement of European citizens (crowd sourcing approach) can have positive effects on transparency and quality of data.

> C) For economic development

> * Planning and monitoring resource for companies operating across EU borders
> * Driving the European innovation process
> * Driving force for European economy (information technology, new location based services, analyzing services et al)
> * Harmonisation of standards and guidelines for open government data across Europe

It also highlighted the value of open licenses, which allow anyone to reuse the data for any purpose:

> The participants of the workshop furthermore identified appropriate data licensing at the source as the conceptual precondition for any value to be extracted by data reuse (developers will not reuse data if it is not clear that they have the right to do so). This appears to be mostly an issue of educating data publishers on the selection of an appropriate license. There may be however contexts in which this might turn out to be a legislative issue, to be considered in the context of the review of the Public Sector Information Directive. There was also consensus on the fact that a clear licensing policy should be created and enforced on a pan-European data portal so as to maximise the opportunity for data reuse.

The report concludes:

> [...] participants agreed that a pan-European data portal with the characteristics described above would add value to open data initiatives from the Member States. Such an initiative should be pursued without delay in order to exploit the current momentum of open government data initiatives across Europe

It is fantastic to see such interest in open government data from the European Commission, and we look forward to following further developments with great interest.

If you’re interested in keeping in touch with the Open Knowledge Foundation’s work in this area you can follow:

Interested in open government data in Europe?

November 26, 2010 in CKAN, LOD2, OKF, Open Data, Open Government Data, WG EU Open Data, WG Open Government Data, Working Groups

As you may know the OKF is working on an EU funded project called LOD2. Part of the project aims to bring together openly licensed, machine-readable datasets from local, regional and national public bodies throughout Europe. It will also provide free/open source tools and services for those interested in reusing open government data.

We are currently circulating a survey which will inform work undertaken in this area. If you are interested in open government data in Europe (whether as a publisher, producer, reuser or consumer) we’d be very grateful for 10-15 minutes of your time to let us know about what you would like to see from the technology that is being developed. You can find the survey here:

We’d also very much appreciate any help in forwarding this to relevant folks, and for any blogging/tweeting to make sure as many potentially interested people as possible have the opportunity to respond! The survey will be open until the 17th December 2010.

Announcing the LOD2 project

September 13, 2010 in CKAN, LOD2, Open Data, Open Government Data

I’m very pleased to announce that the Open Knowledge Foundation is a consortium partner in the recently funded FP7 project LOD2. From the overview:

> Over the past three years, the semantic web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges in the area of intelligent information management: the exploitation of the Web as a platform for data and information integration in addition to document search. To translate this initial success into a world-scale disruptive reality, encompassing the Web 2.0 world and enterprise data alike, the following research challenges need to be addressed: improve coherence and quality of data published on the Web, close the performance gap between relational and RDF data management, establish trust on the Linked Data Web and generally lower the entrance barrier for data publishers and users.

The OKF depends on external funding to expand and provide better services more quickly. One of the absolutely fantastic things about the LOD2 project is that we’ll be able to expand the CKAN software specifically as part of developing an EU-wide registry for open government data. We’ll be launching a great new website that helps collect all the great public data starting to be built across Europe at www.publicdata.eu (though the first version won’t be up for a bit, so patience with the current site).

We’ll of course be available to help anyone as much as we can across all the projects. In terms of formal responsibility and work, we participate in 6 workpackages:

  • WP1 – Requirements, Design and LOD2 Stack Prototype
  • WP6 – Interfaces, Component Integration & LOD2 Stack
  • WP9 – Use Case 3: LOD2 for Citizen – GovData.eu
  • WP10 – Training, Dissemination, Community Building, Fertilization
  • WP11 – Standardization, Exploitation
  • WP12 – Project Management

The project has a number of deliverables, and we’ll be in charge of:

  • WP9 – Use Case 3: LOD2 for Citizen – GovData.eu
  • D9.1.1 – First release of the GovData.eu website and tools
  • D9.1.2 – Intermediate release of the GovData.eu website and tools
  • D9.1.3 – Final release of the GovData.eu website and tools
  • D9.3.1 – Guide and best practices presentation
  • D9.3.2 – Guide and best practices brochure
  • D10.1.1 – LOD2 training course for external audiences

As you can see, most of our work will be concentrated on WP9, which we’re the lead partner on, together with Free University Berlin, University of Leipzig, TenForce, and Wolters Kluwer Germany. More at: http://lod2.eu/WorkPackage/wp9.html

Use hashtag #lod2 on twitter, and follow @lod2project for more updates. The LOD2 project overview is also on the EU site.

LOD2 homepage is available at: http://lod2.eu/

The full project deliverables are at: http://lod2.eu/WikiArticle/Deliverables.html (Note they look to be in chronological order of delivery dates).

The project has even started a flickr account for sharing at: http://www.flickr.com/photos/lod2/

And of course, if you’re interested in Open Government Data, don’t forget Open Government Data Camp in November in London and if you’d like to participate more, join one of our working groups on Open Data.