
You are browsing the archive for Linked Open Data.

Community building through the DM2E project

Lieke Ploeger - April 8, 2015 in Community, DM2E, Linked Open Data, Open GLAM

During the past three years, Open Knowledge has been leading the community building work in the Digitised Manuscripts to Europeana (DM2E) project, a European research project in the area of Digital Humanities led by Humboldt University. Open Knowledge activities included the organisation of a series of events such as Open Data in Cultural Heritage workshops, running two rounds of the Open Humanities Awards and the establishment of OpenGLAM as an active volunteer-led community pushing for increased openness in cultural heritage.

DM2E and the Linked Open Web

As one of its core aims, the DM2E project worked on enabling libraries and archives to easily upload their digitised material into Europeana – the online portal that provides access to millions of items from a range of Europe’s leading galleries, libraries, archives and museums. In total, over 20 million manuscript pages from libraries, archives and research institutions were added during the three years of the project. In line with the Europeana Data Exchange Agreement, all contributing institutions agreed to make their metadata openly available under the Creative Commons Public Domain Dedication license (CC-0), which allows for easier reuse.

Since different providers make their data available in different formats, the DM2E consortium developed a toolset that converts metadata from a diverse range of formats into the DM2E model, an application profile of the Europeana Data Model (EDM). The software also allows the contextualisation and linking of these cultural heritage data sets, which makes the material suitable for use within the Linked Open Web. An example of this is the Pundit tool, which Net7 developed to enable researchers to add annotations to a digital text and link them to related texts or other resources on the net.

Open Knowledge achievements

Open Knowledge was responsible for the community building and dissemination work within DM2E, which, apart from promoting and documenting the project results for a wide audience, focused on raising awareness of the importance of open cultural data. The presentation below sums up the achievements made during the project period, including the establishment of OpenGLAM as a community, the organisation of the event series and the Open Humanities Awards, alongside the extensive project documentation and dissemination through various channels.


In order to realise the value of the tools developed in DM2E, as well as to truly integrate the digitised manuscripts into the Linked Data Web, there need to be enough other open resources to connect to and an active community of cultural heritage professionals and developers willing to extend and re-use the work undertaken as part of DM2E. That is why Open Knowledge set up the OpenGLAM community: a global network of people and organisations who are working to open up cultural content and data. OpenGLAM focuses on promoting and furthering free and open access to digital cultural heritage by maintaining an overview of Open Collections, providing documentation on the process and benefits of opening up cultural data, publishing regular news and blog items and organising diverse events.

Since the start in 2012, OpenGLAM has grown into a large, global, active volunteer-led community (and one of the most prominent Open Knowledge working groups to date), supported by a network of organisations such as Europeana, the Digital Public Library of America, Creative Commons and Wikimedia. Apart from the wider community taking part in the OpenGLAM discussion list, there is a focused Working Group of 17 open cultural data activists from all over the world, a high-level Advisory Board providing strategic guidance and four local groups that coordinate OpenGLAM-related activities in their specific countries. Following the end of the DM2E project, the OpenGLAM community will continue to push for openness in digital cultural heritage.

Open Humanities Awards

As part of the community building efforts, Open Knowledge set up a dedicated awards series focused on supporting innovative projects that use open data, open content or open source tools to further teaching and research in the humanities: the Open Humanities Awards. During the two competition rounds that took place in 2013 and 2014, over 70 applications were received and 5 winning projects were carried out as a result, ranging from an open source Web application which allows people to annotate digitized historical maps (Maphub) to an improved search application for Wittgenstein’s digitised manuscripts (Finderapp WITTfind). Winners published their results on a regular basis through the DM2E blog and presented their findings at conferences in the field, showing that the awards served as a great way to stimulate innovative digital humanities research using open data and content. Details on all winning projects, as well as final reports on their results, are available from this final report.

DM2E event series

Over the course of the project, Open Knowledge organised a total of 18 workshops. These focused on promoting best practices in the legal and technical aspects of opening up metadata and cultural heritage content, on providing demonstrations of and training with the tools and platforms developed in the project, and on hackdays and coding sprints. Highlights included the Web as Literature conference at the British Library in 2013, the Open Humanities Hack series and the Open Data in Cultural Heritage workshops, as a result of which several local OpenGLAM groups were started up. A full list of events and their outcomes is available from this final report.

Open Data in Cultural Heritage Workshop: Starting the OpenGLAM group for Germany (15 July 2014, Berlin)

It has been a great experience being part of the DM2E consortium. Following the project end, the OpenGLAM community will be sustained and built upon, so that we can realise a world in which our shared cultural heritage is open to all regardless of their background, where people are no longer passive consumers of cultural content created by an elite, but contribute, participate, create and share.


Energy Buildings Performance Scenarios as Linked Open Data

Guest - June 6, 2014 in Linked Open Data, OK Austria, Technical

This is a blog post by Martin Kaltenböck & Anne-Claire Bellec, cross-posted from the Semantic Puzzle Blog. Anne-Claire Bellec is Communications Manager at the Global Buildings Performance Network (GBPN), located at GBPN’s headquarters in Paris, France, and Martin Kaltenböck is responsible for web-based data tools at Semantic Web Company, an IT company specialised in Linked Open Data and located in Vienna, Austria, as well as a Member of the Board of the Austrian Chapter of Open Knowledge.

The reduction of greenhouse gas emissions is one of the big global challenges for the next decades. (Linked) Open Data on this multi-domain challenge is key for addressing the issues in policy, construction, energy efficiency and production alike. Today – on World Environment Day 2014 – a new (linked open) data initiative contributes to this effort: GBPN’s Data Endpoint for Building Energy Performance Scenarios.


GBPN (The Global Buildings Performance Network) provides the full data set of a recent global scenario analysis for saving energy in the building sector worldwide, projected from 2005 to 2050. The multidimensional dataset includes parameters like housing types, building vintages and energy uses for various climate zones and regions, and is freely available for full use and re-use as open data under the CC-BY 3.0 France license.

To explore this easily, the Semantic Web Company has developed an interactive query and filtering tool that lets users create graphs and tables by slicing this multidimensional data cube. Chosen results can be exported as open data in the open formats RDF and CSV, and can also be queried via a provided SPARQL endpoint (a semantic web based data API). A built-in query builder makes using, learning and understanding SPARQL easy, for advanced users as well as for non-experts and beginners.
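As a rough illustration of what querying such an endpoint might look like, the sketch below builds a SPARQL query for observations in an RDF Data Cube and shows how it could be sent over HTTP. The endpoint URL, the dataset URI and the function names are placeholders invented for this example, not GBPN's actual addresses; only the `qb:` terms come from the W3C RDF Data Cube Vocabulary.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: the post does not give the real endpoint or dataset URI.
ENDPOINT = "http://example.org/sparql"
DATASET = "http://example.org/dataset/beps"


def build_observation_query(dataset_uri, limit=10):
    """Return a SPARQL query listing observations of one Data Cube dataset."""
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "SELECT ?obs ?property ?value WHERE {\n"
        "  ?obs a qb:Observation ;\n"
        f"       qb:dataSet <{dataset_uri}> ;\n"
        "       ?property ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(endpoint, query):
    """Send the query with an HTTP GET and parse the JSON result set."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the query; run_query() needs a real endpoint to be useful.
    print(build_observation_query(DATASET, limit=5))
```

Each observation in a data cube carries one value per dimension (for example year, climate zone or building type), so adding further triple patterns or a `FILTER` to the `WHERE` clause is how a slice of the cube would be selected.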


The LOD-based information and data system is part of Semantic Web Company’s recent PoolParty Semantic Drupal developments and is based on OpenLink’s Virtuoso 7 QuadStore, which holds and calculates ~235 million triples. It also makes use of the RDF ETL tool UnifiedViews as well as D2R Server for RDF conversion. The underlying GBPN ontology runs on PoolParty 4.2 and also serves a powerful domain-specific news aggregator realised with SWC’s sOnr webminer.

Together with other Energy Efficiency related Linked Open Data initiatives like REEEP, NREL, BPIE and others, GBPN’s recent initiative is a contribution towards a broader availability of data supporting action against global warming. As Dr. Peter Graham, Executive Director of GBPN, emphasized: “…data and modelling of building energy use has long been difficult or expensive to access – yet it is critical to policy development and investment in low-energy buildings. With the release of the BEPS open data model, GBPN are providing free access to the world’s best aggregated data analyses on building energy performance.” The Linked Open Data (LOD) is modelled using the RDF Data Cube Vocabulary (a W3C recommendation), with 17 dimensions in the cube. In total there are 235 million triples available in RDF, including links to DBpedia and Geonames – linking the indicators: years – climate zones – regions and building types as well as user scenarios….

The Open Definition in context: putting open into practice

Laura James - October 16, 2013 in Featured, Linked Open Data, Open Data, Open Definition, Open Knowledge Definition, Open Standards

We’ve seen how the Open Definition can apply to data and content of many types published by many different kinds of organisation. Here we set out how the Definition relates to specific principles of openness, and to definitions and guidelines for different kinds of open data.

Why we need more than a Definition

The Open Definition does only one thing: as clearly and concisely as possible it defines the conditions for a piece of information to be considered ‘open’.

The Definition is broad and universal: it is a key unifying concept which provides a common understanding across the diverse groups and projects in the open knowledge movement.

At the same time, the Open Definition doesn’t provide in-depth guidance for those publishing information in specific areas, so detailed advice and principles for opening specific types of information – from government data, to scientific research, to the digital holdings of cultural heritage institutions – is needed alongside it.

For example, the Open Definition doesn’t specify whether data should be timely; and yet this is a great idea for many data types. It doesn’t make sense to ask whether census data from a century ago is “timely” or not though!

Guidelines for how to open up information in one domain can’t always be straightforwardly reapplied in another, so principles and guidelines for openness targeted at particular kinds of data, written specifically for the types of organisation that might be publishing them, are important. These sit alongside the Open Definition and help people in all kinds of data fields to appreciate and share open information, and we explain some examples here.

Principles for Open Government Data

In 2007 a group of open government advocates met to develop a set of principles for open government data, which became the “8 Principles of Open Government Data”.

In 2010, the Sunlight Foundation revised and built upon this initial set with their Ten Principles for Opening up Government Information, which have set the standard for open government information around the world. These principles may apply to other kinds of data publisher too, but they are specifically designed for open government, and implementation guidance and support is focused on this domain. The principles share many of the key aspects of the Open Definition, but include additional requirements and guidance specific to government information and the ways it is published and used. The Sunlight principles cover the following areas: completeness, primacy, timeliness, ease of physical and electronic access, machine readability, non-discrimination, use of commonly owned standards, licensing, permanence, and usage costs.

Tim Berners-Lee’s 5 Stars for Linked Data

In 2010, Web inventor Tim Berners-Lee created his 5 Stars for Linked Data, which aims to encourage more people to publish Linked Data – that is, to use a particular set of technical standards and technologies for making information interoperable and interlinked.

The first three stars (legal openness, machine readability, and non-proprietary format) are covered by the Open Definition, and the two additional stars add the Linked Data components (in the form of RDF, a technical specification).
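To make the cumulative nature of the scheme concrete, here is a minimal sketch of how a star rating could be computed for a dataset. The `Dataset` class and its field names are hypothetical and exist only for this illustration; the order of the checks follows the stars as described above, with the last two covering the Linked Data components.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustrative fields)."""
    open_licence: bool = False         # star 1: legally open
    machine_readable: bool = False     # star 2: structured, machine-readable
    open_format: bool = False          # star 3: non-proprietary format
    uses_rdf_uris: bool = False        # star 4: RDF with URIs identifying things
    links_to_other_data: bool = False  # star 5: linked to other data


def star_rating(ds):
    """Count stars cumulatively: a star only counts if all earlier ones are met."""
    checks = [
        ds.open_licence,
        ds.machine_readable,
        ds.open_format,
        ds.uses_rdf_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for ok in checks:
        if not ok:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    csv_release = Dataset(open_licence=True, machine_readable=True, open_format=True)
    print(star_rating(csv_release))
```

Under these assumptions, an openly licensed CSV file stops at three stars, which is also the point up to which the Open Definition applies.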

The 5 stars have been influential in various parts of the open data community, especially those interested in the semantic web and the vision of a web of data, although there are many other ways to connect data together.

Principles for specific kinds of information

At the Open Knowledge Foundation many of our Working Groups have been involved with others in creating principles for various types of open data and fields of work with an open element. Such principles frame the work of their communities, set out best practice as well as legal, regulatory and technical standards for openness and data, and have been endorsed by many leading people and organisations in each field.

These include:

The Open Definition: the key principle powering the Global Open Knowledge Movement

All kinds of individuals and organisations can open up information: government, public sector bodies, researchers, corporations, universities, NGOs, startups, charities, community groups, individuals and more. That information can be in many formats – it may be spreadsheets, databases, images, texts, linked data, and more; and it can be information from any field imaginable – such as transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance and more.

Each of these organisations, kinds of information, and the people who are involved in preparing and publishing the information, has its own unique requirements, challenges, and questions. Principles and guidelines (plus training materials, technical standards and so on!) to support open data activities in each area are essential, so those involved can understand and respond to the specific obstacles, challenges and opportunities for opening up information. Creating and maintaining these is a major activity for many of the Open Knowledge Foundation’s Working Groups as well as other groups and communities.

At the same time, those working on openness in many different areas – whether open government, open access, open science, open design, or open culture – have shared interests and goals, and the principles and guidelines for some different data types can and do share many common elements, whilst being tailored to the specific requirements of their communities. The Open Definition provides the key principle which connects all these groups in the global open knowledge movement.

More about openness coming soon

Don’t miss our other posts about Defining Open Data, and exploring the Open Definition, why having a shared and agreed definition of open data is so important, and how one can go about “doing open data”.

Challenge launched to promote open data for education

Sander van der Waal - March 20, 2013 in Linked Open Data, Linked Up, OKI Projects

The LinkedUp project is very pleased to announce the launch of the LinkedUp challenge. This is a series of three competitions (Veni, Vidi, and Vici) promoting the innovative use of linked and open data in an educational context.

The LinkedUp team invites anyone, from researchers and students, to developers and businesses, to join the first ‘Veni’ competition. You can participate by building prototypes, demos and innovative tools that exploit, use, integrate or analyse large scale web data for educational use.

Some very attractive prizes are only one reason to join and participate in the challenge. It is also a great opportunity to work with a large, documented repository of linked datasets that the LinkedUp team is putting together. The consortium is also able to offer dedicated access to so far non-public resources. The challenge allows participants to showcase their ideas and solutions to a wide community of researchers and practitioners. For businesses as well as researchers, this will be a great opportunity to present their company and enhance their network. For people working in academia, the challenge will provide a wealth of material and opportunities for experiments and publications.

While the LinkedUp team already identified and connected many educational and non-educational resources to work with, participants can also use and connect their own material or other data sources. Anyone is free to showcase their creativity and solutions as long as the application is relevant to education in the broadest sense of the word. There are also some high profile use cases of established organisations made available that can serve as inspiration for innovative applications. Join today!

Important dates

  • March 2013: Launch of the Challenge
  • May 2013: Release of the comprehensive LinkedUp dataset
  • 27 June 2013: Submission deadline
  • 1 September 2013: Notifications and Nominations
  • 17 September 2013: Presentations and award ceremony

#OpenDataEDB 3

Naomi Lillie - September 14, 2012 in Bibliographic, Events, Join us, Linked Open Data, Meetups, OKScotland, Open Data, Open GLAM, Open Government Data, Open Knowledge Foundation

Amidst the kerfuffle and cacophony of the Fringe Festival packing up for another year, the Edinburgh contingent came together again to meet, greet, present and argue all aspects of Open Data and Knowledge.

OKFN Meet-ups are friendly and informal evenings for people to get together to share and debate all areas of openness. Depending on the number of people on a given evening, we have presentations and/or round-table discussions about Open Knowledge and Open Data – from politics and philosophy to the practicalities of theory and practice. We have had two previous events (see here for the ‘launch’ write-up and here for the invitation to the second instalment); this time we were kindly hosted by the Informatics Forum, and the weather stayed fine enough to explore the roof terrace (complete with vegetable garden, gizmos to record wind-speed and weather, a view across the city to Arthur’s Seat and even a blue moon).

Around 20 of us gathered together and presentations were given by the following people:

  • James Baster – Open Tech Calendar: an introduction to this early-stage project to bring tech meet-ups together, talk about the different ways we are trying to be open and ask for feedback and help;
  • Ewan Klein – a short overview of business models for Open Data, including for government bodies;
  • Gordon Dunsire – library standards and linked data;
  • Gill Hamilton – National Library of Scotland’s perspective of library standards and open data;
  • Bob Kerr – State of the Map Scotland (see here for Bob’s featured OKFN blog post);
  • Naomi Lillie – OKFN as part of the Scottish Open effort.

What struck me overall was that everybody already knows each other… As well as the cross-over in the talks, I kept trying to introduce people who would exclaim, “Ah yes! How was the holiday / conference / wedding?” or similar. This was quite useful, though, as it emphasised the point I made in my talk: OKFN doesn’t need to start anything in Scotland, as efforts towards Open are already ongoing and to great effect; we just want to provide support and possibly a brand under which these activities can be coordinated and promoted. With this in mind, we are going to look into a Scotland OKFN group as soon as things settle down again after OKFest – keep your eyes open for updates to follow!

To keep up-to-date with #OpenDataEDB and similar events, with the above and other interesting folks, and with the emerging Scotland OKFN group:

OKFN Energy Lab: Call for Partners

Velichka Dimitrova - July 25, 2012 in Events, Featured, Linked Open Data, OKI Projects, Sprint / Hackday, Workshop


OKFN Labs is launching Labs Sprints, a new initiative to create data-driven applications around a specific topic within a very short timeframe – a single week. As we start this, we’re looking for partners to help us frame the questions that our apps will aim to explore. To create high-impact apps which can serve policy-making, our team needs a partner from the topic area who understands the background and the issue in question and can help guide us in the creation of a meaningful product.

Energy Data is the theme for the first OKF Lab, taking place in Berlin on 1-8 October 2012 and bringing together a small team of coders, designers, data wranglers, technologists and policy experts. The theme is structured broadly to incorporate a wide range of sub-topics, e.g. renewable energy resources and energy efficiency, fossil fuels and traditional energy structures, electricity demand and supply, government spending around energy policies as well as emissions from energy use in transport, industry, etc.

Open energy data is increasingly recognised by governments as “a powerful input to innovation” that can empower citizens, create jobs, encourage entrepreneurship and foster societal transformations. Access to energy data is also a citizen’s right: publicly-owned machine readable energy information and data should be made available and accessible to all sectors of society.

Creative energy data apps could assist users in forecasting future consumption based on previous usage data, mapping daily electricity consumption peaks and lows, providing web-based tools for emissions data-collection, comparing the efficiency and cost of alternative energy investments or presenting data in an easy-to-understand, interactive and engaging way.


Organisations that are working in this area are invited to partner with OKFN Labs by presenting a challenge for our team. Partners are expected to provide some support in the process of framing the Energy Lab and to give input in the form of a presentation about current research, policy and technological gaps.

Please contact us with a short e-mail outlining the challenge by answering the following questions.

  • What is the problem you would like to solve?
  • Which are the groups and relevant audiences?
  • What kind of data would you like to use?

Contact e-mail: sprints [at]

Announcing: Linked Open Vocabularies (LOV), enabling the vocabulary commons

Pierre-Yves Vandenbussche - July 10, 2012 in Featured Project, Linked Open Data, News, Open Data, Our Work, Technical

We are delighted to announce that Linked Open Vocabularies is now being hosted on Open Knowledge Foundation servers and is now officially an Open Knowledge Foundation project.

LOV Project in 5 points

  • LOV is about vocabularies (aka. metadata element sets or ontologies) in OWL / RDFS used to describe linked data.
  • LOV provides one-stop access to the Vocabulary Commons ecosystem.
  • LOV helps to improve the understanding, visibility, usability, synergy, sustainability and overall quality of vocabularies.
  • LOV promotes a technically and socially sustainable management of the Vocabulary Commons ecosystem
  • LOV is a community and open project. You are welcome to join the team of gardeners of the Vocabulary Commons!

Project context

The LOV project was born in the framework of the Datalift project, which aims at providing a platform to lift data from semi-structured formats (csv, xls, etc.) to linked data. The part of this project under Mondeca’s responsibility was focused on vocabulary selection and re-use. The purpose of the LOV project now goes far beyond this original catalogue. The LOV dataset is maintained by Bernard Vatant and Pierre-Yves Vandenbussche.
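To give an idea of what "lifting" semi-structured data to linked data can mean in practice, the sketch below turns CSV rows into RDF triples in N-Triples syntax. It is not Datalift's implementation: the `example.org` namespace, the function names and the one-property-per-column mapping are assumptions made for illustration. A real lift would replace the made-up property URIs with terms from existing vocabularies, which is exactly the selection problem LOV addresses.

```python
import csv
import io
import urllib.parse

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column):
    """One subject per row, one triple per remaining non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_id = urllib.parse.quote(row[id_column], safe="")
        subject = f"<{BASE}resource/{row_id}>"
        for column, value in row.items():
            # Skip the identifier, surplus cells (key None) and empty values.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{urllib.parse.quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "id,name,country\n1,Vienna,Austria\n2,Paris,France\n"
    print(csv_to_ntriples(sample, "id"))
```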

Project purposes

  • To identify vocabularies used or usable to express linked data in RDF
  • To harvest or create metadata and links between vocabularies
  • To suggest to vocabulary curators some vocabulary description improvements
  • To foster sustainable and responsible behavior of vocabulary creators and publishers
  • To provide advanced search features among vocabulary ecosystem elements

Project features

Among the various features of the LOV project, you can explore the vocabularies dataset using an intuitive UI. You can also access the RDF data directly, via a dump file or an endpoint.


For every vocabulary, as much metadata as possible is harvested (gathered in the RDF file, in the documentation or via interaction with authors). For example, the links between a particular vocabulary and the ecosystem are shown as well as its different versions.


One may search for a particular vocabulary element using the LOV search feature, filtering results by domain, type, or vocabulary. This feature is enabled by the LOV-bot, which monitors all the vocabularies on a daily basis.


OKFN support and the future of Vocabulary Commons

Along with a sustainable and resilient future for vocabularies, we believe the LOV project should live far beyond the Datalift research project in which it was born. With that in mind, the Open Knowledge Foundation has agreed to support our project in the coming years. We are really delighted by this support, which strengthens our belief that heritage organizations will play a major role in vocabulary preservation.

The future of LOV and the Vocabulary Commons belongs to its community. You are therefore, as an individual or organization, most welcome to participate in the future of LOV in many ways:

On the way to the new market of information in Russia

Ivan Begtin - June 29, 2012 in External, Linked Open Data, Open Government Data, Open Standards, WG EU Open Data, WG Open Government Data

On June 5th at the Higher School of Economics in Moscow a round table conference took place, devoted to the opening of state-collected datasets. It was convened by the Higher School of Economics (HSE) together with the Russian Office of the World Wide Web Consortium (W3C). Open data is the new trend in the state practices of the developed countries, and Russia also acknowledges the importance of this trend. The Presidential Decree of May 7th 2012, “About key measures for the improvement of the state governmental system” states that it is important to publish open government data by July 15th, 2013.

Oleg Pak, from the Ministry of Economic Development, told the round-table that his department is currently developing the standards and the concept of open data in Russia. Within the framework of this concept, they will develop a comprehensive strategy for open data usage in Russia. This concept should become a roadmap for the work of all authorities engaged in the realization of this vision.

As a rule, the realization of national projects for open data has two goals. The first is socio-political – the State should open the data for its citizens. This goal can be easily achieved with the existing level of technology. The main issue at the discussion in June was the achievement of the second goal: the transformation of state data arrays into a product suitable for cost-effective use. This would allow businesses to form a new structure of services, and offer previously non-existent things on the market.

In many countries, this is already happening, as Victor Klintsov (W3C, HSE) pointed out. “The USA Administration has already published over a million data sets. This has been published not for “readers”, but for computers and services which use this data for development of new data, products and services”, he said. Pavel Pugachev (Ministry of Communications and Mass Communications) cited the example of an IT-company in the US. Its programmers use anonymised medical data about outbreaks and numbers of patients, process it and supply large pharmaceutical companies with the results. This allows those companies to develop their demand and supply tactics. Pugachev suggested we ought to determine the open data priorities according to which data types will be most interesting to the market, and concentrate our efforts on opening them first and foremost.

A key issue with the data that is being opened in Russia is interoperability. Releases so far have been based on the idea of human consumption; the data is largely unsuitable for computer “consumption”, being unmatched and in different formats. This massively limits its business potential. Meanwhile, in Moscow alone there are 4,000 state-owned portals and organizations without consistent principles of data delivery. Common publishing standards need to be established as a matter of priority.

Nonetheless, as Daniel Hladky (W3C) pointed out, we cannot simply wait for the development of all the regulations that will allow perfect publication: “Publish, what you have. As you can and by any means. Good or bad, with mistakes, unattractively, even if 90% of this data will be badly structured. Maybe it lacks metadata. I would like to say that it is necessary to pick up speed. If 5% of the information is useful, it will be a start and a push for the development of business”. In developed countries the open data market started not from acts of government, but from the activity of individuals who collected information and published it on their portals, bringing it up to a machine-processable state.

This opinion was supported by Maksim Dubinin (OpenStreetMap, GIS-Lab projects): “The community of users and the culture of usage will not appear until open data is presented in large enough quantities”. He shared his experience in the area of geodata. “When it became clear that it was impossible to wait for governmental steps in this field, projects started to appear in which users contributed geodata by themselves. Over 600,000 people around the world have taken part in the OpenStreetMap project. As a result, some governmental organizations have started using data created by users.”

Undoubtedly, this needs to come from both ends at once.

Progress with Russian open data projects will be presented by Daniel Hladky during the European commission workshop Using Open Data: policy modeling, citizen empowerment, data journalism, which is going to take place in Brussels this week.

Victor Klintsov promised that the next meeting of the round table participants will be held this autumn. The W3C office is planning to invite the leaders of open data projects from the USA and Great Britain.

The shorthand transcript and presentation graphics of the round table conference will be published on the site of the W3C Russian office.

LOD2 plenary, Vienna, 21-3 March 2012

Mark Wainwright - March 23, 2012 in CKAN, Events, Linked Open Data, LOD2, OK Austria

I am in Vienna, along with my colleague Ira, for a plenary meeting of the assorted partners of the LOD2 project. LOD2 is an EU-funded research project on Linked Open Data, the vision of an interlinked web of data known to many from Tim Berners-Lee’s TED talk. The meeting runs for 3 days, in which there will be discussions about the various work packages, but I have been given the task of blogging about the opening introductory session on Wednesday afternoon. (Full disclosure: I have received a handsome LOD2 mug as advance payment for my efforts.) The Open Knowledge Foundation is one of the partners, because the pan-European CKAN data portal is part of the project. But being personally a relative newcomer, I was looking forward to finding out in this introductory session what the project is really all about.

[IMG: Delegates at LOD2 plenary]
Delegates at the LOD2 plenary

Sören Auer, the project co-ordinator, kicked off, giving an overview of the overview. He described the lifecycle of Linked Data, from extraction (from other structured or unstructured data) through to linking in to existing data, enrichment (perhaps by adding more structure), to the point where it can be explored for interesting patterns. For each stage in the lifecycle, there are tools being developed by the project – many are already released. Collectively these tools, which are all Open Source, form the LOD2 ‘stack’. Sören also mentioned some recent milestones, including a Serbian CKAN portal holding a lot of data in RDF, the native format for Linked Data; and a planned new data-oriented conference, the European Data Forum.

The tools: Work Packages 2-6

WP2: Optimising the store

Peter Boncz of CWI spoke about Work Package 2. (What happened to WP1, you ask? It was a prototype which finished earlier in the project.) WP2 concerns Virtuoso, the database part of the LOD2 stack. The challenge with RDF is to make a database that runs efficiently with huge quantities of data, as the potential for rich interlinking means the data is not neatly segmented into tables as in a normal database. A lot of progress has already been made, and he hopes that Virtuoso 7 will be released soon. It will be structured to enable better compression (speeding up processing by reducing I/O), and use adaptive caching to try to minimise the number of queries that need to be done more than once.

WP3: Getting the data

Jens Lehmann of AKSW at the University of Leipzig was next, talking about WP3 on ‘extraction, enrichment and repair’: the creation of Linked Data from existing structured or unstructured sources, its enrichment with suitable taxonomies to describe it, and detecting inconsistencies or other problems with its structure. If that sounds like a wide-ranging package, it is: as Jens told me later over dinner (not entirely seriously), ‘anything that doesn’t fit in one of the other packages gets stuffed into WP3’! There are currently over 20 tools playing a role in this stage, including Natural Language Processing techniques for extracting data from free text.

WP4: Creating links

Next up was Robert Isele of the Freie Universität Berlin. WP4 aims to enrich RDF data by adding links to other data sources, as well as linking data together by identifying duplicate entities within or between datasets. Automatic tools suggest links that a user can confirm or reject. WP4 also includes work to create an RDF-enabled version of the open source data cleaning tool Google Refine.
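For readers who like to see the idea in code, here is a minimal sketch of the 'suggest, then confirm' workflow described above. It is not the WP4 tooling: the URIs and labels are made up, plain string similarity from Python's standard library stands in for whatever matching the real tools use, and the threshold of 0.8 is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical (URI, label) pairs drawn from two imaginary datasets "a" and "b".
ENTITIES = [
    ("http://example.org/a/uni-leipzig", "Universität Leipzig"),
    ("http://example.org/b/uni-leipzig", "Universitat Leipzig"),
    ("http://example.org/a/fu-berlin", "Freie Universität Berlin"),
    ("http://example.org/b/cwi", "CWI"),
]

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Return a 0..1 similarity score for two labels, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_links(entities, threshold=0.8):
    """Propose candidate duplicate pairs whose labels look alike.

    The result is only a list of suggestions; a person is expected to
    confirm or reject each one before any link is published.
    """
    suggestions = []
    for (uri_a, label_a), (uri_b, label_b) in combinations(entities, 2):
        score = similarity(label_a, label_b)
        if score >= threshold:
            suggestions.append((uri_a, uri_b, score))
    # Strongest candidates first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


def to_ntriples(confirmed):
    """Serialise confirmed pairs as owl:sameAs statements in N-Triples."""
    return [f"<{a}> <{OWL_SAME_AS}> <{b}> ." for a, b, _score in confirmed]


if __name__ == "__main__":
    for line in to_ntriples(suggest_links(ENTITIES)):
        print(line)
```

With the sample data, only the two Leipzig spellings score above the threshold, so only that pair is offered to the user for confirmation.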

WP5: User interfaces

Sean Policarpio of DERI reported on WP5 on browsing, visualisation and authoring interfaces. He demonstrated geospatial data on a map, filtered with a structured (faceted) search – combining the power of Linked Data with a mapping search like Google Maps. Associated with this, they have produced a ‘semantic authoring’ tool, allowing the user to add or edit Linked Data via the map. Their next tasks are to implement ‘social semantic networking’ – for example, notifications based on semantic content – and mobile interfaces for their semantic tools.

WP6: Integrating the tools

Finally, the engaging and very Belgian Bert van Nuffelen of TenForce spoke about WP6, which aims to make the various disparate tools in the LOD2 stack play nicely together. They have worked on making the stack tools easier to install, on a shared interface, and on shared authorisation using WebID. They have also recently released an intermediate version of the stack (version 1.1) with new and upgraded tools and better documentation.

By now it was 3 o’clock and, against all expectations, the meeting was ahead of schedule. So we had a relatively luxurious half-hour break for tea. Your correspondent and another relative newcomer, Jan from TenForce, took the opportunity to get some fresh air and a feel for the Viennese genius loci. Or should that be Ortsgeist?

The use cases

WP7: Publishing

We had heard about the tools that had been, and are being, developed to manipulate Linked Data. But how will they be used? Refreshed by tea we returned to the meeting to hear about the three Work Packages concerned with use cases. Perhaps the most exciting talk of the afternoon came from Christian Dirschl of WP7 and Wolters Kluwer Germany (WKD). WKD is a legal and accountancy publisher who are already adapting and using the LOD2 stack tools to enhance their publishing business. Christian told us that ‘semantic technologies enable publishing media to create added value’, and WKD’s first release of news and media datasets created using Linked Data tools is on course for publication in April. By December they will release an interlinked version of the datasets, including links to DBpedia and further optimised tools.

WP8: Enterprise

Amar-Djalil Mezaour of Exalead presented the ‘enterprise’ use case WP8, an application to human resources with the aim of matching job vacancies to applicants. Some early work trying to model CVs had met criticism on the grounds, among others, that the EU reviewers doubted the volume of data freely available. WP8 has refocused its attention on job vacancies rather than CVs, for which there is plenty of data and better RDF support. They hope to release the results later this year, with vacancies ‘dashboards’ and analytics, faceted by sector, region, salary, etc, using Linked Data, and enriched with mashups with other sites such as social networks.
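To give a flavour of what 'faceted by sector, region, salary' means in practice, here is a small sketch. It is an illustration only, not WP8's implementation: the vacancy records, field names and helper functions are all invented, and plain Python dictionaries stand in for Linked Data.

```python
from collections import Counter

# Hypothetical vacancy records; every field name and value is made up.
VACANCIES = [
    {"title": "Data engineer", "sector": "IT", "region": "Vienna", "salary": 52000},
    {"title": "Web developer", "sector": "IT", "region": "Leipzig", "salary": 45000},
    {"title": "Legal editor", "sector": "Publishing", "region": "Vienna", "salary": 41000},
    {"title": "Archivist", "sector": "Public sector", "region": "Brussels", "salary": 38000},
]


def facet_counts(items, field):
    """Count how many items fall under each value of one facet."""
    return Counter(item[field] for item in items)


def filter_by(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(field) == value for field, value in selected.items())
    ]


if __name__ == "__main__":
    # A dashboard would show these counts next to each facet value...
    print(facet_counts(VACANCIES, "sector"))
    # ...and recompute them after the user narrows the selection.
    in_vienna = filter_by(VACANCIES, region="Vienna")
    print(facet_counts(in_vienna, "sector"))
```

Each click on a facet value narrows the result set, and the counts for the remaining facets are recomputed from whatever is left.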

WP9: Government data

After a long wait in the wings, it was time for the OKF’s own Ira Bolychevsky to take centre stage at last. WP9 aims to explore the applications to making government data available and maximising its use. Its main visible output is the pan-European CKAN data portal, which republishes open data from government portals throughout the European Union. The portal has recently been upgraded and repaired: it now runs the latest version of CKAN, introducing features such as data previews and – live on the DataHub and coming soon to the portal – a data API for structured data. Two subjects we hope to discuss more later in the plenary are closer integration with the LOD2 stack, and metadata standards.

[IMG: Ira Bolychevsky at LOD2 plenary]
Ira presenting WP9

Jindřich Mynarz briefly mentioned the new Czech CKAN portal. They have developed a detailed methodology as well as a ‘Quick Start guide’ for publishers, both of which they promise to make available in English soon (hurrah!)

Finally Vojtech Svatek of UEP gave a quick overview of WP9a, which aims to use Linked Data technology in the field of public procurement, with ontologies for public sector contracts – providing matchmaking and analytics not dissimilar from those in WP8.

A jug of wine, a loaf of bread

Perhaps the reader has read enough of Work Packages for now. Anticipating your satiety, the organisers had decided to defer the presentations from WP10-12 until Friday. In their place an outsider to the LOD2 project, Allan Hanbury, gave a lightning talk on a slightly related EU project, Khresmoi, which aims to provide useful searching tools for large medical databases.

Thus concluded the day’s business, and we all dispersed to our various hotels. The OKF contingent, along with TenForce, are staying in one just a couple of roads away. Crossing a road is hazardous in Vienna, because there are sometimes cars parked in what seems to be the middle of the road. You keep half-expecting some lights to change and the cars to zoom off. In fact they are parked between the road and the tramlines, along which long and elderly trams snake through the city.

In the evening, everyone from the day’s meetings reconvened and were whisked away on one such tram to an outlying district of the city, for an evening at a (more or less) traditional Austrian Heurige, an untranslatable type of wine tavern. A true Heurige, Helmut from the Semantic Web Company explains to me as we hurtle along, is run by a vineyard, and gives people an opportunity to sample its new year’s crop of wine. (‘Heurige’ in Austrian German literally means ‘this year’.) It will have a licence to open for only 2 or 3 weeks a year, and when open will hang out a spray of branches and a lamp to signify the fact.

There is still some wine grown in Vienna, I am told, but most of the Viennese Heurigen are open all year round and are really just restaurants. But they recreate the atmosphere of the real thing. Patrons are served wine and a mixed plate of traditional local foods, which, for readers not familiar with Austrian cuisine, mainly consist of various kinds of sausage, potato and cabbage. They are delicious, and so is the Apfelstrudel that comes along later. The only thing I cannot recommend in Vienna is the tea. When will these foreigners learn that it must be made with boiling hot water?

To follow blogs from the LOD2 plenary, see the blog parade from the project blog.
