Blog written by Freyja van den Boom (FutureTDM researcher) and Lieke Ploeger.

Since September 2015, Open Knowledge International has been working on finding new ways to improve the uptake of text and data mining in the EU, as part of the FutureTDM project. Text and data mining (TDM) is the process of extracting relevant information from large amounts of machine-readable data (such as scientific papers) and recombining it to unlock new knowledge and power innovation (see ‘Techniques, Tools & Technologies for TDM in Europe’). Project partners include libraries, publishers and universities, as well as the non-profit organisation ContentMine, which advocates for the right to mine content. Open Knowledge International leads the work on communication, mobilisation and networking, and undertakes the research into best practices and methodologies.

A practical example explaining the use of TDM

Because the use of TDM is significantly lower in Europe than in some countries in the Americas and Asia, FutureTDM actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help pinpoint why uptake is lower, raise awareness of TDM and develop solutions. This is especially important now, because an exception for TDM under copyright law is being discussed at European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Throughout 2016 we organised Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining from researchers, developers, publishers, SMEs and any other stakeholder groups working in the field, and held stakeholder consultations with the various communities. In September 2016 we held the first of two workshops in Brussels to discuss the project’s findings, with many MEPs and policymakers present. In early 2017 a roundtable was organised at the Computers, Privacy and Data Protection (CPDP) conference in Brussels, where the impact of data protection regulations on the uptake of advanced data analysis technologies like TDM was discussed.

MEP Julia Reda discussing the upcoming copyright reform at the FutureTDM workshop

Below are some of the insights we have gained through our research so far, including the main barriers faced by different TDM stakeholder communities. In the upcoming months we will publish more of the results, along with proposed solutions for overcoming these barriers.

Education and skill
There is a need for more education on the benefits and practical use of TDM for researchers. This means working together with industry, the publishing community and academia to develop effective courses aimed at different levels, depending on the discipline and the type of research that is likely to use TDM. We are currently working on TDM education and are looking for feedback on what the learning outcomes should be. If you are interested in getting involved, contact us!

Legal and policy
There is a lack of clarity about the legal status of TDM practices and about the use of results gained through TDM. Barriers include uncertainty about the scope of copyright, database protection, and privacy and data protection regulations. See for example our guest blog here.

The current copyright reform discussions focus partly on a TDM exception, which could help provide more clarity. Under discussion are questions such as what data and what use fall under copyright, and whether there should be a distinction between commercial and non-commercial use. FutureTDM partners are monitoring these developments.

We have recently published the FutureTDM policy framework, which introduces high-level principles that should be the foundation of every stakeholder action that aims to promote TDM. These high-level principles are:

  • Awareness and Clarity: actions should improve certainty on the use of TDM technologies. Information and clear actions are crucial for a flourishing TDM environment in Europe.
  • TDM without Boundaries: insofar as appropriate, boundaries should be removed to prevent and reduce fragmentation in the TDM landscape.
  • Equitable Access: access to TDM tools and technologies, as well as to sources (such as datasets), is indispensable for a successful uptake of TDM, but usually comes at a price. While the broadest possible access to tools and data should be the aspiration, providers of these also have a legitimate interest in restricting access, for example to protect their investments or any privacy-related interest.

Technical and infrastructure
The main concerns are access to and the quality of available data. There is confidence that technological developments will deliver more reliable and easy-to-use tools and services, although the documentation and findability of relevant tools and services are currently reported as a barrier.

Developing standards for data quality is seen as a useful but most likely impossible solution given the diversity in projects and requirements, which would make standards too complex for compliance.

Economy and Incentives
The barriers mentioned include the lack of a single European market, the problems of having multiple languages, and a lack of enforcement for US companies.

Further research
The interviews and the case studies have provided evidence of and insight into the barriers that exist in Europe. To what extent these barriers can be overcome, given the different interests of the stakeholders involved, remains a topic for further research within the FutureTDM project.

We will continue to work on recommendations, guidelines and best practices for improving the uptake of TDM in Europe, focused on addressing the barriers presented by the main stakeholders. All findings, which include policy recommendations, guidelines, case studies, best practices, practical tutorials, and help and how-to guides to increase TDM uptake, are shared through the FutureTDM platform. The FutureTDM awareness sheets, for example, cover a range of factors that have an impact on TDM uptake and were created from our expert reports, expert interviews and discussions at our Knowledge Café events. The reports that have been completed so far are available from the Knowledge Library.

In the final six months of the FutureTDM project, there are many opportunities to find out more about the results and give your feedback on the situation around TDM in Europe. The second FutureTDM workshop will take place on 29 March at the European Parliament in Brussels, where your input on TDM experiences on the ground is very welcome. With EU copyright reform now in progress, we are bringing together policymakers and stakeholder groups so that we can share FutureTDM’s findings and our first expert-driven policy recommendations, which can help increase the uptake of TDM in the EU. To find out more and sign up, please check the event page. We will showcase the final project results during the final FutureTDM symposium, organised in conjunction with the International Data Science Conference (12-13 June 2017, Salzburg, Austria).

Our animation explaining TDM and the importance of stakeholder engagement

As Communications Officer, Lieke works on increasing the profile and awareness of Open Knowledge Foundation projects online. She previously coordinated the OpenGLAM initiative, promoting free and open access to digital cultural heritage data and has been managing European projects in the areas of open cultural data, open access and open science. She is based in Berlin, where she also serves as Community Director of the Disruption Network Lab.

1 thought on “FutureTDM: The Future of Text and Data Mining”

  1. Very useful information on data mining. The future of data mining although depends on predictive statics and there will be many significant changes in it in recent years.
