The FutureTDM project, in which Open Knowledge International participates, actively engages with stakeholders in the EU such as researchers, developers, publishers and SMEs to help improve the uptake of text and data mining (TDM) in Europe (read more). Last month, we held our FutureTDM Symposium at the International Data Science Conference 2017 in Salzburg, Austria. With the project drawing to a close, we shared the project findings and our first expert driven policy recommendations and practitioner guidelines. This blog report has been adapted from the original version on the FutureTDM blog.
The FutureTDM track at the International Data Science Conference 2017 started with a speech by Bernhard Jäger form SYNYO who did a brief introduction to the project and explained the purpose of the Symposium – bringing together policy makers and stakeholder groups to share with them FutureTDM’s findings on how to increase TDM uptake.
This was followed by a keynote speech on the Economic Potential of Data Analytics by Jan Strycharz from Fundacja Projekt Polska, a FutureTDM project partner. It was estimated that automated (big) data and analytics – if developed properly – will bring over 200 B Euro to the European GDP by 2020. This means that algorithms (not to say robots) will be, then, responsible for 1.9% of the European GDP. You can read more on the TDM impact on economy in our report Trend analysis, future applications and economics of TDM.
Dealing with the legal bumps
The plenary session with keynote speeches was followed by the panel: Data Analytics and the Legal Landscape: Intellectual Property and Data Protection. As an introduction to this legal session Freyja van den Boom from Open Knowledge International presented our findings on the legal barriers to TDM uptake that mainly refer to type of content and applicable regime (IP or Data Protection). Having gathered evidence from the TDM community, FutureTDM has identified three types of barriers: uncertainty, fragmentation and restrictiveness and developed guidelines recommendation how to overcome them. We have summarised this in our awareness sheet Legal Barriers and Recommendations.
This was followed by the statements from the panelists: Prodromos Tsiavos (Onassis Cultural Centre/ IP Advisor) stressed the fact that with the recent changes in the European framework, the law faces significant issues and balancing the industrial interest is becoming necessary. He added that in order to initiate the uptake of the industry, a different approach is certainly needed because the industry will continue with license arrangements.
Duncan Campbell (John Wiley & Sons, Inc.) concentrated on Copyright and IP issues. How do we deal with all the knowledge created? How does the copyright rule has influence? He spoke about EU Commission Proposal and UK TDM exception – how to make an exception work?
Marie Timmermann (Science Europe) also focused on the TDM exception and its positive and negative sides. From the positive perspective, she views the fact that TDM exception moved from being optional to mandatory and it is not overridable. From the negative side she stated that the exception is very limited in scope. Startups or SMEs do not fall under this exception. Thus, Europe risks to lose promising researchers to other parts of the world.
This statement was also supported by Romy Sigl (AustrianStartups). She confirmed that anybody can created a startup today, but if startups are not supported by legislation, they move outwards to another country where more potential is foreseen.
The right to read is the right to mine
The next panel was devoted to an overview of FutureTDM case studies: Startups to Multinationals. Freyja van den Boom (OKI) gave on overview of the highlights of the stakeholder consultations, which cover different areas and stakeholder groups within TDM domain. Peter Murray-Rust (ContentMine) presented a researcher’s view and he stressed that the right to read is to right to mine, but we have no legal certainty what a researcher is allowed to do and what not.
Petr Knoth from CORE added that he believed that we needed the data infrastructure to support TDM. Data scientist are very busy with cleaning the data and they have little time to do the real mining. He added that the infrastructure should not be operated by the publishers but they should provide support.
Donat Agosti from PLAZI focused on how you can make the data accessible so that everybody can use it. He mentioned the case of PLAZI repository – TreatmentBank. It is open and extracts each article and creates citable data. Once you have the data you can disseminate it.
Kim Nilsson from PIVIGO spoke about the support for academics – they have already worked with 70 companies and provided support in TDM for 400 PhD academics. She mentioned how important data analytics and the possibility to see all the connections and correlations are for example for the medical sector. She stressed that data analytics is also extremely important for startups – gaining the access is critical for them.
Data science is the new IT
The next panel was devoted to Universities, TDM and the need for strategic thinking on educating researchers. FutureTDM project officer Kiera McNeice (British Library) gave an overview on the skills and education barriers to TDM. She stressed that there are many people saying that they need to have quite a lot of knowledge to use TDM and that there are skills gap between academia and industry. Also, the barriers to enter are still high because use of the TDM tools often require programming knowledge.
We have put together a series of guidelines to help stakeholders overcome the barriers we have identified. Our policy guidelines include encouraging universities to support TDM through both their research and education arm for example by helping university senior management understand the needs of researchers around TDM, and potential benefits of supporting it. You can read more in our Baseline report of policies and barriers of TDM in Europe, or walk through them via our Knowledge Base.
Kim Nilsson from PIVIGO stressed that the main challenge are software skills. The fact is that if you can do TDM you have fantastic options: startups, healthcare, charity. Their task is to offer proper career advice, help people understand what kind of skills are appreciated and assist them to build on them.
Claire Sewell (Cambridge University Library) elaborated on the skills from the perspective of an academic librarian. What important is the basic understanding on copyright law, keeping up with technical skills and data skills. “We want to make sure that if a researcher comes into the library we are able to help him.”- she concluded.
Jonas Holm from Stockholm University Library highlighted the fact that very little strategical thinking is going on in TDM area. “We have struggled to find much strategical thinking on TDM area. Who is strategically looking for improving the uptake at the universities? We couldn’t find much around Europe” – he said.
Stefan Kasberger (ContentMine) stressed that the social part of the education is also important – meaning inclusion and diversity.
Infrastructure for Technology Implementation
The last session was dedicated to technologies and infrastructures supporting Text and Data Analytics: challenges and solutions. FutureTDM Project Officer Maria Eskevich (Radboud University) delivered a presentation on the TDM landscape with respect to infrastructure for technical implementation.
Stelios Piperidis from OpenMinTed stressed the need for an infrastructure. “Following more on what we have discussed, it looks that TDM infrastructure has to respond to 3 key questions: How can I get hold on the data that I need? How can I find the tool to mine the data? How can I deploy the work carried out?”
Mihai Lupu form Data market Austria brought up the issue of data formats: For example, there is a lot of data in csv files that people don’t know how to deal with.
Maria Gavrilidou (clarin:el) highlighted the fact that not only the formats are problem but also identifying the source of data and putting in place lawful procedures with respect to this data. Meta data is also problematic because it very often does not exist.
Nelson Silva (know-centre) focused on using proper tools for mining the data. Very often there is no particular tool that meets your needs and you have to either develop one or search for open source tools. Another challenge is the quality of the data. How much can you rely on the data and how to visualise it? And finally, how to be sure that the people will have the right message.
The closing session was conducted by Kiera McNeice (British Library), who presented A Roadmap to promoting greater uptake of Data Analytics in Europe. Finally, we also had a Demo Session with flash presentations by:
- Stefan Kasberger (ContentMine),
- Donat Agosti (PLAZI), Petr Knoth (CORE),
- John Thompson-Ralf Klinkenberg (Rapidminer),
- Maria Gavrilidou (clarin:el),
- Alessio Palmero Aprosio (ALCIDE)
You can find all FutureTDM reports in our Knowledge Library, or visit our Knowledge Base: a structured collection of resources on Text and Data Mining (TDM) that has been gathered throughout the FutureTDM project.