FutureTDM Symposium

A reminder of the project

Text and data mining (TDM) is the process of deriving information from machine-read material. It is sometimes referred to as data analytics, content mining, big data, machine learning or knowledge discovery.

TDM is important because using this machine technology to sift through big data and discover hitherto unidentified information could help us further our understanding of the arts, business, science and beyond. This provides an opportunity for economic, scientific and cultural growth. Yet in the EU, we aren’t doing as much text and data mining as in other parts of the world, perhaps for legal, educational or technical reasons. The Commission-funded FutureTDM project aims to identify and reduce the barriers that inhibit the uptake of TDM within Europe. For the past 18 months, we’ve been speaking to the TDM community to get their views.

International Data Science Conference Banner
Keynote speech of Bernhard Jaeger, giving an introduction of FutureTDM Project
Keynote
Dealing with the legal bumps
Infrastructure for Technology Implementation
Ben White from British Library moderates the Session: Skills and Education
Stefan Kasberger on ContentMine Use-Case tutorial
FutureTDM Goodie Bag
FutureTDM Tree of Digital Knowledge
Salzburg, Austria

 13 June 2017, 10:00-17:00

On 13 June, we held our FutureTDM Symposium at the International Data Science Conference 2017 in Salzburg, Austria. The aim was to share FutureTDM’s findings and our first expert-driven policy recommendations and practitioner guidelines, which can help increase the uptake of TDM in Europe.

The FutureTDM track at the International Data Science Conference 2017 started with a speech by Bernhard Jäger from SYNYO (FTDM Consortium), who gave a brief introduction to the FutureTDM Project and explained the purpose of the Symposium: bringing together policy makers and stakeholder groups to share with them FutureTDM’s findings on how to increase TDM uptake. The slides for this presentation can be found here.

This was followed by a keynote speech on the Economic Potential of Data Analytics, delivered by Jan Strycharz from Fundacja Projekt Polska, a member of the FutureTDM Consortium. It was estimated that automated (big) data and analytics, if developed properly, will add over 200 billion euros to European GDP by 2020. Algorithms (not to say robots) would then be responsible for 1.9% of European GDP.

We would like to thank all the participants for a great event. The responses we gathered on the day will feed directly into our workshop report and our stakeholder guidelines, which will be published on our website. You can read more on the impact of TDM on the economy here, and the slides can be found here.

Dealing with the legal bumps

The plenary session with keynote speeches was followed by the first Symposium panel: Data Analytics and the Legal Landscape: Intellectual Property and Data Protection. Panelists during this session were:

  • Duncan Campbell (John Wiley & Sons, Inc.),
  • Prodromos Tsiavos (Onassis Cultural Centre/ IP Advisor),
  • Marie Timmermann (Science Europe),
  • Romy Sigl (AustrianStartups)

As an introduction to this legal session, Freyja van den Boom from Open Knowledge presented our findings on the legal barriers to TDM uptake, which mainly relate to the type of content and the applicable regime (IP or Data Protection). Having gathered evidence from the TDM community, FutureTDM has identified three types of barriers (uncertainty, fragmentation and restrictiveness) and developed guidelines and recommendations on how to overcome them. More on our findings related to legal barriers here. Slides from the presentation are here.

This was followed by statements from the panelists.

Prodromos Tsiavos stressed that, with the recent changes in the European framework, the law faces significant issues and balancing industrial interests is becoming necessary. He added that a different approach is certainly needed to encourage uptake by industry, because industry will continue to rely on licensing arrangements.

Duncan Campbell concentrated on copyright and IP issues. How do we deal with all the knowledge created? How do copyright rules influence this? He spoke about the EU Commission proposal and the UK TDM exception, asking how to make an exception work.

Marie Timmermann also focused on the TDM exception and its positive and negative sides. On the positive side, she noted that the TDM exception has moved from being optional to mandatory and cannot be overridden. On the negative side, she stated that the exception is very limited in scope: startups and SMEs do not fall under it. Thus, Europe risks losing promising researchers to other parts of the world.

This statement was supported by Romy Sigl from Startup initiative, Salzburg. She confirmed that anybody can create a startup today, but if startups are not supported by legislation, they move to another country where they see more potential.

The right to read is the right to mine

The next panel was devoted to an Overview of FutureTDM case studies: Startups to Multinationals. The panelists in the session were:

  • Peter Murray-Rust (CONTENTMINE),
  • Donat Agosti (PLAZI),
  • Petr Knoth (CORE),
  • Kim Nilsson (PIVIGO)

FTDM project officers Freyja van den Boom (OKI) and Peter Murray-Rust (CONTENTMINE) gave an overview of the highlights of the stakeholder consultations. Recommended TDM practices for content, tool and service providers as well as practitioners were presented, followed by an overview of the case studies collected during the stakeholder consultations. The aim was to cover different areas and stakeholder groups within the TDM domain. Slides from the presentation are available here.

Peter Murray-Rust presented a researcher’s view. He stressed that the right to read is the right to mine, but that there is no legal certainty about what a researcher is and is not allowed to do.

Petr Knoth from CORE added that we need a data infrastructure to support TDM. Data scientists are kept very busy cleaning data and have little time left for the real mining. He added that the infrastructure should not be operated by the publishers, but that they should provide support.

Donat Agosti from PLAZI focused on how to make data accessible so that everybody can use it. He mentioned the case of the PLAZI repository, TreatmentBank. It is open, extracts data from each article and creates citable data. Once you have the data, you can disseminate it.

Kim Nilsson from PIVIGO spoke about support for academics: they have already worked with 70 companies and provided support in TDM for 400 PhD academics. She mentioned how important data analytics, and the possibility of seeing all the connections and correlations, are for the medical sector, for example. She stressed that data analytics is also extremely important for startups, for whom gaining access is critical.

Data science is the new IT

The next panel was devoted to Universities, TDM and the need for strategic thinking on educating researchers.

The panelists were:

  • Claire Sewell (Cambridge University Library),
  • Jonas Holm (Stockholm University Library),
  • Kim Nilsson (PIVIGO)

FutureTDM project officer Kiera McNeice (British Library) gave an overview of the skills and education barriers to TDM. She stressed that many people say they need quite a lot of knowledge to use TDM and that there is a skills gap between academia and industry. The barriers to entry are also still high because using TDM tools often requires programming knowledge. Slides from this presentation can be found here.

In our FutureTDM project we are putting together a series of guidelines to help stakeholders overcome the barriers we’ve identified. Our policy guidelines for universities include encouraging them to support TDM through both their research and education arms, for example by helping senior university management understand the needs of researchers around TDM and the potential benefits of supporting it. More on education and skills barriers, and guidelines on how to overcome them, here.

Kim Nilsson from PIVIGO stressed that the main challenge is software skills. The fact is that if you can do TDM you have fantastic options: startups, healthcare, charity. Our task is to offer proper career advice, help people understand what kinds of skills are valued and assist them in building on those skills.

Claire Sewell elaborated on skills from the perspective of an academic librarian. What is important is a basic understanding of copyright law and keeping up with technical and data skills. “We want to make sure that if a researcher comes into the library we are able to help him,” she concluded.

Jonas Holm from Stockholm University Library highlighted the fact that very little strategic thinking is going on in the TDM area. “We have struggled to find much strategical thinking on TDM area. Who is strategically looking for improving the uptake at the universities? We couldn’t find much around Europe,” he said.

Stefan Kasberger stressed that the social side of education, meaning inclusion and diversity, is also important.

Infrastructure for Technology Implementation

The last session was dedicated to technologies and infrastructures supporting Text and Data Analytics: challenges and solutions. The panelists in this session were:

  • Mihai Lupu (Data Market Austria),
  • Maria Gavrilidou (clarin:el),
  • Nelson Silva (know-centre),
  • Stelios Piperidis (OpenMinTed)

FutureTDM Project Officer Maria Eskevich (Radboud University) delivered a presentation on the TDM landscape with respect to infrastructure for technical implementation. Slides from the presentation are available here.

Stelios Piperidis from OpenMinTed stressed the need for an infrastructure. “Following more on what we have discussed, it looks that TDM infrastructure has to respond to 3 key questions: How can I get hold of the data that I need? How can I find the tool to mine the data? How can I deploy the work carried out?”

Mihai Lupu from Data Market Austria brought up the issue of data formats: for example, there is a lot of data in CSV files that people don’t know how to deal with.

Maria Gavrilidou (clarin:el) highlighted the fact that formats are not the only problem: identifying the source of the data and putting in place lawful procedures for handling it are problems too. Metadata is also problematic because it very often does not exist.

Nelson Silva (know-centre) focused on using proper tools for mining the data. Very often there is no particular tool that meets your needs, and you have to either develop one or search for open source tools. Another challenge is the quality of the data. How much can you rely on the data, and how do you visualize it? And finally, how can you be sure that people will get the right message?

Roadmap

The closing session was conducted by Kiera McNeice (British Library), who presented “A Roadmap to promoting greater uptake of Data Analytics in Europe”. Slides from the presentation are available here.

During the Symposium, we also had a Demo Session with Flash Presentations by:

  • Stefan Kasberger (CONTENTMINE),
  • Donat Agosti (PLAZI),
  • Petr Knoth (CORE),
  • John Thompson-Ralf Klinkenberg (RAPIDMINER),
  • Maria Gavrilidou (clarin:el),
  • Alessio Palmero Aprosio (ALCIDE)

The collection of the demo presentations can be found here.

If you would also like to be part of the FutureTDM community and have your say, you can fill in the quick poll on our website home page, write a guest blog or tweet us @futuretdm.
