Knowledge Café

FutureTDM Workshop II

A reminder of the project

Text and data mining (TDM) is the process of deriving information from machine-read material. It is sometimes referred to as data analytics, content mining, big data, machine learning or knowledge discovery.

TDM is important because using this machine technology to sift through big data and discover hitherto unidentified information could help us to further our understanding of the arts, business, science and beyond. This provides an opportunity for economic, scientific and cultural growth. Yet in the EU, we aren’t doing as much text and data mining as in other parts of the world, perhaps for legal, educational or technical reasons. The Commission-funded FutureTDM project aims to identify and reduce the barriers that inhibit the uptake of TDM within Europe. For the past 18 months, we’ve been speaking to the TDM community to get their views.

Dr Ayris explaining the purpose of the workshop
FutureTDM Workshop II at the European Parliament
Presentations from the Workshop are available at SlideShare
Find out more in our awareness sheet collection
Dr Kiera McNeice FutureTDM Project Officer at the British Library
Lenard Koschwitz, Director European Affairs at Allied for Startups
Sophie Aubin from INRA explains OpenMinTeD
FutureTDM Tree of Digital Knowledge
European Parliament, Brussels

29 March 2017, 14:00-17:00

On 29 March, FutureTDM held our second workshop on improving the uptake of EU text and data mining (TDM) in Brussels.

Kindly hosted by leading copyright MEP Catherine Stihler at the European Parliament, the event was moderated by Dr Paul Ayris, Co-Chair of the League of European Research Universities (LERU) INFO. Although LERU are not a partner in the project, the issue of TDM is of great relevance to LERU’s members and we were very pleased to be working together for this event.

Dr Ayris explained the purpose of the workshop – to hear more from the Commission-funded FutureTDM project, which is looking at ways to improve uptake of TDM in the EU. The project chose to hold the workshop at the heart of the EU in order to reach as many as possible of the stakeholder groups active in the field of TDM and policy – and to hear their views. The workshop timing was no coincidence. With the copyright reform process ongoing, text and data mining is currently a very hot topic at the European Parliament.

The workshop was split into two sessions. The first was a formal panel looking at what the project has been doing, with a focus on its two most recent reports: one on the economics of TDM and one on policy recommendations. The following project partners were our speakers:

  • FutureTDM, the project and next steps
    Melanie Imming, Projects Manager, the Association of European Research Libraries
  • The Economic Opportunity
    Jan Strycharz, Economic Analyst, Foundation Project: Poland, Digital Center
  • FutureTDM’s Overarching Policy Recommendations
    Marco Caspers, Research Associate at the Institute for Information Law, University of Amsterdam

Their presentation slides are available here.
Reports on their work are in our Knowledge Library here.

In Session 2, we held a stakeholder round-table, where the issues from session one were discussed and the project outlined how we will be developing guidelines for increasing TDM uptake.

Dr Kiera McNeice, FutureTDM Project Officer at the British Library, explained that in our next project phase we will be taking what we have learned and trying to find proactive suggestions and solutions to support greater uptake of TDM. Our next action will be putting together a series of guidelines to help stakeholders overcome the barriers we’ve identified. These guidelines will be disseminated and promoted in a variety of ways – the main deliverables will be hosted on the FutureTDM website, but key sections will also be summarised and promoted as separate awareness sheets, through blog posts and our knowledge base.

The project has already highlighted some areas where we feel that TDM uptake could be helped through stakeholder guidelines and they are illustrated in our tree of digital knowledge. The idea of these guidelines will be to give concrete, actionable advice to fill some of the gaps we’ve identified in support for TDM. The guidelines will be developed as part of an ongoing process, with as much iterative feedback as we can get our hands on.

The guidelines so far:

  1. Legal issues: We hope to help address some of the legal uncertainty by providing clear, accessible information about what people need to be aware of when carrying out TDM
    • So for example if you’re a startup without access to expert legal advice, you can quickly educate yourselves in the fundamentals, and get an idea of whether you’re exposing yourself to any risk
  2. Licensing: We plan to give clear explanations of what licences actually allow (including open CC licences), and what people can reasonably expect to be included in a bespoke licence
    • So that if you are for example a university offered a licence by a publisher, you can understand whether that licence will allow your researchers to carry out the TDM work they would like to do
  3. Data management: To educate people on the different technical requirements that apply to bulk access to content for the purpose of TDM, as opposed to individual access to content, and how people can make their data better available for TDM
    • For example if you create or store content in a repository, what metadata should you be including so that your content is genuinely re-usable for TDM?
  4. Policy guidelines for universities: To encourage universities to support TDM through both their research and education arms
    • For example by helping university senior management understand the needs of researchers around TDM, and potential benefits of supporting it
  5. Examples to help broaden the TDM user-base, by demonstrating applications of TDM in areas that aren’t traditionally thought of as data-driven
    • These will be mini case studies of a variety of different TDM applications, promoted with awareness sheets and via our knowledge base online

Reflecting on these issues, our Chair raised the question of awareness and preparedness among researchers in making their data ready for TDM. A recent UCL survey found that many researchers don’t think about research data management early enough (or at all) in the process of their work. Often data is still stored on paper, and a significant number of researchers are not aware of what TDM is. So we are at the absolute beginning of academic appreciation of the power of TDM and its potential scientific and economic benefits – this will need to be addressed.

We then heard from three members of the project’s Expert Advisory Board:

Damir Filipovic, Director Digital Enterprise and Consumer Policy at DIGITALEUROPE, highlighted that for global tech companies and SMEs, legal and licensing issues will be of most interest. Content access is also important. For these stakeholders, the scope of the planned exception should be expanded. In terms of research, collaboration is important and the private sector (who often provide technical tools) should not be left behind when it comes to reform for innovation.

Lenard Koschwitz, Director European Affairs at Allied for Startups, talked about how Allied for Startups have been conducting a range of workshops meeting entrepreneurs from across the continent, and said that there’s an overwhelming interest in TDM. Startups are not looking for a free ride and are willing to acquire licenses where required – but in the majority of cases, the data is free and legal, so TDM should be allowed. The current wording of the exception for scientific research is valuable in creating a safe space, but at the same time the EC risks creating an unsafe space for those who don’t fall within its scope (such as startups), and they remain in legal uncertainty. The current formulation is even limiting for universities. A big motivation for many technical universities is to encourage successful spin-offs – this is under threat if the exception is not widened. Startups will look for the best ecosystem for them. If the EU is not dealing with TDM properly, startups might decide to go and look for another ecosystem (outside of the EU).

Marko Grobelnik, Text Mining Researcher at the Jozef Stefan Institute, emphasised that data has always been around, but is now much more accessible and available for technological improvement. This throws up questions around the legal issues of privacy and ownership, and uncertainty about whether you are allowed to use data that has been accessed. In addition, there’s a huge talent gap and the needs of the market are much bigger than the current supply of skilled people. The EU is not bad at TDM, but we are losing the talent we have to the ‘cool companies’ in the US. We also don’t manage to import talent from elsewhere.

To explain more about the work that our sister project OpenMinTeD carries out, Sophie Aubin from INRA gave an overview of the best practice the project is offering, including TDM tools and services. She explained how the platform will integrate existing resources, facilitating digital standardisation and interoperability. This should also help provide legal clarity. You can find out more here.

In the discussion that followed we had a lively debate covering issues such as:

  • The importance of recognising data protection and privacy
  • Whether the scope of the exception should encompass both text and data
  • Skills being given the same attention as technology
  • The need for more practical examples on how TDM is working
  • How to use 3rd party tools and platforms
  • Competition vs copyright and what should be regulated where
  • Licenses under the exception
  • TDM for different levels of experience from young people to management
  • Developing evaluation skills
  • Encouraging investment where there is no legal certainty
  • Whether we need to change our mindset in the EU – do we need an ‘innovation exception’?

At the end of the session we invited participants to contribute to our digital tree of knowledge, writing their organisation or sector on the leaf stickers and adding them to the branch seen as most important to them. Perhaps unsurprisingly, the legal guidelines branch gathered the most leaves. Let’s see if that is also the case at our next workshop in Barcelona. We also provided feedback cards so those who didn’t get a chance to speak could still provide input.  All of this information will be absorbed when we compile our guidelines for increasing stakeholder uptake of TDM.

We would like to thank all the participants for a great event. The responses we gathered on the day will feed directly into our workshop report and our stakeholder guidelines, which will be published on our website.

If you would also like to be part of the FutureTDM community and have your say, you can fill in the quick poll on our website home page, write a guest blog or tweet us @futuretdm.
