Awareness Sheets

Awareness sheets are one of the major dissemination materials of the FutureTDM project, covering themes such as Stories, Projects, Organisations, Challenges and Tools. Throughout the project, we publish these materials to raise awareness whenever we come across an outcome that may be of high interest to the relevant field, specifically to people who work in text and data mining, big data and data analytics. Our awareness sheets are drawn from our expert reports, expert interviews and discussions at our Knowledge Café events, and they cover a range of factors that affect TDM uptake. Keep an eye on our awareness sheets and stay up to date!

Awareness Sheet Collection


Outcomes of the FutureTDM Project

What have we learned? Text and data mining – using algorithms to analyse content in ways that would be impossible for humans – is shaping up to be a vital research tool of the 21st century. But Europe lags behind other parts of the world in adopting these new technologies. ..Read More

Uptake of TDM
The economic impact of TDM in Europe could rise to as much as USD 110 billion by the year 2020.

ContentMine: Zika Tutorial

Pandemics are a very important and critical field of research, where fast and open access to scientific knowledge is a matter of life and death. In crisis situations, where every hour matters, openly available software and publications lower the barrier for collaboration and contribution. This is especially crucial for poorer countries, which are often the ones affected by diseases. ..Read More


This tutorial introduces researchers and others interested in the Zika virus to basic and advanced text and data mining methods with the ContentMine toolchain.


Barriers: Academic Researchers Experience

Throughout the FTDM project we have examined a variety of case studies to find out which barriers academic researchers experience while practising TDM. The researchers who were interviewed were not given a definition of “barriers” beforehand but were left free to highlight, without bias, all issues hindering their work on TDM. ..Read More

When researchers want to share their data, they are not sure what license to use; the fear of losing control of their data leads them either not to share it or to use restrictive licenses.


TDM Spotlight - TDM and Gender Issues

Technology has always been gendered; it is primarily associated with the concept of “masculinity” because men are frequently found in positions where they can decide upon, design and develop technologies serving primarily their needs. Gender (which for the purpose of this presentation is restricted to male vs female, leaving out the rest of the spectrum of gender fluidity) affects the design of technology which in turn strengthens or weakens gender stereotypes.  ..Read More

Even now, when more women than ever before are hired to lead organisations and more men use their right to parental leave, the effects of gender stereotyping should not be underestimated. One still observes the persistence of stereotypical gender perceptions across different generations, countries of the European Union and fields of life.


CORE: Aggregating the world's open access research papers

CORE is a global large-scale Open Access aggregation platform that offers access to a large volume of free and open access content. Its aim is to aggregate all open access research outputs from repositories and journals worldwide and make them available to the public. In this way, CORE facilitates free unrestricted access to research for all. ..Read More

The CORE system harvests metadata records and the associated full-text content from Open Access repositories and journals listed in CORE.


Text and Data Mining as an Economic Asset

What is TDM? TDM facilitates the extraction of useful and instrumental pieces of information from typically large corpora of essentially unstructured text and other types of data; it also allows for the translation of this information into actionable intelligence for advancing a specific process – be it public policy intervention, market actions or actions performed by other entities for various reasons. ..Read More

Big and open data in Europe
It has been estimated that Big and Open Data will give an incremental boost of 1.9% to European economic growth by 2020.

Buchholtz, S., Bukowski, M., & Sniegocki, A. (2014)


KCONNECT: Search technologies for medical information

Radiologists are drowning in images. At larger hospitals more than 100,000 images (over 100 GB) are produced per day. What is Kconnect? Radiologists and other clinicians are facing an information overload caused by an increasing number of images and an increasing complexity of radiological protocols. ..Read More

As a researcher you can even build your own pipeline to combine and process medical texts!

Microsoft: The International Perspective

Who? Microsoft. Best known for the Windows operating system and Microsoft Office suite of productivity software products like Word, Excel, PowerPoint, etc. Less known are its many activities in the area of data analytics, artificial intelligence and TDM. Microsoft develops products and services that allow people to analyse text, sound, images and data in many different ways. ..Read More


Organisations and businesses must effectively handle all aspects of data – managing diverse datasets, processing them, storing them, analysing them, etc. Microsoft Azure underpins many of our data services in this area.


PaperHive: A start-up perspective

What does PaperHive do? PaperHive is a web platform for collaborative reading that was created in 2016 by Dr. André Gaul and Alexander Naydenov. It allows researchers to engage in collaborative reading, which makes reading more effective and efficient. Researchers can easily discover, share and annotate content from different content providers. PaperHive is part of the startup incubator of the Centre for Entrepreneurship at TU Berlin. ..Read More

Dr. André Gaul, CEO PaperHive

Publishers should clearly indicate what can be done with the content and not create individual and home-brewed licenses.

Legal Barriers and Recommendations

The EU is modernising copyright rules so that they better fit the digital age. In September 2016, the European Commission presented legislative proposals to this end. These are now being discussed by the European Parliament and the Council to ensure that both users and creators can make the most of the digital world. One of the proposal objectives is to review and reform rules applicable to education, research and culture. ..Read More


In highlighting the EU level aspects of the FutureTDM policy framework report, we hope to provide a very timely and focused overview of recommendations relevant to the Brussels copyright reform debate.


OpenMinTeD: Open Mining INfrastructure for TExt and Data

What is OpenMinTeD? OpenMinTeD is an ongoing Horizon 2020 project (01/06/2015 – 31/05/2018) which aims to create an open, service-oriented infrastructure for text and data mining (TDM) of scientific and scholarly content. How is OpenMinTeD related to FutureTDM? FutureTDM is defining the challenges of TDM and proposing a legal and ..Read More

Digital research data is being produced in massive amounts and it cannot be harnessed or utilised unless appropriate technologies exist.

The Plazi Approach

Plazi is a non-profit organization based in Bern, Switzerland. It aims to support and promote the development of persistent and openly accessible digital taxonomic literature; in other words, Plazi aims to improve the science of charting the world’s biological diversity. Plazi maintains a digital taxonomic literature repository and participates in the development of new models for publishing taxonomic treatments ..Read More

Essential Biodiversity Variables should be shared as open data, making them available without charge or restrictions on reuse.

TDM Spotlight: The Start-Up Perspective

Who? Mediately – a Slovenian start-up active since 2011. Its websites and apps are used by over 35,000 doctors, nurses and other medical professionals in Slovenia, the Czech Republic, Slovakia, Serbia and Croatia. Over half of Slovenia’s doctors use it daily. What’s got these doctors so excited? Mediately tells them everything they need to know ..Read More

Blaž Triglav

CEO, Mediately

There is an enormous amount of quality medical information produced by public institutions in the EU. Enabling access to that data, by publishing it in open formats, without copyright restrictions, can be a huge driver of innovation for healthcare companies all over the EU.

TDM Spotlight: The University Perspective

Who? A research team at the University of Edinburgh – the sixth oldest university in the English-speaking world. They’re using TDM to keep researchers updated on developments in their domain, by quickly identifying high-quality, relevant results from the millions of publications produced each year. What kind of numbers are we talking about exactly? In the healthcare sector, 1.3 million…Read More

Malcolm Macleod

Professor of Neurology and Translational Neuroscience, University of Edinburgh

As we develop tools for TDM, it is absolutely essential that we have access to all relevant publications.

TDM and ContentMine - Barriers and Enablers of TDM

What is ContentMine? ContentMine is a non-profit organisation founded by Dr Peter Murray-Rust, a chemist, molecular informatician and advocate for open science. Murray-Rust faced barriers throughout his career in trying to apply his TDM technologies to the scientific literature. ..Read More

The days of manually searching through thousands of academic papers are now gone

Impact of Text and Data Mining on European Economic Growth

How does Text and Data Mining Impact the Economy? Text and Data Mining (TDM) is closely related to the concept of Big and/or Open Data. To put it simply – the activity of mining text and data allows us to translate the vast, unstructured, and unstoppable flow of data into useful pieces of information. …Read More

Projekt: Polska Foundation
In order to realise the promise of Big and/or Open Data as a means to influence Europe’s economy, we need to guarantee both accessibility and quality of TDM activity in the EU.

Techniques, Tools and Technologies for TDM in Europe

Data comes in a variety of forms (text, audio, video, images, graphs, numbers, chemical compounds, likes, etc.) and can be presented in a structured, unstructured or semi-structured way. Depending on the data medium and structure, different techniques are deployed to extract information and knowledge. …Read More

ATHENA Research and Innovation Center

TDM tools and technologies can be either language-independent or language-dependent. However, not all languages are supported equally.

Legal Issues of TDM: The Relevant Questions

Legal issues? Believe it or not, text and data mining (TDM) can be an unlawful activity. As a miner, you will not always identify them as such, but TDM may be restricted under copyright law, database law and data protection law. How? TDM activities often involve the making of (temporary) copies, either on a permanent storage …Read More

Marco Caspers - Institute for Information Law, UvA

Currently, an exception for TDM under copyright law is being discussed at the European level. Such an exception would make copyright law less restrictive for TDM carried out under certain circumstances.

Research Projects & Infrastructures for TDM in Europe

Is European research interested in TDM? Yes it is. This is manifested both by the number of scientific articles published in journals and conferences by European scholars and the number of TDM related projects and infrastructures (in fact projects which have resulted in the development of infrastructures). These two axes are …Read More

ATHENA Research and Innovation Center

The challenge of securing the benefits of data sharing without compromising the rights of all parties involved needs an urgent answer.


TDM Spotlight: The Space Industry Perspective

Where? European Space Agency Research and Technology Centre. What are we talking about? The Mars Express Power Challenge and the role of competitions in developing data analytics. Who’s talking? Helen from FutureTDM met with Advanced Concepts Team members Dario Izzo and Jorg Muller. … Read More

Dario Izzo

Advanced Concepts Team Scientific Coordinator at ESA

Making data available and accessible means we can use TDM to catalyse scientific discovery.