In my earlier blog post, I announced that FutureTDM is mapping the legal barriers to TDM in Europe, for which we conducted a questionnaire among legal experts in different Member States. We ended up with sixteen national reports covering fifteen EU Member States and one EEA country (Norway). We have recently submitted Deliverable D3.3, titled “Baseline report of policies and barriers of TDM in Europe”, which is the result of this exercise. It not only presents our analysis of the legal barriers to text and data mining in the EU, but also addresses stakeholder policies that deal with the regulations identified as barriers.

Legal barriers

We identified three legal regimes as potentially impeding TDM. The first two are categorized as intellectual property (IP) regimes and include copyright and database law. The third concerns data protection law. We ‘tested’ the different rules of these regimes against a benchmark that included three focal points:

  • Restrictiveness: do the rules restrict activities necessary to carry out TDM?
  • Fragmentation: is the national implementation of the rules so fragmented that it hampers TDM at cross-border level?
  • Uncertainty: does the scope of the rules lack clarity, rendering the lawfulness of TDM uncertain and thereby impeding TDM activities?

For the IP regimes, we found that, as a main rule, the rightholders of databases and of the works contained in them (such as articles, images, music and books) have exclusive control over any copying, publication or distribution of those databases and works. As a result, a TDM user who seeks to mine, for example, a news website or a scientific publisher’s database needs the permission of the respective rightholder(s). Exceptions to these main rules of copyright and database law exist, the most promising of which is the exception for reproductions made for non-commercial research purposes, but we found the national implementations of that exception to be:

  • Too restrictive, not allowing the necessary activities to perform TDM;
  • Too fragmented, resulting in wide variation in the scope of the exception, which leaves TDM users lost when dealing with different national laws;
  • Too uncertain in scope and meaning, rendering the exception insufficient to rely on to lawfully carry out TDM.

These findings largely apply to data protection law as well. When a TDM user mines personal data (such as names, (IP) addresses, genetic data, and any other data that makes it possible to identify an individual), he or she must comply with the rules and obligations of data protection law, but the user may not always be aware of these rules, let alone be certain how to comply with them. Anonymisation, meaning that the data is no longer considered ‘personal’, is often regarded as a solution, but a miner may not always be certain whether the data to be mined is indeed ‘anonymous’: while one party may not be able to identify individuals in a given dataset, another party might, for example by combining it with other data available to that party. Especially in an online context, data is much more likely to become personal data.
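The aggregation problem described above can be illustrated with a minimal sketch. Everything in it is made up for illustration: the records, the field names (`postcode`, `birth_year`, `topic`) and the helper `reidentify` are assumptions, not taken from the deliverable, and the matching rule is a deliberately simple one, not a legal test for when data counts as ‘personal’.

```python
from collections import defaultdict

# Hypothetical "anonymised" records: names removed, two attributes kept.
mined_records = [
    {"postcode": "1012", "birth_year": 1980, "topic": "health forum post"},
    {"postcode": "1012", "birth_year": 1975, "topic": "political comment"},
    {"postcode": "3511", "birth_year": 1990, "topic": "product review"},
]

# Hypothetical auxiliary data that some other party happens to hold.
auxiliary_records = [
    {"name": "Person A", "postcode": "1012", "birth_year": 1980},
    {"name": "Person B", "postcode": "3511", "birth_year": 1990},
    {"name": "Person C", "postcode": "3511", "birth_year": 1990},
]

# Attributes that are not names but can single someone out in combination.
QUASI_IDENTIFIERS = ("postcode", "birth_year")


def key(record):
    """Build the combination of quasi-identifiers for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(mined, auxiliary):
    """Return (name, topic) pairs for mined records that match exactly
    one person in the auxiliary data on all quasi-identifiers."""
    candidates = defaultdict(list)
    for person in auxiliary:
        candidates[key(person)].append(person["name"])

    matches = []
    for record in mined:
        names = candidates.get(key(record), [])
        if len(names) == 1:  # a unique match singles out one individual
            matches.append((names[0], record["topic"]))
    return matches


# A party without auxiliary data cannot link any record to a person.
print(reidentify(mined_records, []))

# A party holding the auxiliary data can link some of them.
print(reidentify(mined_records, auxiliary_records))
```

In this toy data, the same mined dataset yields no matches for a party with no additional information, while a party holding the auxiliary list can attach a name to the first record. The third record stays ambiguous because two people share its postcode and birth year.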

Stakeholder policies

The extent to which legal regulations impede TDM may depend on how they are dealt with in stakeholder policies. For the IP regimes, what matters is the extent to which rightholders use their exclusive rights to prohibit or allow mining. In the context of the mining of scientific publications, we found that Open Access (OA) licenses can be an important enabler of TDM, since they allow anyone to use or re-use materials published under such licenses. The only potential restriction in this regard is that a license may impose a non-commercial requirement, which would exclude the mining of such publications for commercial purposes. We identified several trends in promoting Open Access to publications, which implicitly promote the uptake of TDM:

  • A growing number of OA publishers (in addition, some non-OA publishers permit TDM of their publications by academics as a standard policy)
  • Research funders requiring publications to be published under OA licenses
  • Research institutions and libraries advocating and assisting with OA publishing

However, whereas OA to publications is reaching maturity in the field, OA to the underlying research data is still in its infancy. Many stakeholders underline its importance, but obligations and standard policies are lacking, as OA research data faces several challenges:

  • Research data may contain personal data or confidential information and it may therefore be problematic to make it freely available to the public
  • There may be no appropriate infrastructure to publish and retain the research data

Next steps

To see all our findings and conclusions, please have a look at the deliverable (see note 1). On the basis of these findings, we will make recommendations for a new policy framework. Coming soon!

1 Deliverable D3.3, “Baseline report of policies and barriers of TDM in Europe”


