Here we discuss the various intellectual property (IP) laws that can be relevant to the use of content for TDM activities. In some cases, exceptions to these laws apply, making it possible to re-use others’ content for TDM without needing specific permission from rights holders. In other cases, however, the law does not allow TDM without permission from rights holders. And even in cases where legal exceptions allow for TDM without explicit permission from rights holders, there may be scope for some aspects of a licence or contract to limit how you may use content for TDM. In these guidelines, we will discuss what you should be aware of when licensing content for use in TDM.
Guidelines for Content Licensees
Whether you need a licence at all is a question you should consider in the planning stages of any TDM project. Section 2.3.3 sets out a step-by-step plan to minimise legal risk in a TDM project, including deciding whether or not you need a licence for your planned activities.
Even in cases where legal exceptions to IP rights apply, they may not actually allow all the TDM activities you plan to carry out. Exceptions can be limited if they can be overridden by contract, or if they have caveats that leave scope for the rights holder to define other limitations.
Copying content: In the UK, a specific exception from copyright applies if you are copying content for the purpose of non-commercial TDM research. This exception also includes a clause that ensures it cannot be overridden by any contract.1 Without such a clause, any future exceptions adopted by the EU or its member states could be overridden by the terms of licences agreed with rights holders.
Publishing content: Various legal exceptions may allow you to reproduce and publish parts of the original content used in your TDM analysis, alongside the results of that analysis. You can find a detailed discussion of these exceptions in the FutureTDM report on policies and barriers of TDM in Europe.2
If you have determined that you are dealing with protected content, you should establish what you are going to do with that content, and verify whether this requires the consent of the IP rights holders. This is generally the case whenever you copy that content (whether permanently or temporarily) or publish it, in whole or in part.
Copying content: TDM activities usually involve making copies of content or (parts of) databases, ranging from retrieving copies from one or more sources, to transforming the contents into a (formalised) dataset that will be loaded into the computer’s working memory when performing TDM analysis.
Publishing content: If you are planning to share or disseminate any of your TDM results, or the underlying data or content sources, this is likely to be considered "publishing". In most instances it will need permission, because it relates to the rights holder's exclusive right to control the communication or redistribution of their content.
These acts need to be authorised by rights holders unless special exceptions apply. Despite the existence of common European rules on copyright and database rights, the applicability and scope of these exceptions vary significantly across national borders. This means that if you work in multiple countries or collaborate with foreign colleagues, you will need to assess the relevant exceptions in each country you are operating in.
You may already have a licence or agreement with the rights holder whose content you would like to use for TDM. If so, it is important to understand what this licence permits. While these guidelines cannot cover every possible licence clause you may encounter, the examples below should give an idea of the sorts of restrictions or freedoms a licence may specify. If you are unsure whether your licence allows your intended TDM activities, we recommend contacting the rights holder directly or getting expert legal advice.
In some cases, rights holders will explicitly address TDM in their licences, and the limitations under which TDM is permissible. Unless a legal exception applies and cannot be overridden by contract, you must abide by the rights holder’s conditions when carrying out TDM on their content.
As discussed above, even if an exception applies and cannot be overridden by contract, the rights holder may still have some limited ability to define the ways in which you may use their content for TDM. This may include applying reasonable technical protection measures to protect their content and system, or limiting the types or number of authorised users who have access to the content. TDM often requires the use of large corpora of content from multiple sources, and some rights holders may impose restrictions on how much content you can access within a given timeframe to avoid overloading their servers.
In some cases rights holders take an explicitly permissive stance towards TDM, supporting TDM activities in principle as well as in practice.
In some cases, the wording of a licence may make it difficult to determine what kinds of TDM activities are permitted. In Elsevier’s Frequently Asked Questions page about text and data mining,13 for example, one response states, “You are free to commercialize your own findings,” while another states, “we provide TDM access for non-commercial purposes.” If you are unsure about the specific details of your licence, you should contact the licensor to clarify what they do and do not permit.
Although data mining began to emerge as a technology in the late 1980s,4 many rights holders have been slow to adapt to this technology and do not yet explicitly address TDM in their licences or other policies. There are also some kinds of content for which licences are not routinely provided at all, for example the content of web pages. In cases where no legal exceptions apply, this makes it difficult to assess whether your intended use of content for TDM is permitted.
If you are unsure what licensing terms apply to a given set of content, either because the licence is unclear or because you cannot find one, the safest option is always to contact the rights holder to ask permission to access and use their content. Section 3.4 below discusses some considerations you may want to take into account to negotiate reasonable and proportionate licences with rights holders.
Limitations to quotation
“Publications or analyses resulting from TDM of subscribed content may include quotations from the original text of up to 200 characters, or 20 words, or 1 complete sentence.”
Licence restrictions on TDM
“The user may not … use any robots, spiders or other automated downloading programs, algorithms or devices to search, screen-scrape, extract, or index any Elsevier web site or web application”
For an individual researcher or small business, negotiating an appropriate licence with each and every rights holder whose content you may wish to use for TDM can be a prohibitive drain on resources. Unfortunately, given the lack of consistent, unambiguous legal exceptions across EU member states, you may find this is the only way to ensure your TDM activities are legal.
Rights holders may want to apply different permissions to their content depending on whether it is used for “commercial” or “non-commercial” purposes. They may wish to restrict activities that involve accessing/copying their content for analysis, reproducing excerpts of their content after analysis, or both. You should consider whether this distinction is reasonable or practical.
The activities of for-profit industry and businesses will of course generally be considered commercial. Conversely, for some academic researchers it may be clear that their use of TDM is purely non-commercial. But where researchers are partly funded by industry, collaborate with commercial partners, or are developing new technologies that may become the foundation of spin-out companies, it is less clear where the line between commercial and non-commercial research lies. Especially in research, where the potential applications of new knowledge or technologies may not be known at the beginning of a TDM project, restricting TDM to "non-commercial" purposes may have unforeseen impacts.
Rights holders may wish to monitor, to some degree, bulk access to their content for TDM purposes. Reasons for this may include ensuring that they can identify fraudulent or malicious access to content, as well as understanding the needs and behaviour of licensees to develop and provide new products and services.
You should consider whether the nature and extent of monitoring of access to content is reasonable and appropriate. Particularly in a research context, overly detailed monitoring of researchers’ behaviour may raise ethical questions about academic freedoms.
Rights holders need to protect their intellectual property from redistribution that would reduce the value of the original works, and may therefore restrict how much of their content may be re-published following TDM analysis, in the form of quotations or other excerpts. They may also require TDM practitioners to give appropriate credit to the rights holder where excerpts of original content are reproduced.
You should consider whether restrictions on reproducing excerpts of original content are reasonable and appropriate. It may be useful to consider whether the intended reproductions are likely to impact the value of the original works.
Rights holders may wish to apply technical protection measures to protect their content. These may be to ensure that only authorised users access their content, or to prevent systems and servers from being overloaded by large-scale access to their content.
You should consider the impact of technical protection measures on TDM users, particularly in the context of the large-scale access to content that TDM typically requires.
Until and unless the EU and its member states adopt consistent exceptions to intellectual property rights for the purposes of TDM, licences remain a key consideration for anyone planning to carry out TDM. Some key points to remember are:
• There are several kinds of legal restrictions that apply to TDM, beyond licensing; check section 2 to make sure you understand these as well.
• Even when an exception to IP rights applies, licence terms may affect your ability to carry out TDM; make sure you understand any relevant terms in your licence.
• If it is not possible to find a licence or identify the rights holder for a given piece of content, TDM may not be lawful; consider carefully whether you need expert advice on risk.
• If you negotiate licences on behalf of an institution, you play a key role in enabling those you represent to carry out TDM; please talk to your researchers, make sure you understand their needs, and consider whether licences are appropriate and reasonable for all parties.
“Members of subscribing institutions have our permission to mine journal content for either commercial or non-commercial purposes.”