The debate on the European Union (EU) regulation of text and data mining (TDM) has been animated by the draft reform proposed by the European Commission (EC) at the end of 2016. The main concern has been that the regulation may hinder TDM on copyrighted material in the EU. This blog has discussed the proposal and its limitations in a previous post.

Mining copyrighted material, such as scientific publications, is certainly relevant for generating new knowledge by analysing patterns and identifying trends and correlations. Exceptions to rights at the European level should allow these activities by facilitating the use of protected intellectual content under specific circumstances.

In addition, authors could facilitate TDM by publishing content under less restrictive licenses that are not subject to legal clauses, specific agreements, and exceptions to EU directives. Apart from obvious political and economic reasons, this often does not happen because of a less obvious one: the lack of public platforms and shared e-infrastructures able to store and retrieve information, especially scientific data. Wikipedia changed the way we share, look for and access text-based information. Authors of scientific publications can create or edit a Wikipedia page, updating it with new findings and citing the original source, thus increasing its visibility. Although it is not a scientific source, Wikipedia links texts with related content and can point to sources of further information on a specific topic, such as scientific articles. A similar platform for sharing data does not exist yet.


Many disciplines rely on access to large amounts of data to advance and to produce applications for public use. One of those disciplines is Industrial Ecology, the study of complex linked industrial and ecological systems, whose interactions may occur over a long time and may affect geographically distant locations. Product footprints are an important use of the data from Industrial Ecology. A product footprint is a common measure of the environmental damage related to the production of a good or service over its entire life cycle, calculated through a product Life Cycle Assessment (LCA). Data availability is key to a correct product footprint. The lack of a public virtual platform where authors could share data has resulted in the rise of multiple private databases protected by copyright and available for a fee. BONSAI, a network of scientists and LCA practitioners, aims to build an open source e-infrastructure for sharing sustainability assessment data. As an open source database, BONSAI intends to be freely accessible by humans and machines, thus allowing TDM.

In a rapidly changing context, data are dynamic rather than static objects. Users should not only be able to easily find, access and reuse data, but should also be able to validate and update them. Open access to product footprint information would benefit not only the scientific community, but also forward-looking industries and consumers who base their purchase decisions on this information. In fact, publicly available and transparent data on environmental performance would facilitate the adoption of quantitative and impact-specific environmental labels. Transparent information is essential for informed consumer choices, which in turn are essential to trigger the supply of sustainable products.

The European Commission claims that the copyright review proposal aims at creating a well-functioning marketplace. We believe that such a marketplace is one that provides correct information on the products offered in the market, allowing consumers to make informed choices and industries to realise their competitive advantage. It is as hard to imagine a well-functioning marketplace without access to product footprint data and their sources as it is to imagine a democracy with censored media.

As long as copyright issues restrict TDM, it is all the more important to build and support open source platforms, e-infrastructures and virtual research environments for sharing data and hosting collective efforts.

This blog post was written by Michele De Rosa, Executive Manager at BONSAI, as an external author to the FutureTDM Platform.

// All blog posts are the personal opinion of the bloggers. For more information see FutureTDM's DISCLAIMER on how we handle the blog. //


