The FutureTDM project has been organising a series of Knowledge Cafés across Europe as an informal opportunity to gather feedback on text and data mining (TDM) from researchers, developers, publishers, SMEs and other stakeholder groups working in the field. In this blog post you can find a summary of the input we've gathered over the past few months.
Kick-off (February 2016)
The first two Knowledge Cafés took place in the Netherlands: in February, in collaboration with the Leiden Centre of Data Science, and in April, during the Dutch Presidency Open Science Conference in Amsterdam. In Leiden, the grass roots of TDM (researchers from a variety of fields who understand and work with TDM) provided the first useful input on barriers that hinder the wider uptake of TDM in Europe, such as strict copyright laws and a lack of available raw data. In Amsterdam, politicians, officials and stakeholders from all over Europe were challenged to reflect on controversial statements such as 'TDM is only of value to the hard sciences, not to humanities', which brought out a broad spectrum of insights.
London Book Fair (April 2016)
This event mainly attracted people from the publishing community, including representatives from one of the largest society publishers and from smaller academic and open access publishers. Following a fruitful discussion focused on issues specific to publishers, participants said that, if given unlimited resources, they would invest in infrastructure and in having the right to access datasets. They also noted that there is a lot of hype about TDM but few achievements or specific outcomes to point to. It was agreed that there is a need for examples and case studies to show what is actually achieved through the use of text and data mining. What is happening in the US could be useful in this regard.
LREC Slovenia (May 2016)
At the International Conference on Language Resources and Evaluation, the Knowledge Café was very well attended, mainly by researchers. Given the focus of the conference, it is not surprising that many of the discussions centered on barriers relevant to the NLP community. Participants mentioned that they find it difficult to know where to go within their institutions with questions about TDM. They also expressed a need for standards, as well as for incentives for the community to adopt and use them. Finally, participants agreed that a horizontal infrastructure with data and tools for all languages is needed. These TDM infrastructures should also be accessible to companies, so that the entry barrier to TDM is lowered by making available the necessary basic language and TM processing tools.
Berlin Buzzwords (June 2016)
At this conference on storing, processing and searching large amounts of digital data, the audience was predominantly made up of people working in tech companies, alongside researchers from universities and research institutions. Several barriers to the uptake of TDM were discussed, such as the exact legal situation around TDM often being unclear or confusing to scientists, as well as differing from country to country. Another hindrance is that lawyers sometimes lack specific IT knowledge: bridging such gaps and broadening the understanding between the different fields would greatly contribute to creating a proper knowledge society. As at LREC, participants talked about the specific tools and parsers needed for the different European languages, and again expressed the need for more standards.
If you want to have your say on the future of text and data mining in Europe, there are three more Knowledge Cafés coming up soon: one on 21 June at Campus Warsaw; a joint workshop with OpenMinTeD entitled “The Future is all Mine!” on 29 June during the LIBER conference in Helsinki; and a Knowledge Café in Brussels in September 2016. You can view the full reports of all Knowledge Cafés on this page.