On Tuesday 21 June we held our Knowledge Café in Warsaw. The aim was to meet content mining stakeholders in Poland and chat with them about the barriers they face, all while sipping coffee at Campus Warsaw, operated by Google, our host. Centrum Cyfrowe (Projekt: Polska Foundation) took part in the Warsaw Knowledge Café as one of the partners in the FutureTDM consortium. Campus Warsaw is a space where start-ups can share knowledge and experience. It's a perfect place for an in-depth discussion about great ideas that might shape our future!
To join us at the table we invited people from various backgrounds: lawyers, university representatives and public administration officers.
So, what are the barriers that content miners face in Poland?
Access to data and data quality
According to our experts, there are two main legal obstacles to accessing data in Poland. Firstly, licences are often not completely open, and licences on different databases are frequently incompatible with each other, which impedes data usage. Secondly, there is the question of data protection and of anonymizing data before opening and sharing it. Many people want their data to be anonymized, and this may reduce data quality. Data quality is a problem in general: data often comes unstructured, unverified and in unreadable formats. The researchers mentioned an interesting fact: US data are favored over European data for testing TDM algorithms, simply because the quality is better. In Poland, especially when it comes to public data, we still have to deal with thousands of pages of scans that are not machine-readable. What is more, data exchange and cooperation in opening data are still a challenge for public administration in Poland. "A belief that sharing less is always better than sharing more is still predominant," a civil servant said.
Our legal experts highlighted a common assumption that online data is free to use, while in fact very often it is not. This assumption stems from a lack of knowledge and of legal clarity on data protection. One of the problems is that those sharing data for research are not familiar with copyright issues, and they are usually focused on individual use rather than TDM.
TDM and business
When it comes to gathering and sharing data for commercial purposes, companies don't want to share their TDM success stories because doing so would cost them their competitive advantage. "If you apply a year's worth of TDM to a data set, will you share it and let others build the same advantage?" asked one of our experts.
Since February 2016 we've been running a series of Knowledge Cafés across Europe. Our first Café took place in Leiden, the Netherlands. Later, we invited TDM stakeholders to join us in, among other places, London, Berlin and Helsinki. In an informal setting we discussed the barriers that are preventing people from doing more text and data mining in the EU. This information will be included in our report (which is part of the FutureTDM project), where, after expert analysis of the TDM landscape, policy recommendations can be made.
More on Knowledge Café Warsaw.