Let there be data: Exploring data as a public good
28 Nov 2019 16:40h - 18:10h
Both the availability of training data and AI-based solutions can play a major role in addressing current inequalities regarding access to knowledge, services, and the diversity of cultural expressions. Today’s frameworks treat data only as a commodity and ignore its potential for closing the global digital gap. Future discussions should focus on democratising data as an important global infrastructure. They should also consider governance models for data commons based on the multistakeholder approach.
Data is a critical tool for social and economic development, but remains deeply problematic. For example, datasets used to train artificial intelligence (AI) algorithms exclude data from the Global South, which makes new technologies even less apt to improve global digital inclusion. Ms Lea Gimpel (Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ)) said that to harness the potential of technologies and data, we first need to close the data gap and gather data from as many different communities as possible. This will show the needs of those currently excluded. Second, we need to open up data and allow small entrepreneurs to innovate and train their locally built solutions. Third, we need to ensure that data is not biased. Ms Renata Ávila (Smart Citizen Foundation) added that data was once seen as a cultural good, whereas today it is framed as a commodity for profit. A profound cultural shift is needed to take back ownership of data as a common product and a common infrastructure on which to build projects locally, regionally, or nationally.
Online content and interaction need to be available in as many languages as possible to serve the Global South. Mr Alex Klepel (Open Innovation, Mozilla) explained that open voice and speech technology is a crucial access point to information and is accessible to everyone. The major tech companies own most of the available data and use it to improve their products, while keeping smaller actors excluded. A good countermeasure is building public-private partnerships such as the one between Mozilla, GIZ, and local innovators in Africa, who are jointly running a voice crowdsourcing initiative. Mr Audace Niyonkuru (Digital Umuganda) said that a wealth of speech data allows local entrepreneurs to produce solutions on the ground. This is particularly valuable for Africa, where spoken interaction is culturally preferred. Ms Baratang Miya (Uhuru Spaces) warned about data biases and noted that data commons could give a voice to marginalised groups such as women.
Developing voice recognition technology is also important for preserving global linguistic diversity, as 40% of languages are in danger, according to Ms Irmgarda Kasinskaite-Buddeberg (UNESCO). Under-resourced and under-represented languages are economically unprofitable for companies to document, but we as people should be interested in preserving the heritage of cultural and linguistic diversity and traditional knowledge, and in discovering new traditional practices. We need to make better use of technology to reflect our global communities, and data as a common good can do that.
The roundtable asked whether data should be a commodity at all, or whether we should introduce a type of ‘data socialism’. If researchers interact with indigenous communities and collect data, it is communal data. But once the raw data becomes a new dataset to be exploited for any purpose, who owns it then? Mr Kyung-Sin Park (Open Net Korea) proposed overcoming the disconnect between users and their data through new forms of data ownership. New datasets should not be a luxury, but a service for the community. Park argued that making some of our personal data communal makes sense because we are all social beings and, through interaction, our personal data inherently also becomes communal data. Kasinskaite-Buddeberg emphasised that open or public data does not mean the same thing for every community, so discussions on privacy and intellectual property have to be context-based.
If data becomes a common good, new data governance models will be needed. Everyone agreed that any governance model should be multistakeholder-based and owned by the communities, who would have to decide on their goals, purposes, expected outcomes, and structures. Kasinskaite-Buddeberg proposed seeing data not just as a commodity, but as a value for humanity. A proposal was made to make data free of charge for communities and non-commercial use, while charging companies fees and requiring licences to use community data.
There are many calls for open data, but future debates should focus on understanding how to really incentivise data sharing, as well as on how to build skills for using and maintaining open databases.
By Jana Misic
Internet Governance Forum 2019
25 Nov 2019 16:30h - 29 Nov 2019 16:30h