Text originally published by CNRS, Paris

Earlier this year, the Centre Internet et Société, CNRS convened a panel at CPDP.ai. The panel brought together researchers and experts on digital commons to try to answer the question at the heart of the conference – to govern AI or to be governed by AI?

The panel was moderated by Alexandra Giannopoulou (Digital Freedom Fund). Invited panelists were Melanie Dulong de Rosnay (Centre Internet et Société, CNRS), Renata Avila (Open Knowledge Foundation), Yaniv Benhamou (University of Geneva) and Ramya Chandrasekhar (Centre Internet et Société, CNRS).

The common(s) thread running across all our interventions was that AI is bringing forth new types of capture, appropriation and enclosure of data that limit the realisation of its collective societal value. AI development entails new forms of data generation, as well as large-scale re-use of publicly available data for training, fine-tuning and evaluating AI models. In her introduction, Alexandra referred to the MegaFace dataset – a dataset created by a consortium of research institutions and commercial companies, containing 3 million CC-licensed photographs sourced from Flickr. This dataset was subsequently used to train facial-recognition AI systems. This type of re-use illustrates the new challenges facing the open movement: how to encourage open sharing of data and content while protecting privacy and artists’ rights, and preventing data extractivism.

There are also new actors in the AI supply chain, as well as new configurations between state and market actors. Non-profit actors like OpenAI are leading the charge in consuming vast planetary resources and entrenching data extractivism in the pursuit of different types of GenAI applications. In this context, Ramya spoke about the role of the state in the agenda for more commons-based governance of data. She noted that the state is no longer just a sanctioning authority, but also a curator of data (such as open government data used for training AI systems) and a consumer of these systems. EU regulation needs to engage more with this multi-faceted role of the state.

Originally, the commons held the promise of preventing capture and enclosure of shared resources by the state and by the market. The theory of the commons was applied to free software, public sector information and creative works to encourage shared management of these resources.

But now, we also need to rethink how to make the commons relevant to data governance in the age of Big Data and AI. Data is most definitely a shared resource, but the ways in which value is extracted from data, and the actors among whom this value is shared, are determined by new constellations of power between state and market actors.

Against this background, Yaniv and Melanie spoke about the role that licenses can continue to play in instilling certain values into data sharing and re-use, and in serving as legal mechanisms for protecting the privacy and intellectual property of individuals and communities in data. They presented their Open Data Commons license template. This license expands on the original open data licenses to include contractual provisions relating to copyright and privacy. The license contemplates four mandatory elements (that serve as value signals):

  • Share-alike pledge (to ensure circularity of data in the commons),
  • Privacy pledge (to respect legal obligations for privacy at each downstream use),
  • Right to erasure (to enable individuals to exercise this right at every downstream use),
  • Sustainability pledge (to ensure that downstream re-users assess the ecological impact of their proposed data re-use).

The license then contemplates new modular elements that each licensor can choose from – including the right to make derivatives, the right to limit use to an identified set of re-users, and the right to charge a fee for re-use where the fee is used to maintain the licensor’s data sharing infrastructure. They also discussed the need for trusted intermediaries like data trusts (drawing inspiration from Copyright Management Organisations) to steward data of multiple individuals/communities, and manage the Open Data Commons licenses.

Finally, Renata offered some useful suggestions from the perspective of civil society organisations. She spoke about the Open Data Commons license as a tool for empowering individuals and communities to share more data while exercising more control over how that data is used and for whose benefit. The license can give the individuals and communities whose data underpins AI development more say in sharing the benefits of the resulting AI systems. She also spoke about the need for technical interoperability and community-driven data standards. These are necessary to ensure that big players with greater economic and computational resources do not exercise disproportionate control over access to and re-use of data for AI development, and that smaller and community-based actors can also develop and deploy their own AI systems.

All panelists spoke about the urgent need not just to conceive of, but also to implement, viable solutions for community-based data governance that balance privacy and artists’ rights with innovation for collective benefit. The Open Data Commons license presents one such solution, which the Open Knowledge Foundation proposes to develop and disseminate further to encourage its uptake. There is significant promise in initiatives like the Open Data Commons license to ensure inclusive and sustainable data governance. It is now time for action – to implement such initiatives and to work together as a community in realising the promise of data commons.