The 9th WIPO Conversation on Intellectual Property and Frontier Technologies – Training the Machines: Bytes, Rights and the Copyright Conundrum

13 Mar 2024 - 14 Mar 2024

Event report

Written by Anna Polo-Jansen

The 9th session of the World Intellectual Property Organization (WIPO) Conversation on Intellectual Property (IP) and Frontier Technologies – Training the Machines: Bytes, Rights, and the Copyright Conundrum – examined the relationship between artificial intelligence (AI) and intellectual property rights (IPR). Panels covered topics ranging from the technical underpinnings of AI to diverse national perspectives on the interplay between generative AI (GenAI) and IP.

Among the concerns addressed during the two-day event was the issue of fair compensation for content creators whose work is used in training generative models. The second and third panels of the event focused on the technical aspects of data system training. Existing national frameworks were introduced, leading to discussions on potential solutions to ensure fairness while balancing the interests of all stakeholders.

Technical background

The first panel discussions explained the inner workings of machine learning (ML) models, which underlie the AI tools we use in our daily lives. Common GenAI models on which companies build popular AI tools are usually pre-trained on a tremendous amount of data, often taken from the internet.

This training phase raises significant concerns for IPR, given that the vast amounts of input data used in training often contain copyrighted works.

As articulated by Mohan Kankanhalli (Deputy Executive Chairman, AI Singapore), the data used during this training directly affects AI outputs: if the ML model is fed a limited or low-quality dataset, its predictive capabilities are severely constrained, leading to incorrect or poor-quality output. The presence of copyrighted works in these large datasets can be very useful because copyrighted works represent the type of content recognised and enjoyed by many. The absence of such content would significantly reduce the capacity of GenAI models.

Main concerns related to the development of AI

The rapid advancement of AI raises significant concerns for content creators who fear the erosion of their copyright protection. Duncan Crabtree-Ireland (National Executive Director and Chief Negotiator, SAG-AFTRA), commenting on fair compensation for creatives, noted that it is imperative to recognise the human effort behind the data exploited by AI models, since that data embodies the output of human creativity and labour. Creators invest considerable time and effort into their craft, partly because of the safeguards provided by IPR.

Several speakers described the frustration of artists who see their life’s work used without consent and repurposed by AI. This predicament raises fundamental questions, foremost among them how to obtain consent from artists and how to ensure fair compensation for those whose work fuels ML models. Consent is paramount for sensitive information such as faces or voices, which is deeply personal. It is even more essential where the content owner is not the person visible in the content. Pictures, videos, or any other content containing a voice, a face, or any information that makes a person identifiable should be treated with particular care. Since much of the training data consists of copyrighted works, it is essential to ensure fair access while respecting the rights of both the content owner and the person depicted.

The European Union (EU) currently has an opt-out system under its Directive on Copyright in the Digital Single Market, which allows content owners to opt out of having their content used for data mining. Not all jurisdictions offer such a mechanism; the United States, for example, does not. While the European system is more protective of content owners, Nicola Lucchi (Professor of Comparative Law, Universitat Pompeu Fabra) noted on the panel on the current state of IP that it could disadvantage European AI developers, whose available training datasets will be more constrained than those of developers based in the United States.

Another aspect of AI and IPR is how to define fair compensation for creatives. Acknowledgement of an artist’s work is of great importance in the creative industry, since it brings content creators public recognition and expanded opportunities. While compensation typically accompanies the use of copyrighted material in other contexts, such as the reproduction or public display of a work, artists whose creations are used in training datasets often receive neither credit nor remuneration. This is especially frustrating to content creators when generative AI models replicate the style or elements of an artist’s work without acknowledging the original creator. Yecid Ríos Pinzón (Founding Partner, Zapata&Rios Abogados Asociados) asked whether existing norms suffice or whether they need to be adapted for the AI era. The answer depends on several factors. For one, the extent to which copyrighted works are used in training data must be examined in order to structure proportional compensation. Crabtree-Ireland echoed this sentiment during his panel on fair compensation for creatives. The revenue generated by AI models can also serve as a benchmark for determining fair compensation. It is crucial to consider the interests of all parties involved, balancing the needs of creatives with those of AI companies.

Furthermore, it is essential to assess the impact of AI systems on the market for original works. Whether AI models reproduce copyrighted material or incorporate it into derivative works, the potential ramifications for the original content cannot be overlooked. As mentioned by various speakers, these considerations underscore the significance of ongoing conversations surrounding AI and IPR, emphasising the need for a holistic approach that safeguards both creative integrity and technological innovation.

Potential solutions

Many national IP frameworks already address some of the concerns raised through data mining exceptions that allow the use of copyrighted content in research contexts. However, each national framework sets its own specific criteria, and overall, existing legislation often falls short of adequately addressing current concerns.

To tackle the lack of consumer trust and the ethical challenges, governments must establish robust regulatory frameworks for AI development and deployment. Matt Hervey (Partner and Head of Artificial Intelligence at Gowling WLG) proposed that national frameworks should cover several elements, including data protection, cybersecurity, fairness, and accountability. By implementing clear regulations, governments can mitigate the risks associated with AI while fostering innovation and safeguarding the rights of individuals.

Conclusion

As evidenced by the numerous lawsuits filed against entities like OpenAI, the legitimate concerns of creatives regarding the non-consensual use of their content underscore the urgency of these discussions. These legal battles are shaping new jurisprudence and bringing to the forefront the real consequences creators face. By highlighting existing problems and proposing potential solutions, these conversations shed light on the inadequacies of current legal frameworks. Addressing issues such as consent and fair compensation for creatives without hampering AI developers requires extensive dialogue in which the interests of all parties are respected. The ninth session of the WIPO Conversation on IP and Frontier Technologies marks a significant step towards equitable resolutions to these complex challenges.

Event announcement

This session of the WIPO Conversation, Training the Machines, will address the ongoing debate faced by AI developers, who rely heavily on publicly available internet data for training even though that data includes copyrighted works. On one hand, excluding copyrighted works from training data would hinder AI systems’ ability to reflect today’s society accurately. On the other hand, creatives, authors, musicians, and artists express concerns regarding the use of their works without consent or fair compensation. The session aims to explore a balance between technological advancement and the protection of content creators’ rights.

This hybrid event, scheduled for March 13-14, 2024, will offer both in-person and virtual attendance options via Zoom. The event targets creators, AI developers, innovators, data scientists, tech enthusiasts, IP experts, and policymakers. The forum aims to foster meaningful discussions, share insights, and potentially identify effective strategies for addressing the copyright challenges associated with AI training.
