US voice actors claim AI firm illegally copied their voices

Two voice actors have filed a lawsuit against AI startup Lovo in Manhattan federal court, alleging that the company illegally copied their voices for use in its AI voiceover technology without permission. Paul Skye Lehrman and Linnea Sage claim Lovo tricked them into providing voice samples under false pretences and is now selling AI versions of their voices. In the proposed class action, they seek at least $5 million in damages, accusing Lovo of fraud, false advertising, and violations of their publicity rights.

Both actors were approached for voiceover work via the freelance platform Fiverr: Lehrman was told his voice would be used for a research project, and Sage that hers would be used for test scripts for radio ads. Lehrman later discovered AI versions of his voice in YouTube videos and podcasts, while Sage found her voice in Lovo’s promotional materials. Their Fiverr clients turned out to be Lovo employees, and the company was selling their voices under pseudonyms.

The lawsuit adds to a growing list of legal actions against tech companies for allegedly misusing content to train AI systems. Lehrman and Sage seek to prevent similar misuse of voices by Lovo and other companies, emphasising the need for accountability in the AI industry. Lovo has not yet responded to the allegations.

Dotdash Meredith partners with OpenAI for AI integration

Dotdash Meredith, a prominent publisher overseeing titles like People and Better Homes & Gardens, has struck a deal with OpenAI, marking a significant step in integrating AI technology into the media landscape. The agreement involves using OpenAI’s models in Dotdash Meredith’s ad-targeting product, D/Cipher, to enhance its precision and effectiveness. It also licenses the publisher’s content to ChatGPT, OpenAI’s chatbot, expanding the reach and visibility of Dotdash Meredith’s content among a wider audience.

Through this partnership, OpenAI will integrate content from Dotdash Meredith’s publications into ChatGPT, offering users access to a wealth of informative articles. Moreover, both entities will collaborate on developing new AI features tailored for magazine readers, indicating a forward-looking approach to enhancing reader engagement.

One key aspect of the collaboration involves leveraging OpenAI’s models to enhance D/Cipher, Dotdash Meredith’s ad-targeting platform. With the impending shift towards a cookieless online environment, the publisher aims to bolster its targeting technology with AI, ensuring advertisers can reach their desired audiences effectively.

Dotdash Meredith’s CEO, Neil Vogel, emphasised the importance of fair compensation for publishers in the AI landscape, highlighting the need for proper attribution and compensation for content usage. This stance reflects a broader industry conversation about the relationship between AI platforms and content creators.

Why does it matter?

While Dotdash Meredith joins a growing list of news organisations partnering with OpenAI, not all have embraced such agreements. Some, like the newspapers owned by Alden Global Capital, have instead pursued legal action against OpenAI and Microsoft, alleging that their content was used in AI models without proper attribution or compensation. These contrasting responses underscore the complex dynamics at play as AI increasingly intersects with traditional media practices.

OpenAI to introduce content creator control in AI development

OpenAI has announced that it is developing a tool to support more ethical content usage in AI development. The tool, called Media Manager, will allow content creators to specify how their work may be used in AI training. It aligns with the digital rights movement and addresses long-standing concerns around content usage at a time when OpenAI faces a growing number of copyright infringement lawsuits.

The concept isn’t entirely new. It parallels the decades-old robots.txt standard used by web publishers to control crawler access to website content. Last summer, OpenAI adapted this idea, pioneering the use of similar permissions for AI, thus allowing publishers to set preferences for the use of their online content in AI model training.
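The mechanism is simple enough to show concretely. Below is a minimal robots.txt sketch, assuming OpenAI’s GPTBot crawler user-agent (the agent OpenAI introduced for these training opt-outs); the paths are hypothetical:

```
# robots.txt at the site root
# Block OpenAI's training crawler from the entire site
User-agent: GPTBot
Disallow: /

# Alternatively, allow only a public section (hypothetical path)
# User-agent: GPTBot
# Allow: /public/
# Disallow: /
```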

However, many content creators do not control the websites where their content appears, and their work is often reused in various forms across the internet, making such solutions insufficient. Media Manager is an attempt to create a more scalable and efficient way for creators to assert control over how their content is used in AI systems. It is being developed as a comprehensive tool that will allow creators to register their content and specify its inclusion in or exclusion from AI research and training, and OpenAI plans to enhance it over time with more features supporting a broader range of creator needs. The initiative involves complex machine learning research to develop a system capable of identifying copyrighted text, images, audio, and video across diverse sources.

OpenAI is collaborating with creators, content owners, and regulators to shape the Media Manager tool, with an expected launch by 2025. This collaborative approach aims to develop the tool in a way that meets the nuanced requirements of various stakeholders and sets a standard for the AI industry.

Why does it matter?

The significance of OpenAI’s Media Manager stems from its attempt to address the fundamental tension over how AI interacts with human-generated content. By providing tools that respect and enforce the rights of creators, OpenAI is fostering a sustainable model in which AI development is aligned with ethical and legal standards. The initiative is crucial for ensuring that AI technologies do not exploit the creative economy but instead respect and contribute positively to it, and it sets a precedent for transparency and responsibility that could push the entire AI industry towards more ethical practices.

Nvidia and Databricks sued for alleged copyright infringement in AI model development

Nvidia Corporation and Databricks Inc. face class-action lawsuits alleging copyright infringement in the creation of their AI models. The litigation highlights a growing concern over the use of copyrighted content without permission.

The lawsuits, filed on 8 March by authors Abdi Nazemian, Brian Keene, and Stewart O’Nan in the US District Court for the Northern District of California, argue that Nvidia’s NeMo Megatron and Databricks’ MosaicML models were trained on vast datasets containing millions of copyrighted works. Notably, the complaints suggest these datasets include content from well-known authors like Andre Dubus III and Susan Orlean, among others, used without their consent. This has sparked a broader debate on whether such practices constitute fair use, as AI developers claim, or infringe the copyrights of individual creators.

The core of the dispute lies in how AI companies compile their training data. Reports indicate that some of the data came from ‘shadow libraries’ like Bibliotik, which hosts and distributes unlicensed copies of nearly 200,000 books. The use of such sources in training datasets could undermine the legality of the AI training process, which relies on ingesting large volumes of text to produce sophisticated AI outputs.

Legal experts and industry analysts are closely watching these cases, as the outcomes could set important precedents for the future of AI development. Companies like Nvidia have defended their practices, stating that their development processes comply with copyright laws and emphasizing the transformative nature of AI technology. However, the plaintiffs argue that this does not justify the unauthorized use of their work, which they claim undermines their financial and creative rights.

The lawsuits against Nvidia and Databricks are part of a larger trend of legal challenges tech giants face over their development of AI technologies and their use of copyrighted materials to train large language models (LLMs), which are designed to process and generate human-like text.

OpenAI, the creator of ChatGPT, faced similar legal scrutiny when the New York Times filed a lawsuit against it, alleging that the company used copyrighted articles to train its language models without permission.

These developments raise crucial questions about the balance between innovation and copyright protection in the digital context.

Snap introduces watermarks for AI-generated images

Social media company Snap has announced plans to add watermarks to AI-generated images on its platform, aiming to enhance transparency and protect user content. The watermark, a small ghost with a sparkle icon, will denote images created using AI tools and will appear when an image is exported or saved to the camera roll. However, how Snap intends to detect and address watermark removal remains unclear, raising questions about enforcement.
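For a sense of what such an export-time mark involves mechanically, here is a minimal sketch using the Pillow imaging library to stamp a translucent logo onto an image. The file names, sizing, and opacity are illustrative assumptions, not Snap’s implementation:

```python
# Minimal sketch: overlaying a translucent watermark logo on an image.
# File names are placeholders, not Snap's actual assets.
from PIL import Image

def add_watermark(image_path: str, logo_path: str, out_path: str) -> None:
    base = Image.open(image_path).convert("RGBA")
    logo = Image.open(logo_path).convert("RGBA")

    # Scale the logo to roughly 10% of the image width, keeping aspect ratio.
    scale = (base.width / 10) / logo.width
    logo = logo.resize((int(logo.width * scale), int(logo.height * scale)))

    # Halve the logo's alpha channel so the mark is translucent.
    logo.putalpha(logo.getchannel("A").point(lambda a: a // 2))

    # Paste into the bottom-right corner with a small margin,
    # using the logo's own alpha as the paste mask.
    margin = 16
    pos = (base.width - logo.width - margin, base.height - logo.height - margin)
    base.paste(logo, pos, logo)
    base.convert("RGB").save(out_path)

add_watermark("generated.png", "ghost_sparkle.png", "generated_marked.png")
```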

This move aligns with efforts by other tech giants such as Microsoft, Meta, and Google, which have implemented measures to label or identify AI-generated images. Snap currently offers AI-powered features like Lenses and a selfie-focused tool called Dreams for paid users, emphasising the importance of transparency and safety in AI-driven experiences.

Why does it matter?

As part of its commitment to equitable access and user safety, Snap has partnered with HackerOne to stress-test its AI image-generation tools and has established a review process to address potential biases in AI results. Its dedication to transparency extends to providing context cards with AI-generated images and implementing controls in the Family Center that let parents monitor teens’ interactions with AI, following earlier controversies over inappropriate responses from the ‘My AI’ chatbot. As Snap continues to evolve its AI-powered features, this focus on transparency and safety underscores its commitment to fostering a positive and inclusive user experience on its platform.

US Congress proposes Generative AI Copyright Disclosure Act

A new bill introduced in the US Congress aims to require AI companies to disclose the copyrighted material they use to train their generative AI models. The bill, named the Generative AI Copyright Disclosure Act and introduced by California Democrat Adam Schiff, mandates that AI firms submit copyrighted works in their training datasets to the Register of Copyrights before launching new generative AI systems. Companies must file this information at least 30 days before releasing their AI tools or face financial penalties. The datasets in question can contain vast amounts of text, images, music, or video content.

Congressman Schiff emphasised the need to balance AI’s potential with ethical guidelines and protections, citing AI’s disruptive influence on various aspects of society. The bill does not prohibit AI from training on copyrighted material but requires companies to disclose the copyrighted works they use. This move responds to increasing litigation and government scrutiny around whether major AI companies have unlawfully used copyrighted content to develop tools like ChatGPT.

Entertainment industry organisations and unions, including the Recording Industry Association of America and the Directors Guild of America, have supported Schiff’s bill. They argue that protecting the intellectual property of human creative content is crucial, given that AI-generated content originates from human sources. Companies like OpenAI, currently facing lawsuits alleging copyright infringement, maintain their use of copyrighted material falls under fair use, a legal doctrine permitting certain unlicensed use of copyrighted materials.

Why does it matter?

As generative AI technology evolves, concerns about the potential impact on artists’ rights grow within the entertainment industry. Notably, over 200 musicians recently issued an open letter urging increased protections against AI and cautioning against tools that could undermine or replace musicians and songwriters. The debate highlights the intersection of AI innovation, copyright law, and the livelihoods of creative professionals, presenting complex challenges for policymakers and stakeholders alike.

OpenAI utilised one million hours of YouTube content to train GPT-4

Recent reporting by The New York Times has brought to light the challenges AI companies face in acquiring high-quality training data. The newspaper details how companies like OpenAI and Google have navigated this issue, often treading into legally ambiguous territory around AI and copyright law.

OpenAI, for instance, developed its Whisper audio transcription model and used it to transcribe over a million hours of YouTube videos, generating text to train GPT-4, its advanced language model. Although this approach raised legal concerns, OpenAI believed it fell within fair use. The company’s president, Greg Brockman, reportedly played a hands-on role in collecting these videos.
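The transcription step itself is straightforward with the open-source whisper package. The minimal sketch below, with a placeholder file name, shows how audio becomes text that could feed a training corpus; it says nothing, of course, about how the audio was obtained:

```python
# Minimal sketch: transcribing an audio file with the open-source
# whisper package (pip install openai-whisper). The file name is a
# placeholder, not a real dataset entry.
import whisper

model = whisper.load_model("base")  # small general-purpose model
result = model.transcribe("downloaded_audio.mp3")
print(result["text"])  # plain-text transcript
```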

A Google spokesperson said the company had seen unconfirmed reports of OpenAI’s activities, noting that both Google’s terms of service and its robots.txt files prohibit unauthorised scraping or downloading of YouTube content. Google has itself used transcripts from YouTube, in line with its agreements with content creators.

Similarly, Meta encountered challenges with data availability for training its AI models. The company’s AI team reportedly discussed using copyrighted works without permission to catch up with OpenAI, and explored options such as paying for book licences or acquiring a large publisher.

Why does it matter?

AI companies, including Google and OpenAI, are grappling with the dwindling availability of quality training data to improve their models. The future of AI training may involve synthetic data or curriculum learning methods, but these approaches remain unproven. In the meantime, companies continue to explore various avenues for data acquisition, sometimes straying into legally contentious territory as they navigate this evolving landscape.

TikTok removes Universal Music songs amidst licensing dispute

TikTok has begun removing Universal Music Publishing Group (UMPG) songs after licence renewal negotiations failed. Following the expiration of the licensing agreement on 31 January, TikTok has started muting videos featuring songs from artists associated with UMPG, while leaving the videos themselves on the platform.

Under the new policy, TikTok must exclude any music to which UMPG songwriters have contributed, irrespective of the main label. This extends the impact beyond UMG-associated artists: if a UMPG-affiliated songwriter contributed to another label’s song, even minimally, TikTok is obliged to remove it from the platform.

Despite UMPG’s claim that the dispute will have a negligible impact on its revenue, the changes will adversely affect artists and songwriters, who lose promotion opportunities on a platform known for enabling music discovery, as well as potential royalty earnings. UMG recognises these consequences but maintains its commitment to securing a new deal that justly compensates its artists.

G7 digital and tech ministers discuss AI, data flows, digital infrastructure, standards, and more

On 29-30 April 2023, G7 digital and tech ministers met in Takasaki, Japan, to discuss a wide range of digital policy topics, from data governance and artificial intelligence (AI) to digital infrastructure and competition. The outcomes of the meeting – which was also attended by representatives of India, Indonesia, Ukraine, the Economic Research Institute for ASEAN and East Asia, the International Telecommunication Union, the Organisation for Economic Co-operation and Development, the UN, and the World Bank Group – include a ministerial declaration and several action plans and commitments to be endorsed at the upcoming G7 Hiroshima Summit.

During the meeting, G7 digital and tech ministers committed to strengthening cooperation on cross-border data flows, and operationalising Data Free Flow with Trust (DFFT) through an Institutional Arrangement for Partnership (IAP). IAP, expected to be launched in the coming months, is dedicated to ‘bringing governments and stakeholders together to operationalise DFFT through principles-based, solutions-oriented, evidence-based, multistakeholder, and cross-sectoral cooperation’. According to the ministers, focus areas for IAP should include data location, regulatory cooperation, trusted government access to data, and data sharing.

The ministers further noted the importance of enhancing the security and resilience of digital infrastructures. In this regard, they committed to strengthening cooperation – within the G7 and with like-minded partners – to support and enhance network resilience through measures such as ensuring and extending secure and resilient routes of submarine cables. Moreover, the group endorsed the G7 Vision of the future network in the Beyond 5G/6G era and committed to enhancing cooperation on research, development, and international standards setting towards building digital infrastructure for the 2030s and beyond. These commitments are also reflected in a G7 Action Plan for building a secure and resilient digital infrastructure.

In addition to expressing a commitment to promoting an open, free, global, interoperable, reliable, and secure internet, G7 ministers condemned government-imposed internet shutdowns and network restrictions. On global digital governance processes, the ministers expressed support for the UN Internet Governance Forum (IGF) as the ‘leading multistakeholder forum for Internet policy discussions’ and proposed that the upcoming Global Digital Compact reinforce, build on, and contribute to the success of the IGF and the World Summit on the Information Society (WSIS) process. The internet governance section also includes a commitment to protecting democratic institutions and values from foreign threats, including foreign information manipulation and interference, disinformation, and other forms of foreign malign activity. These issues are further detailed in an accompanying G7 Action Plan for an open, free, global, interoperable, reliable, and secure internet.

On matters related to emerging and disruptive technologies, the ministers acknowledged the need for ‘agile, more distributed, and multistakeholder governance and legal frameworks, designed for operationalising the principles of the rule of law, due process, democracy, and respect for human rights, while harnessing the opportunities for innovation’. They also called for the development of sustainable supply chains and agreed to continue discussions on developing collective approaches to immersive technologies such as the metaverse.

With AI high on the meeting agenda, the ministers stressed the importance of international discussions on AI governance and of interoperability between AI governance frameworks, and expressed support for the development of tools for trustworthy AI (e.g. regulatory and non-regulatory frameworks, technical standards, and assurance techniques) through multistakeholder international organisations. The role of technical standards in building trustworthy AI and in fostering interoperability across AI governance frameworks was highlighted both in the ministerial declaration and in the G7 Action Plan for promoting global interoperability between tools for trustworthy AI.

On AI policies and regulations, the ministers noted that these should be human-centric, based on democratic values, risk-based, and forward-looking. The opportunities and challenges of generative AI were also tackled: the ministers announced plans to convene future discussions on issues such as governance, safeguarding intellectual property rights, promoting transparency, and addressing disinformation.

On digital competition, the declaration highlights the importance of both using existing competition enforcement tools and developing and implementing new or updated competition policy or regulatory frameworks ‘to address issues caused by entrenched market power, promote competition, and stimulate innovation’. A summit on digital competition for competition authorities and policymakers is planned for autumn 2023.

European Patent Office publishes patent insight report on quantum computing

The European Patent Office (EPO) has published a patent insight report on quantum computing. The report provides an overview of quantum computing at large, while also looking at issues such as physical realisations of quantum computing, quantum error correction and mitigation, and technologies related to quantum computing and artificial intelligence/machine learning.

One of the report’s key findings is that the number of inventions in the field of quantum computing has multiplied over the last decade. In addition, quantum computing inventions show a higher growth rate than inventions across all fields of technology in general. The above-average share of international patent applications in quantum computing suggests high economic expectations for the technology.