Major publishers sue Meta over Llama AI training
Five major publishing houses have filed a proposed class action lawsuit against Meta and its CEO, Mark Zuckerberg, in the US, alleging wilful copyright infringement in the training of Meta’s Llama family of large language models.
Meta and Mark Zuckerberg are facing a new copyright lawsuit from five major publishers (Hachette, Macmillan, McGraw-Hill, Elsevier, and Cengage), along with author Scott Turow. The plaintiffs accuse the company of using millions of copyrighted books, journal articles, textbooks, and scholarly works to train its Llama AI models without permission. Filed in the US District Court for the Southern District of New York (Manhattan federal court), the proposed complaint seeks monetary compensation, an injunction, and the destruction of allegedly infringing copies held by Meta.
The complaint argues that Meta’s AI strategy relied on protected works from trade, education, and academic publishing, including content allegedly taken from pirate libraries such as LibGen and Anna’s Archive, as well as broad web scrapes containing subscription-only material. The publishers also claim Zuckerberg personally directed or authorised the conduct, a charge Meta is expected to contest vigorously.
At the centre of the lawsuit is a policy question now shaping AI governance worldwide: whether large-scale copying for model training can be justified as fair use, or instead requires permission, transparency, and compensation. Meta and other AI developers argue that training enables transformative innovation, while rights holders say commercial models are being built from creative and scholarly labour without licensing. A previous Meta win in a case brought by authors showed that courts may accept fair-use arguments, but only where plaintiffs fail to prove clear market harm.
Either way, the publishers are trying to make that market-harm argument harder to dismiss. Their filing describes Llama as an ‘infinite substitution machine’, capable of generating long-form books, educational materials, and scholarly-style outputs that may compete with human-authored works. The case also points to the alleged erosion of licensing markets, arguing that harm occurs not only when AI outputs imitate books, but also when copyrighted works are copied into commercial training pipelines without consent.
The US Copyright Office’s 2025 report said that fair use in generative AI training requires case-by-case analysis, with market effects and the source of the training material playing central roles. In the EU, the AI Act has shifted the debate toward transparency by requiring general-purpose AI providers to publish summaries of their training data and to comply with the EU copyright rules, including rights reservations for text and data mining.
Why does it matter?
The Meta case reflects a global shift in digital governance: AI copyright disputes are no longer isolated lawsuits, but part of a broader effort to define lawful data supply chains. Anthropic’s $1.5 billion settlement over pirated books, the EU’s training-data transparency requirements, and continuing legal disputes in the US all point in the same direction: courts and regulators are asking whether AI innovation can remain competitive while respecting the rights, labour, and markets that make high-quality knowledge possible.
