AI training with pirated books triggers massive legal risk

Anthropic downloaded over five million pirated books to train its AI model Claude, breaching copyright law in the process.

A US federal court has ruled that AI company Anthropic infringed copyright by downloading millions of pirated books to train its language model, Claude.

Although the court found that using copyrighted material for AI training could qualify as ‘fair use’ under US law when the content is transformed, it also held that acquiring the content illegally instead of licensing it lawfully constituted theft.

Judge William Alsup described AI as one of the most transformative technologies of our time, but found that Anthropic had obtained millions of digital books from pirate sites such as LibGen and Pirate Library Mirror.

He noted that purchasing print copies of the same books later does not erase the initial violation, though it may reduce potential damages.

Statutory damages for wilful copyright infringement in the US can reach $150,000 per work, meaning total liability could run into the billions.

The case highlights the fine line between transformation and theft and signals growing legal pressure on AI firms to respect intellectual property instead of bypassing established licensing frameworks.

Australia, which uses a ‘fair dealing’ system rather than ‘fair use’, already offers flexible licensing schemes through organisations like the Copyright Agency.

Copyright Agency CEO Josephine Johnston urged policymakers not to weaken Australia’s legal framework in favour of global tech companies, arguing that licensing provides certainty for developers and fair payment to content creators.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!