Anthropic AI training upheld as fair use; pirated book storage heads to trial

Judge rules Anthropic used books fairly for training, but not for building a central storage library.


A US federal judge has ruled that Anthropic’s use of books to train its AI model falls under fair use, marking a pivotal decision for the generative AI industry.

The ruling, delivered by US District Judge William Alsup in San Francisco, held that while training AI on copyrighted works qualified as fair use, storing millions of pirated books in a central library constituted copyright infringement.

The case involves authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson, who sued Anthropic last year. They claimed the Amazon- and Alphabet-backed firm had used pirated versions of their books without permission or compensation to train its Claude language model.

The proposed class action is among several lawsuits filed by copyright holders against AI developers, including OpenAI, Microsoft, and Meta.

Judge Alsup stated that Anthropic’s training of Claude was ‘exceedingly transformative’, likening it to how a human reader learns to write by studying existing works. He concluded that the training process served a creative and educational function that US copyright law protects under the doctrine of fair use.

‘Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to replicate them but to create something different,’ the ruling said.

However, Alsup drew a clear line between fair use and infringement regarding storage practices. Anthropic’s copying and storage of over 7 million books in what the court described as a ‘central library of all the books in the world’ was not covered by fair use.

The judge ordered a trial, scheduled for December, to determine how much Anthropic may owe in damages. US copyright law permits statutory damages of up to $150,000 per work for wilful infringement.

Anthropic argued in court that its use of the books was consistent with copyright law’s intent to promote human creativity.

The company claimed that its system studied the writing to extract uncopyrightable insights and to generate original content. It also maintained that the source of the digital copies was irrelevant to the fair use determination.

Judge Alsup disagreed, noting that downloading content from pirate websites when lawful access was possible may not qualify as a reasonable step. He expressed scepticism that infringers could justify acquiring such copies as necessary for a later claim of fair use.

The decision is the first judicial interpretation of fair use in the context of generative AI. It will likely influence ongoing legal battles over how AI companies source and use copyrighted material for model training. Anthropic has not yet commented on the ruling.
