Zuckerberg defends AI training as copyright dispute deepens

A lawsuit accuses Meta of relying on LibGen, a platform linked to widespread copyright violations, for AI model training.

Zuckerberg emphasised AI’s transformative role in Meta’s future products and services, targeting over 1 billion users.

Mark Zuckerberg has defended Meta’s use of a dataset containing copyrighted e-books to train its Llama AI models. The statement emerged from a deposition in the ongoing Kadrey v. Meta Platforms lawsuit, one of many cases challenging the use of copyrighted content in AI training. Meta reportedly relied on the controversial dataset LibGen despite internal concerns over potential legal risks.

LibGen, a platform known for providing unauthorised access to copyrighted works, has faced numerous lawsuits and shutdown orders. Newly unsealed court documents suggest that Zuckerberg approved using the dataset to develop Meta’s Llama models. Employees allegedly flagged the dataset as problematic, warning it might undermine the company’s standing with regulators. During questioning, Zuckerberg compared the situation to YouTube’s efforts to remove pirated content, arguing against blanket bans on datasets with copyrighted material.

Meta’s practices are under heightened scrutiny as legal battles pit AI companies against copyright holders. The deposition indicates that Meta weighed copyright concerns against the practical needs of AI development. However, the company faces mounting allegations that it disregarded ethical boundaries, and the case has sparked broader debates about fair use and intellectual property in AI training.