Data deletion hampers OpenAI lawsuit progress

Amid a high-profile copyright lawsuit, OpenAI’s accidental deletion of evidence has complicated efforts to trace how its AI systems were trained, leaving the plaintiffs to start their search anew.

Character.AI, Lawsuit, AI Chatbot

OpenAI is under scrutiny after engineers accidentally erased key evidence in an ongoing copyright lawsuit filed by The New York Times and Daily News. The publishers accuse OpenAI of using their copyrighted content to train its AI models without authorisation.

The issue arose when OpenAI provided virtual machines for the plaintiffs to search its training datasets for infringed material. On 14 November 2024, OpenAI engineers deleted the search data stored on one of these machines. While most of the data was recovered, the loss of folder structures and file names rendered the information unusable for tracing specific sources in the training process.

Plaintiffs are now forced to restart the time-intensive search, leading to concerns over OpenAI’s ability to manage its own datasets. Although the deletion is not suspected to be intentional, lawyers argue that OpenAI is best equipped to perform searches and verify its use of copyrighted material. OpenAI maintains that training AI on publicly available data falls under fair use, but it has also struck licensing deals with major publishers like the Associated Press and News Corp. The company has neither confirmed nor denied using specific copyrighted works for its AI training.