11 Jul 2023

AI tools nearing text training limit, warns UC Berkeley professor

An AI expert and professor, Stuart Russell, has warned that AI developers are facing a shortage of text to train chatbots like ChatGPT. He stated that the strategy of training large language models is ‘starting to hit a brick wall’.

Stuart Russell, a professor at the University of California, Berkeley, warns of an impending shortage of text for training AI-powered chatbots. In an interview, Russell said that the approach used to train AI bots is reaching a limit because of the scarcity of digital text. He also raised concerns about the training methods used by AI developers such as OpenAI, which depend heavily on extensive text data, and this raises questions about those developers' data collection practices. In the case of OpenAI, Russell suggested that the company may have had to supplement publicly available language data with private archives to develop its most advanced AI model, GPT-4.

Moreover, a study conducted by Epoch, a group of AI researchers, predicts that machine learning datasets will exhaust the supply of high-quality language data before 2026. This includes data from reliable sources such as books, news articles, scientific papers, filtered web content, and Wikipedia.

OpenAI, the company behind ChatGPT, has been facing lawsuits alleging that it used personal data and copyrighted materials for training purposes. Its CEO, Sam Altman, has expressed a desire to avoid legal issues and has stated that the company has no plans for an initial public offering (IPO), in order to avoid clashes with investors.
