In 2024, the focus shifts to improving AI models along three dimensions: size, data, and applications.

Interest in artificial intelligence (AI) surged in 2023 after the launch of OpenAI’s ChatGPT, the internet’s most renowned chatbot. In just six months, the popularity of the topic ‘artificial intelligence’ on Google’s search engine nearly quadrupled. By August 2023, one-third of McKinsey Global Survey respondents reported that their organisations were utilising generative AI in some capacity.

Looking ahead to 2024, researchers are focusing on three main dimensions to improve AI models: size, data, and applications. The accepted belief in AI research has been that bigger models are better, but large language models (LLMs) are testing the limits of this trend. GPT-4, the LLM powering the deluxe version of ChatGPT, required over 16,000 specialised GPU chips and weeks of training, at a cost of over $100 million. And when LLMs are deployed at scale, inference costs can exceed training costs. Efforts are therefore underway to make AI models smaller, faster, and more cost-effective.

Data is crucial in improving AI models, but the focus shifts from quantity to quality. Acquiring high-quality training data is becoming challenging, and using existing models’ outputs as training data may lead to less capable models. Determining the right mix of training data is a critical aspect of AI research. Moreover, AI models are increasingly being trained on combinations of data types, such as natural language, computer code, images, and videos, to enhance their capabilities.

On the applications front, researchers are working out how to use AI models effectively. Three main approaches are prompt engineering, fine-tuning, and embedding LLMs in larger architectures. Prompt engineering involves providing specific prompts to guide the model’s outputs. Fine-tuning entails training pre-existing models on narrow datasets tailored to specific tasks. Embedding LLMs in larger architectures allows for more comprehensive applications. One example is “retrieval augmented generation,” where an LLM is combined with additional software and a knowledge database to reduce the likelihood of producing false information.
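To make the retrieval-augmented-generation idea concrete, here is a minimal sketch of the basic loop in Python: retrieve the passages most relevant to a question, build a prompt around them, and hand that prompt to a language model. The knowledge base, the word-overlap `retrieve` function, and the `generate` stub are illustrative assumptions, not any particular vendor’s API; a real system would use a vector database for retrieval and an actual LLM call for generation.

```python
# Minimal retrieval-augmented generation (RAG) sketch.
# The knowledge base, scoring rule, and generate() stub are illustrative
# placeholders, not any specific product's API.

KNOWLEDGE_BASE = [
    "GPT-4 was trained on more than 16,000 specialised GPU chips.",
    "Fine-tuning adapts a pre-trained model with a narrow, task-specific dataset.",
    "Retrieval augmented generation grounds answers in an external knowledge store.",
]

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question (a stand-in for the
    vector-similarity search a production system would use)."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM; here it simply echoes the prompt."""
    return f"[LLM response conditioned on]\n{prompt}"

def answer(question: str) -> str:
    # 1. Retrieve the most relevant passages from the knowledge database.
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    # 2. Build a prompt that asks the model to answer only from that context,
    #    which is what reduces the likelihood of fabricated answers.
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    # 3. Generate the final answer with the LLM.
    return generate(prompt)

if __name__ == "__main__":
    print(answer("What is retrieval augmented generation?"))
```

The design point is that the model never answers from memory alone: every response is conditioned on passages pulled from a curated knowledge store, which is why the technique helps curb fabricated information.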

While there is a strong focus on the commercial potential of AI, the pursuit of artificial general intelligence (AGI) continues. LLMs and other forms of generative AI play a role in this development, but they are not believed to be the ultimate neural architecture. Stanford University’s Chris Manning notes that there is still room for improvement and that the field will continue to evolve.

Source: The Economist