Researchers at Carnegie Mellon University created a virtual company staffed solely by AI ‘employees’ powered by large language models from vendors including Anthropic, OpenAI, and Google, assigning them roles such as financial analyst and software engineer.
In this simulated work environment, the AI agents struggled to complete most tasks: even the best-performing model completed only about a quarter of its assignments.
The experiment highlighted key weaknesses in current AI systems, including difficulty interpreting nuanced instructions, managing web navigation with pop-ups, and coordinating multi-step workflows without human intervention.
These gaps suggest that human judgement, adaptability and collaboration remain essential in real workplaces for the foreseeable future.
