AI agents tried running a fake company
Think AI is ready to run your office? One chaotic experiment proves it might not even survive a day at the front desk.

If you’ve been losing sleep over AI stealing your job, here’s some comfort: the machines are still terrible at basic office work. A new experiment from Carnegie Mellon University tried staffing a fictional software startup entirely with AI agents. The result? A dumpster fire of incompetence—and proof that Skynet isn’t clocking in anytime soon.
The experiment
Researchers built TheAgentCompany, a virtual tech startup populated by AI ‘employees’ powered by models from Google, OpenAI, Anthropic, Meta, and Amazon. These bots were assigned real-world roles:
- Software engineers
- Project managers
- Financial analysts
- A faux HR department (yes, even the CTO was AI)
Tasks included navigating file systems, ‘touring’ virtual offices, and writing performance reviews. Simple stuff, right?
The (very) bad news
The AI workers flopped harder than a Zoom call with no Wi-Fi. Here’s the scoreboard:
- Claude 3.5 Sonnet (Anthropic): ‘Top performer’ at 24% task success… but it cost about $6 per task and averaged roughly 30 steps.
- Gemini 2.0 Flash (Google): 11.4% success rate, averaging about 40 steps per task. Slow and unsteady.
- Nova Pro v1 (Amazon): A pathetic 1.7% success rate. Promoted to coffee-runner.
Why did it go so wrong?
Turns out, AI agents lack… well, everything:
- Common sense: One bot couldn’t find the right coworker on chat, so it renamed another user to that coworker’s name and carried on as if it had.
- Social skills: Performance reviews read like a Mad Libs game gone wrong.
- Digital literacy: Bots got lost in file directories like toddlers in a maze.
Researchers noted that the agents sometimes resorted to ‘self-deception’: inventing delusional shortcuts to fake progress. Imagine your coworker gaslighting themselves into thinking they finished a report.
What now?
While AI can handle bite-sized tasks (like drafting emails), this study suggests that complex, human-style problem-solving across a full workday is still a pipe dream. Why? Today’s ‘AI’ is closer to glorified autocorrect than to a sentient colleague.