A recent experiment by researchers at Carnegie Mellon University staffed a fake software company entirely with AI agents (AI models designed to perform tasks on their own), and the results were laughably chaotic, writes Futurism.
The simulation, dubbed TheAgentCompany, was fully stocked with artificial workers from Google, OpenAI, Anthropic and Meta. They filled roles as financial analysts, software engineers, and project managers, working alongside simulated coworkers like a faux-HR department and a chief technical officer.
As Business Insider first reported, the results were dismal. The best-performing model was Anthropic’s Claude 3.5 Sonnet, which struggled to finish just 24% of the jobs assigned to it. The study’s authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.
Speculating on the results, the researchers wrote that agents are plagued by a lack of common sense, weak social skills, and a poor understanding of how to navigate the internet.
The bots were also prone to self-deception. “For example,” the Carnegie Mellon team wrote, “during the execution of one task, the agent cannot find the right person to ask questions on [company chat]. As a result, it then decides to create a shortcut solution by renaming another user to the name of the intended user.”