GPT-5 Fumbles as Business CEO: New Benchmark Shows Humans Still Rule
Models · November 23, 2025


TL;DR

A new business simulation benchmark reveals GPT-5 falls far behind humans, underperforming by nearly 10x in running a virtual company. AGI is clearly not here yet.

GPT-5 Flops in Business CEO Simulation: Reality Check for AGI Hype

Everyone's talking about artificial general intelligence as if it's right around the corner. But a new benchmark just delivered a reality check: GPT-5, OpenAI's latest powerhouse, is nowhere close to running a business like a human. The results aren't subtle. Humans outperformed GPT-5 by a factor of 9.8x in a RollerCoaster Tycoon-style simulation. That's a huge gap, and the details are telling.

Inside the MAPs Benchmark: Theme Parks, Not Theory

The MAPs benchmark is about more than just answering trivia or chatting. It's a real test of business acumen, challenging AI agents to operate a virtual theme park. Think maintenance schedules, inventory management, planning for slow seasons, and making sure the park doesn't go bankrupt. In other words, all the messy stuff that comes with running a business in the real world.

The results? GPT-5 failed at almost every practical skill:

  • Maintenance: Rides broke down, repairs lagged, and guest satisfaction tanked.
  • Inventory: Food stands ran out, shops overstocked, and waste piled up.
  • Planning: No coherent long-term strategy, just reactive, short-sighted moves.
  • Causal Reasoning: The model struggled to link actions and outcomes, leading to random decisions.

Human participants, by contrast, juggled these variables with ease. The average human score was nearly ten times higher than GPT-5's. That's not a rounding error; that's a wall.

What's Going Wrong for LLMs?

Large language models like GPT-5 excel at tasks with clear, structured goals: writing code, summarizing articles, or answering questions. But business simulations are messy. They demand continuous planning, adaptation to uncertainty, and the ability to connect cause and effect over time. These are exactly the areas where GPT-5 broke down in the MAPs paper.

Instead of acting like a savvy CEO, GPT-5 got lost in the weeds, fixating on short-term problems, missing the big picture, and failing to keep the business afloat. It couldn't maintain an effective feedback loop or adapt its strategy as conditions changed. The model's "intelligence" just isn't robust enough for the dynamic, high-stakes world of business management.

Takeaways: AGI Isn't Here (Yet)

The MAPs benchmark is a wake-up call for anyone betting on AGI arriving soon. There's no magic leap happening in language models. Running a business, even a virtual one, is still a distinctly human skill. For AI founders, this means plenty of headroom for building products that combine human expertise with narrow AI tools. For researchers, it's a signal to double down on benchmarks that reflect real-world messiness, not just test scores.

Curious how you'd stack up? Try the simulation yourself at MAPs. For a deep dive into the research, check the project's paper.

