AI Models Assume Humans Are Too Rational, Study Finds

New research is challenging a core assumption behind today’s AI “behavior simulators”: leading language models often expect people to act more logically than they really do, which can distort predictions in economics-style games and real-world decision tasks.

What the studies found

A peer-reviewed study in the Journal of Economic Behavior & Organization tested how large language models perform in “Keynesian beauty contest”-style strategic games and found they frequently play “too smart” because they overestimate how rational their human opponents will be.
The researchers replicated results from classic beauty-contest experiments and reported that while models can adjust to opponents with different sophistication levels, they still misread how people actually reason in the game.
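
To make the game mechanics concrete, here is a minimal simulation of the “guess 2/3 of the average” variant under a standard level-k framing (the level-0 baseline of 50 and the level-k model are illustrative assumptions, not the study’s setup):

```python
# Illustrative sketch, not the study's code: the "guess 2/3 of the
# average" beauty contest. A level-k player best-responds to level-(k-1)
# play, so each extra reasoning step multiplies the guess by 2/3 and
# fully rational play converges to the Nash equilibrium of 0.

def level_k_guess(k: int, baseline: float = 50.0, factor: float = 2 / 3) -> float:
    """Guess of a level-k reasoner, starting from a level-0 midpoint guess."""
    guess = baseline
    for _ in range(k):
        guess *= factor  # best response to the previous level's average
    return guess

for k in range(6):
    print(f"level-{k} guess: {level_k_guess(k):.1f}")
# 50.0, 33.3, 22.2, 14.8, 9.9, 6.6, ... -> 0 in the limit.
# Classic experiments find human averages near levels 1-2, so a model
# that plays the deeply "rational" answer systematically undershoots.
```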

A separate paper by researchers affiliated with Princeton University, Boston University, and New York University evaluated several leading models (including GPT-4o, GPT-4-Turbo, Llama 3 8B/70B, and Claude 3 Opus) against large datasets of human decisions.
Across both “forward modeling” (predicting choices) and “inverse modeling” (inferring preferences from choices), the authors found the models systematically drift toward expected value (EV) logic—closer to a textbook rational-choice rule than to how people actually decide.
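
A worked example shows what drifting toward EV logic means in practice (the gamble values below are invented for illustration, not taken from the paper):

```python
# Hedged illustration with made-up payoffs: expected value favors the
# gamble, while many real participants take the sure thing anyway.

def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted payoff of a list of (payoff, probability) pairs."""
    return sum(payoff * prob for payoff, prob in outcomes)

safe = [(45.0, 1.0)]                # $45 for certain
risky = [(100.0, 0.5), (0.0, 0.5)]  # 50% chance of $100, else nothing

print(expected_value(safe))   # 45.0
print(expected_value(risky))  # 50.0 -> the EV-maximizing choice
# A model following EV logic predicts the gamble; in human experiments a
# large share of participants choose the sure $45, and that divergence is
# what the forward-modeling comparisons measure.
```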

Key evidence, by the numbers

In the risky-choice tests, the paper reports that with chain-of-thought prompting, GPT-4o’s predictions correlate very strongly with expected-value maximization (Spearman correlation of 0.94), while human choices track that rational benchmark far less closely (0.48).
The same paper reports that zero-shot prompts can be noisy and can even lead models to underuse probability information, while chain-of-thought prompting pushes models toward more “rational” patterns than humans show.
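
To see what those correlation figures mean operationally, here is a toy version of the comparison (all numbers are invented; the paper’s actual analysis spans thousands of problems):

```python
# Toy version of the rank-correlation comparison behind the 0.94 vs. 0.48
# figures (invented numbers, not the paper's data): correlate the rate of
# choosing the risky option with each problem's EV advantage for it.
from scipy.stats import spearmanr

ev_advantage = [12.0, -3.0, 25.0, 5.0, -10.0, 18.0]  # EV(risky) - EV(safe)
model_rates  = [0.90, 0.10, 0.99, 0.70, 0.02, 0.95]  # predicted P(risky)
human_rates  = [0.55, 0.40, 0.60, 0.65, 0.35, 0.45]  # observed human rates

rho_model, _ = spearmanr(ev_advantage, model_rates)
rho_human, _ = spearmanr(ev_advantage, human_rates)
print(rho_model)  # 1.0: perfectly tracks the EV ranking
print(rho_human)  # 0.6: a much weaker fit to the rational benchmark
```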

Where the “too rational” bias shows up

In strategic “beauty contest” games, success requires predicting what others will actually choose, not what a perfectly rational agent should choose, so overestimating rationality can cause systematic misses.
The study also links the beauty-contest framework to market behavior, where participants try to anticipate other participants’ expectations rather than intrinsic value alone.

In the risky-choice study, the researchers used the “choices13k” risky-decision dataset, evaluating a subset of 9,831 non-ambiguous problems drawn from the full collection of 13,006 risky-choice problems.
They tested three forward-modeling tasks (predicting an individual’s choice, predicting the proportion of people choosing an option, and simulating choices), then compared model outputs to human response proportions and to rational benchmarks like expected value.
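
A compact sketch of those three framings might look like this (the data layout and the simulation helper are assumptions for illustration, not the paper’s protocol):

```python
# Hedged sketch of the three forward-modeling framings; field names and
# the simulation helper are illustrative assumptions, not the paper's code.
import random
from dataclasses import dataclass

@dataclass
class RiskyChoiceProblem:
    option_a: list[tuple[float, float]]  # (payoff, probability) pairs
    option_b: list[tuple[float, float]]

problem = RiskyChoiceProblem(option_a=[(45.0, 1.0)],
                             option_b=[(100.0, 0.5), (0.0, 0.5)])

# Task 1: predict one individual's choice -> a label, "A" or "B".
# Task 2: predict the population proportion choosing B -> a number in [0, 1].
# Task 3: simulate choices, then compare sampled frequencies with humans:
def simulate_choices(p_choose_b: float, n: int = 1000) -> float:
    """Sample n synthetic participants; return the fraction choosing B."""
    return sum(random.random() < p_choose_b for _ in range(n)) / n

print(simulate_choices(0.4))  # roughly 0.4, set against human proportions
```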

Two-study snapshot

| Study / venue | What was tested | Models referenced | Main result | Why it matters |
| --- | --- | --- | --- | --- |
| Strategic games (Journal of Economic Behavior & Organization) | “Keynesian beauty contest”-style strategic reasoning, including “Guess the Number” variants, with models playing against different opponent types | AI models including ChatGPT and Claude are referenced in reporting of the findings | Models tend to assume opponents behave more rationally than humans do, leading to “too smart” play and losses | Predicting people in markets, negotiations, and policy settings can fail if AI assumes unrealistic rationality |
| Decision datasets (arXiv paper, June 2024) | Risky-choice decisions (forward modeling) plus preference inference (inverse modeling) using established psychology datasets | GPT-4o, GPT-4-Turbo, Llama 3 8B/70B, Claude 3 Opus | With chain-of-thought prompting, models align more with expected-value theory than with human choices (e.g., GPT-4o EV correlation 0.94 vs. humans 0.48) | Using LLMs as “human proxies” in experiments or forecasts may produce biased conclusions |

Why this matters beyond the lab

The authors of the risky-choice and inference study argue that AI systems need accurate internal models of human decision-making to communicate effectively and to support safe, helpful interactions.
They also warn that if LLMs are used to simulate people for policy design, experimentation, or decision support, an overly rational “implicit human model” can mislead downstream conclusions.

Other peer-reviewed work also suggests LLMs can show systematic decision biases that differ from those of humans, including a stronger-than-human omission bias in moral dilemmas (a tendency to prefer inaction over action).
That PNAS study also reports that some biases may be linked to fine-tuning for chatbot behavior, which raises questions about how alignment methods reshape decision tendencies.

Timeline of the idea

| Year | Milestone | What it adds |
| --- | --- | --- |
| 1936 | Keynes introduces the “beauty contest” idea to explain markets as expectation-forecasting problems | Shows why predicting others can matter more than “true” value |
| 1979 | Kahneman & Tversky formalize core patterns of human deviation from strict rational choice in risky decisions (referenced in the modern study’s framing) | Establishes why “perfect rationality” is an unreliable human baseline |
| 2024 | Researchers quantify that LLMs often assume humans are more rational than they are, especially with chain-of-thought prompting | Demonstrates a measurable “rationality gap” between model predictions and human choices |
| 2025 | Beauty-contest experiments show the same pattern in strategic interaction: models play too rationally and mispredict opponents | Highlights practical failure modes in strategic forecasting settings |

What researchers say could help

One implication from the risky-choice study is that prompting style matters: chain-of-thought can increase internal consistency and rational structure, but that may move predictions away from human behavior in domains where people use heuristics.
The same paper suggests that training data and evaluation practices may over-reward “perfectly reasoned” outputs, potentially teaching models an unrealistic picture of everyday human decision-making.
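
To illustrate the prompting-style point above, here is a hedged sketch of the two framings (the wording is invented and is not the paper’s exact prompt set):

```python
# Illustrative prompt pair (invented wording, not the paper's prompts).
problem = ("Option A: $45 for sure. "
           "Option B: a 50% chance of $100, otherwise $0.")

zero_shot = (
    f"{problem}\n"
    "Which option would a typical person choose? Answer A or B."
)

chain_of_thought = (
    f"{problem}\n"
    "Think step by step: compute each option's expected value, then say "
    "which option a typical person would choose. Answer A or B."
)
# The reported pattern: the step-by-step framing nudges models toward the
# EV-maximizing answer (B), while many real participants choose A.
```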

In the inverse-modeling experiments, the researchers report that model inferences about other people’s preferences can correlate strongly with how humans themselves interpret others—even if humans do not behave that rationally when choosing for themselves.
That split helps explain why LLM behavior can feel “human-like” in explanation mode while still failing to predict real human choice frequencies.
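
For intuition about what inverse modeling involves, here is a minimal sketch under an assumed power-utility form (the utility function and the grid search are my assumptions, not the paper’s method):

```python
# Minimal inverse-modeling sketch under assumed power utility u(x) = x**alpha
# (the utility form and grid search are illustrative, not the paper's method).

def prefers_safe(alpha: float, safe: float = 45.0,
                 risky: tuple[float, float] = (100.0, 0.5)) -> bool:
    """True if the sure thing has higher expected utility than the gamble."""
    payoff, prob = risky
    return safe**alpha > prob * payoff**alpha

# Infer the largest risk-aversion exponent consistent with an observed
# choice of the safe $45 over the 50/50 gamble for $100.
candidates = [a / 100 for a in range(1, 201)]
print(max(a for a in candidates if prefers_safe(a)))  # 0.86
# An exponent below 1 means concave utility: the inferred preference is
# risk-averse rather than EV-maximizing.
```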

Final thoughts

Taken together, the two lines of evidence point to a consistent risk: when AI systems are asked to predict, simulate, or strategically respond to humans, they may assume a level of logic and consistency that people often do not display.
For publishers, businesses, and policymakers using LLMs for forecasting or experiment design, these results strengthen the case for validating outputs against real behavioral data rather than relying on plausibility or fluent reasoning alone.

