Why does ChatGPT 'hallucinate'? OpenAI blames testing methods

OpenAI says AI “hallucinations” happen not because chatbots lie, but because of flawed testing methods that reward guessing over honesty. The company argues future benchmarks must penalize confident mistakes and reward uncertainty to make AI more reliable.

By Storyboard18 | Sep 8, 2025 12:19 PM

When ChatGPT or similar tools make up facts with confidence, it’s not because they’re “lying” but because of how they’ve been trained and tested, OpenAI has revealed. The company says fixing artificial intelligence hallucinations may require rethinking how AI performance is measured, not just how models are built.

Hallucinations, in AI terms, occur when a chatbot generates answers that sound convincing but are factually incorrect. In one example, researchers found the system invented details about a scientist’s PhD dissertation and even gave the wrong birthday. The problem, OpenAI argues, comes less from flawed memory and more from incentives baked into evaluation.

Most current benchmarks reward a correct answer but treat an “I don’t know” response as a failure. This encourages models to guess, much like students taking a multiple-choice test. Over time, the AI learns that sounding confident, even when wrong, is better than admitting uncertainty.

“Instead of rewarding only accuracy, tests should penalize confident mistakes more than honest admissions of uncertainty,” OpenAI suggested in its latest research. In short, honesty should count more than bold but wrong answers.
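The incentive problem can be made concrete with a little arithmetic. The sketch below is my own illustration (not OpenAI's actual benchmark code) comparing two grading schemes: the usual accuracy-only scoring, and a hedged alternative that penalizes confident mistakes while scoring an honest abstention as zero.

```python
def accuracy_only(correct: bool, abstained: bool) -> float:
    """Classic benchmark: 1 point for a correct answer, 0 otherwise.
    Abstaining scores the same as being wrong, so guessing never hurts."""
    return 1.0 if (correct and not abstained) else 0.0

def penalized(correct: bool, abstained: bool,
              wrong_penalty: float = 1.0) -> float:
    """Alternative: a confident mistake costs points,
    while an honest 'I don't know' scores zero."""
    if abstained:
        return 0.0
    return 1.0 if correct else -wrong_penalty

def expected_guess_score(p: float, scorer) -> float:
    """Expected score when the model guesses and is right with probability p."""
    return p * scorer(True, False) + (1 - p) * scorer(False, False)

p = 0.25  # e.g. a random guess on a four-option question
print(expected_guess_score(p, accuracy_only))  # 0.25: guessing beats abstaining (0.0)
print(expected_guess_score(p, penalized))      # -0.5: abstaining (0.0) now wins
```

Under accuracy-only scoring, a blind guess has positive expected value, so a model trained against that metric learns to guess; once wrong answers carry a penalty, abstaining becomes the rational choice whenever the model is unsure.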

The way large models are trained also plays a role. They learn by predicting the “next word” in billions of sentences, which works well for grammar and common facts but breaks down for rare or specific details, such as birthdays or niche research topics.
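A toy counting model (my own illustration, far simpler than a real LLM) shows why next-word prediction handles common patterns well but is shaky on rare specifics: a detail seen once gets predicted with just as much apparent confidence as one seen many times, because the model has no sense of how thin its evidence is.

```python
from collections import Counter, defaultdict

# A bigram "next word" predictor trained by counting word pairs.
corpus = (
    "paris is the capital of france . "
    "paris is the capital of france . "
    "paris is the capital of france . "
    "her birthday is in march ."        # a rare, specific detail seen once
).split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev: str) -> tuple:
    """Return the most likely next word and its estimated probability."""
    c = counts[prev]
    word, n = c.most_common(1)[0]
    return word, n / sum(c.values())

print(predict("of"))  # ('france', 1.0) - backed by three observations
print(predict("in"))  # ('march', 1.0)  - backed by one; confidence looks identical
```

Both predictions come out at probability 1.0, even though one rests on repeated evidence and the other on a single sentence; scaled up, this is one reason birthdays and niche research topics invite confident fabrication.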

Interestingly, OpenAI noted that smaller models sometimes manage uncertainty better, avoiding risky guesses compared to their larger counterparts. This shows hallucinations are not an unfixable glitch but a matter of designing better guardrails.

