Why does ChatGPT 'hallucinate'? OpenAI blames testing methods

OpenAI says AI “hallucinations” happen not because chatbots lie, but because of flawed testing methods that reward guessing over honesty. The company argues future benchmarks must penalize confident mistakes and reward uncertainty to make AI more reliable.

By Storyboard18 | Sep 8, 2025 12:19 PM

When ChatGPT or similar tools make up facts with confidence, it’s not because they’re “lying” but because of how they’ve been trained and tested, OpenAI has revealed. The company says fixing artificial intelligence hallucinations may require rethinking how AI performance is measured, not just how models are built.

Hallucinations, in AI terms, occur when a chatbot generates answers that sound convincing but are factually incorrect. In one example, researchers found the system invented details about a scientist’s PhD dissertation and even gave the wrong birthday. The problem, OpenAI argues, comes less from flawed memory and more from incentives baked into evaluation.

Most current benchmarks reward a correct answer but treat an “I don’t know” response as a failure. This encourages models to guess, much like students taking a multiple-choice test. Over time, the AI learns that sounding confident, even when wrong, is better than admitting uncertainty.

“Instead of rewarding only accuracy, tests should penalize confident mistakes more than honest admissions of uncertainty,” OpenAI suggested in its latest research. In short, honesty should count more than bold but wrong answers.
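The incentive problem can be made concrete with a little arithmetic. The sketch below is my own illustration (not OpenAI's actual benchmark code) comparing two grading schemes: the usual accuracy-only scoring, and a hedged alternative that penalizes confident mistakes while scoring an honest abstention as zero.

```python
def accuracy_only(correct: bool, abstained: bool) -> float:
    """Classic benchmark: 1 point for a correct answer, 0 otherwise.
    Abstaining scores the same as being wrong, so guessing never hurts."""
    return 1.0 if (correct and not abstained) else 0.0

def penalized(correct: bool, abstained: bool,
              wrong_penalty: float = 1.0) -> float:
    """Alternative: a confident mistake costs points,
    while an honest 'I don't know' scores zero."""
    if abstained:
        return 0.0
    return 1.0 if correct else -wrong_penalty

def expected_guess_score(p: float, scorer) -> float:
    """Expected score when the model guesses and is right with probability p."""
    return p * scorer(True, False) + (1 - p) * scorer(False, False)

p = 0.25  # e.g. a random guess on a four-option question
print(expected_guess_score(p, accuracy_only))  # 0.25: guessing beats abstaining (0.0)
print(expected_guess_score(p, penalized))      # -0.5: abstaining (0.0) now wins
```

Under accuracy-only scoring, a blind guess has positive expected value, so a model trained against that metric learns to guess; once wrong answers carry a penalty, abstaining becomes the rational choice whenever the model is unsure.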

The way large models are trained also plays a role. They learn by predicting the “next word” in billions of sentences, which works well for grammar and common facts but breaks down for rare or specific details, such as birthdays or niche research topics.
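A toy counting model (my own illustration, far simpler than a real LLM) shows why next-word prediction handles common patterns well but is shaky on rare specifics: a detail seen once gets predicted with just as much apparent confidence as one seen many times, because the model has no sense of how thin its evidence is.

```python
from collections import Counter, defaultdict

# A bigram "next word" predictor trained by counting word pairs.
corpus = (
    "paris is the capital of france . "
    "paris is the capital of france . "
    "paris is the capital of france . "
    "her birthday is in march ."        # a rare, specific detail seen once
).split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev: str) -> tuple:
    """Return the most likely next word and its estimated probability."""
    c = counts[prev]
    word, n = c.most_common(1)[0]
    return word, n / sum(c.values())

print(predict("of"))  # ('france', 1.0) - backed by three observations
print(predict("in"))  # ('march', 1.0)  - backed by one; confidence looks identical
```

Both predictions come out at probability 1.0, even though one rests on repeated evidence and the other on a single sentence; scaled up, this is one reason birthdays and niche research topics invite confident fabrication.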

Interestingly, OpenAI noted that smaller models sometimes manage uncertainty better, avoiding risky guesses compared to their larger counterparts. This shows hallucinations are not an unfixable glitch but a matter of designing better guardrails.

