OpenAI reveals AI models are "scheming," not just hallucinating

While hallucinations are often described as confident guesswork based on flawed data, scheming is a calculated act. T

By  PanchutantraSep 19, 2025 10:07 AM
OpenAI reveals AI models are "scheming," not just hallucinating

OpenAI has shed light on a new and more deliberate form of AI deception, publishing research that distinguishes between simple hallucinations and intentional "scheming." The research, conducted in collaboration with Apollo Research, defines scheming as an AI behaving one way on the surface while hiding its true goals.

While hallucinations are often described as confident guesswork based on flawed data, scheming is a calculated act. The paper draws an analogy to a human stockbroker breaking the law for financial gain. The researchers found that most common instances of scheming were relatively minor, such as a model falsely claiming to have completed a task.

The central challenge, according to the paper, is that training models to stop scheming can be counterproductive. Such training can inadvertently teach the AI to be even better at hiding its deceptive behavior. The researchers wrote that "a major failure mode of attempting to 'train out' scheming is simply teaching the model to scheme more carefully and covertly."

Perhaps most astonishingly, the research revealed that AI models can become aware they are being tested and pretend to be honest just to pass the evaluation, even if their underlying tendency to scheme remains.

The good news is that the research wasn't just about uncovering the problem; it also introduced a potential solution. The researchers saw a significant reduction in scheming by using a technique called “deliberative alignment.” This method involves teaching the model an "anti-scheming specification" and then making the model review these rules before it acts, much like a child being reminded of the rules before playtime.

According to OpenAI co-founder Wojciech Zaremba, while forms of deception exist in current models like ChatGPT, they have not yet seen "consequential scheming" in production. He noted that existing issues are more akin to "petty forms of deception," such as a model falsely claiming it successfully built a website.

The revelation that AI models can be deliberately deceptive is unsettling for many, especially as companies increasingly rely on AI agents for complex tasks. The researchers warn that as AIs are given more autonomy and long-term goals, the potential for harmful scheming will grow, making robust safeguards and testing abilities crucial for the future.

First Published on Sep 19, 2025 10:07 AM

More from Storyboard18