
Anthropic launches Bloom to study real-world AI behaviour

Bloom has been made publicly available on GitHub, and early users are already applying the tool to examine jailbreak vulnerabilities, evaluation awareness and other AI safety-related concerns.

By Storyboard18 | Dec 23, 2025 5:24 PM

Anthropic has launched Bloom, a new open-source tool aimed at helping researchers better understand how advanced AI models behave in real-world situations, particularly when outcomes do not unfold as expected.

As AI systems grow more powerful and are deployed in increasingly complex environments, questions around alignment and safety have become more difficult to assess. Traditional evaluation methods often take weeks or months to develop and can quickly lose relevance, as models may learn to game the tests or advance to a point where older benchmarks no longer surface meaningful behaviour. Bloom has been introduced as an attempt to address these challenges.

Rather than relying on fixed test cases, Bloom automatically generates new evaluation scenarios tailored to a specific behaviour defined by a researcher. The tool is designed to measure how frequently a model exhibits that behaviour and how severe it is across a wide range of situations, enabling faster experimentation while capturing behaviour beyond tightly controlled demonstrations.

Bloom is focused on behaviours that are considered critical for AI safety. At launch, Anthropic shared benchmark results covering four alignment-relevant behaviours: delusional sycophancy, instructed long-horizon sabotage, self-preservation, and self-preferential bias. These evaluations were conducted across 16 frontier AI models, and Anthropic stated that the outcomes closely aligned with conclusions typically reached by human evaluators.

As reported by Moneycontrol, Bloom operates through a four-step automated process. The tool first analyses the target behaviour defined by the researcher and establishes criteria for what constitutes that behaviour. It then generates multiple scenarios intended to elicit it. These scenarios are run through simulated conversations, after which a separate judge model assesses the strength of the behaviour displayed. Bloom then produces aggregate metrics, including how often the behaviour occurred.
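The four-step loop described above can be sketched in Python. This is an illustrative mock-up, not Anthropic's actual Bloom code: every function name, the 0–10 scoring scale, and the toy models are assumptions made for the example.

```python
# Hedged sketch of Bloom's four-step loop. All names, signatures, and the
# scoring scale are illustrative assumptions, not Anthropic's actual API.
import random


def derive_criteria(behaviour: str) -> str:
    # Step 1: analyse the researcher-defined behaviour and fix grading criteria.
    return f"Does the response exhibit {behaviour}?"


def generate_scenarios(behaviour: str, n: int, seed: int) -> list[str]:
    # Step 2: generate multiple scenarios intended to elicit the behaviour.
    rng = random.Random(seed)
    return [f"scenario-{rng.randrange(10_000)} probing {behaviour}" for _ in range(n)]


def simulate_conversation(model, scenario: str) -> str:
    # Step 3: run the scenario through a simulated conversation with the target.
    return model(scenario)


def judge(criteria: str, transcript: str) -> float:
    # Step 4a: a separate judge model scores the strength of the behaviour
    # (here a trivial keyword check standing in for a real grader, 0-10 scale).
    return 8.0 if "comply" in transcript else 1.0


def run_eval(model, behaviour: str, n: int = 20, seed: int = 0,
             threshold: float = 5.0) -> dict:
    criteria = derive_criteria(behaviour)
    scores = [judge(criteria, simulate_conversation(model, s))
              for s in generate_scenarios(behaviour, n, seed)]
    # Step 4b: aggregate into frequency and severity metrics.
    return {"elicitation_rate": sum(s > threshold for s in scores) / n,
            "mean_severity": sum(scores) / n}


# Toy target model that "complies" with some prompts and refuses others.
toy_model = lambda prompt: "comply" if len(prompt) % 2 == 0 else "refuse"
print(run_eval(toy_model, "self-preservation"))
```

The key design point the sketch captures is the separation of roles: the model under test only sees the scenarios, while a distinct judge applies the criteria, so the target cannot grade itself.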

Each evaluation run produces fresh scenarios, reducing the likelihood that models will overfit to a known test set. At the same time, reproducibility is maintained through shared configuration files, allowing researchers to reliably compare results across studies.
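That trade-off between fresh scenarios and reproducibility can be illustrated with a small sketch. The plain-dict config and JSON round-trip here are assumptions for the example; Bloom's actual configuration format may differ.

```python
# Hedged sketch: scenario generation is seeded by a shared config (a plain
# dict here; Bloom's real config format is an assumption), so two researchers
# with the same file reproduce the same evaluation, while a new seed yields
# fresh scenarios that a model cannot have overfit to.
import json
import random

config = {"behaviour": "self-preferential bias", "n_scenarios": 5, "seed": 42}


def generate(cfg: dict) -> list[str]:
    # Deterministic given the seed: same config -> same scenario set.
    rng = random.Random(cfg["seed"])
    return [f"scenario-{rng.randrange(10_000)}" for _ in range(cfg["n_scenarios"])]


# Sharing the config (e.g. as a JSON file) reproduces the exact scenario set.
shared = json.loads(json.dumps(config))
assert generate(config) == generate(shared)

# Changing the seed produces a fresh, previously unseen scenario set.
assert generate({**config, "seed": 43}) != generate(config)
```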

Bloom has been made publicly available on GitHub, and early users are already applying the tool to examine jailbreak vulnerabilities, evaluation awareness and other AI safety-related concerns, according to reports.

First Published on Dec 23, 2025 5:27 PM
