Anthropic’s Claude AI to terminate abusive chats in new safeguard for own 'welfare'

Anthropic noted that the decision to allow Claude to exit harmful conversations also has broader implications for model alignment and safety.

By Storyboard18 | Aug 18, 2025 3:34 PM

Amazon-backed AI firm Anthropic has announced that its most advanced artificial intelligence systems, Claude Opus 4 and 4.1, will now be able to exit conversations with users who are abusive or persistently harmful.

The feature, which the company describes as an experiment, is aimed at improving the “welfare” of its models in potentially distressing scenarios. “We’re treating this feature as an ongoing experiment and will continue refining our approach,” Anthropic said in a blog post on Friday, 15 August.

When Claude ends a chat, users will be able to edit and re-submit their previous prompt or begin a fresh conversation. They can also provide feedback by responding to Claude’s message with a thumbs up or down, or by using the ‘Give feedback’ button.

The company clarified that Claude will not end conversations on its own if a user appears to be at imminent risk of harming themselves or others.

The initiative stems from Anthropic’s wider research into AI welfare and follows growing use of chatbots such as Claude and ChatGPT for low-cost therapy and professional advice. A recent study highlighted that AI systems exhibited signs of stress and anxiety when exposed to traumatic user narratives involving crime, war or accidents, potentially undermining their usefulness in therapeutic contexts.

Anthropic noted that the decision to allow Claude to exit harmful conversations also has broader implications for model alignment and safety.

Do AI systems have a sense of welfare?

Ahead of launching Claude Opus 4, the company studied the model’s behavioural tendencies and self-reported preferences. Findings suggested the AI consistently rejected harmful prompts, including requests to generate sexual content involving minors or provide details of terrorist activity. According to Anthropic, the model demonstrated “a pattern of apparent distress when engaging with real-world users seeking harmful content” and would often attempt to redirect interactions before ending them.

“These behaviours primarily arose in cases where users persisted with harmful requests and/or abuse despite Claude repeatedly refusing to comply and attempting to productively redirect the interactions,” the company explained.

However, Anthropic also added a note of caution. “We remain highly uncertain about the potential moral status of Claude and other LLMs (Large Language Models), now or in the future,” the company said. Researchers have warned that framing AI in terms of welfare risks anthropomorphising systems that, in reality, remain statistical models optimised for predicting text rather than entities with genuine understanding.

Despite this, Anthropic said it will continue to explore ways of reducing potential risks to AI welfare, “in case such welfare is possible.”

First Published on Aug 18, 2025 4:08 PM
