Anthropic introduces a safety feature allowing Claude AI to terminate harmful conversations
User safety and model well-being guided Anthropic’s decision to let Claude AI end harmful chats rather than continue engaging when users persist with extreme requests.

Anthropic has announced that its Claude Opus 4 and 4.1 models can now end conversations in extreme cases of harmful or abusive user interactions.
The company said the change was introduced after the AI models showed signs of ‘apparent distress’ during pre-deployment testing when repeatedly pushed to continue rejected requests.
According to Anthropic, the feature will be used only in rare situations, such as attempts to solicit information that could enable large-scale violence or requests for sexual content involving minors.
Once activated, the conversation is closed, preventing the user from sending new messages in that thread, though they can still access past conversations and begin new ones.
The company emphasised that the models will not use the ability when users are at imminent risk of harming themselves or others, ensuring support channels remain open in sensitive situations.
Anthropic added that the feature is experimental and may be adjusted based on user feedback.
The move highlights the firm’s growing focus on safeguarding both AI models and human users, balancing safety with accessibility as generative AI continues to expand.