ChatGPT safety checks may trigger police action
Beyond threats of violence, OpenAI is working to detect other risky behaviours in ChatGPT conversations, such as sleep deprivation and unsafe stunts, and to guide users toward trusted contacts and therapists.

OpenAI has confirmed that ChatGPT conversations signalling a risk of serious harm to others can be reviewed by human moderators and may even reach the police.
The company explained these measures in a blog post, stressing that its system is designed to balance user privacy with public safety.
The safeguards treat self-harm differently from threats to others. When a user expresses suicidal intent, ChatGPT directs them to professional resources instead of contacting law enforcement.
By contrast, conversations showing intent to harm someone else are escalated to trained moderators, and if they identify an imminent risk, OpenAI may alert authorities and suspend accounts.
The company admitted its safety measures work better in short conversations than in lengthy or repeated ones, where safeguards can weaken.
OpenAI says it is working to keep safeguards consistent across long interactions, and it is developing parental controls, new interventions for risky behaviour, and ways to connect users with professional help before a crisis escalates.