27 Feb 2026

ChatGPT Health under fire after study finds major failures in emergency detection

Researchers found ChatGPT Health under-triaged over half of simulated medical emergencies, triggering warnings from clinicians who fear delayed treatment could cause avoidable harm or death.

A new evaluation of ChatGPT Health has raised major safety concerns after researchers found it frequently failed to recognise urgent medical emergencies.

The independent study, published in Nature Medicine, reported that the system under-triaged more than half of the clinical scenarios tested, giving advice that could have delayed life-saving treatment.

The research team, led by Ashwin Ramaswamy, created sixty patient simulations ranging from minor illnesses to life-threatening conditions.

Three doctors agreed on the appropriate urgency for each case before comparing their judgement with the model’s responses. The AI performed adequately in straightforward emergencies such as strokes, yet frequently minimised danger in more complex presentations, including severe asthma and diabetic crises.

Experts also warned that ChatGPT Health struggled to detect suicidal ideation reliably. Minor changes to scenario details, such as adding normal lab results, caused safeguards to disappear entirely.

Critics, including health-misinformation researcher Alex Ruani, described the behaviour as dangerously inconsistent and capable of creating a false sense of security.

OpenAI said the study did not reflect typical real-world use but acknowledged the need for continued research and improvement.

Policy specialists argue that the findings underline the need for clear safety standards, external audits and stronger transparency requirements for AI systems operating in sensitive medical contexts.

Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!