OpenAI highlights ChatGPT health improvements

GPT-5.5 Instant improves ChatGPT's health support, including recognition of urgent-care situations and communication of uncertainty.

OpenAI says GPT-5.5 Instant has improved ChatGPT's health-related responses, including its ability to recognise urgent situations, explain uncertainty, and request relevant context from users.

OpenAI said more than 230 million people use ChatGPT each week for health and wellness-related questions. Common uses include making sense of health information, understanding lab results, preparing for appointments, navigating insurance, building healthier habits, and deciding what to ask next.

OpenAI said GPT-5.5 Instant represents a substantial step forward in ChatGPT's health performance. The model is available to free ChatGPT users, subject to usage limits, and now performs at a level comparable to OpenAI's frontier reasoning models on some of the company's most demanding health evaluations.

The company said the progress reflects both model improvements and physician-led evaluation work. A global network of physicians helps define expected behaviour in real-world health scenarios by reviewing model outputs, identifying failure modes and developing evaluation criteria.

OpenAI said ChatGPT's health responses should be accurate, understandable, and grounded in good judgement. According to the company, stronger performance includes recognising when more context is needed, explaining uncertainty without overstating confidence, and helping people understand when to seek medical care.

The company uses health-specific evaluations, including HealthBench and HealthBench Professional, to assess model responses. These evaluations combine realistic health conversations with physician-developed rubrics that grade accuracy, safety, communication quality, contextual awareness, completeness and appropriate escalation to medical care.

OpenAI said GPT-5.5 Instant improved substantially over GPT-5.3 Instant on an aggregate of health evaluations, including HealthBench Professional. In a separate comparison, physicians wrote responses to representative health conversations with unlimited time and internet access, and a second panel of physicians then compared those responses with model answers across 3,500 reviews.

The company said GPT-5.5 Instant's responses were rated higher than both physician-written responses and responses from older models across criteria including accuracy, communication, completeness, instruction-following, and usefulness for health-related decisions.

OpenAI also said physicians rated GPT-5.5 Instant as having fewer failure modes than older models and physician-written responses. The company cited fewer cases of missing red flags, failing to refer users to care, not tailoring responses to the local healthcare context, or not asking for additional context when needed.

OpenAI said it also uses privacy-preserving monitors on production traffic to track possible factuality issues in ChatGPT's health responses. Based on recent health-related production traffic, OpenAI said the proportion of responses containing at least one flagged factuality issue has fallen by 71% over the past two months.

The company said its health work is supported by more than 260 physicians across 60 countries, 49 languages, and 26 medical specialties. Those physicians have reviewed more than 700,000 example model responses reflecting how patients and clinicians use ChatGPT in real-world situations.

OpenAI said physician feedback informs rubrics and evaluation criteria used to assess whether responses are accurate, safe, clear, complete, appropriately cautious, and useful. The company said the work also supports broader healthcare tools, including ChatGPT for Clinicians and OpenAI for Healthcare.

Why does it matter?

Health information is one of the most sensitive and high-impact areas in which consumer AI systems are used. Improvements in how ChatGPT handles uncertainty, identifies potential medical red flags and requests additional context could influence how millions of people interpret symptoms, understand medical information and prepare for interactions with healthcare professionals.

The announcement also highlights the growing importance of domain-specific evaluation in AI development. Rather than relying solely on general-purpose benchmarks, OpenAI is using physician-led reviews, specialised testing frameworks and real-world monitoring to assess performance in healthcare settings. This approach may serve as a model for evaluating AI systems in other high-stakes sectors where accuracy, safety and human oversight are essential.