GPT-4.5 outperforms humans in updated Turing Test
GPT-4.5 was mistaken for a human 73% of the time, meaning testers picked it as the human more often than they picked the real human participants.

Two leading AI systems, OpenAI’s GPT-4.5 and Meta’s Llama-3.1, have passed a key milestone by outperforming humans in a modern version of the Turing Test.
The experiment, conducted by researchers at the University of California San Diego, found that GPT-4.5 was judged to be the human 73% of the time, meaning participants picked it over the actual human in most conversations. Meta’s Llama-3.1 followed with a 56% success rate.
The study used a three-party test where participants held simultaneous five-minute conversations with both a human and an AI, and then tried to determine which was which.
These trials were conducted with two independent groups: university undergraduates and online workers recruited through the Prolific platform. The results provide the first substantial evidence that AI can convincingly mimic human responses in spontaneous conversations.
Older systems fared far worse: ELIZA, a rule-based chatbot from the 1960s, and the earlier model GPT-4o were both correctly identified as non-human in over 75% of cases.
The success of newer models in passing this benchmark points to how rapidly conversational AI is evolving, raising fresh questions about the ethical and societal implications of indistinguishable AI interactions.