Tech experts seek toughest AI test yet
The new exam aims to challenge advanced AI.
A group of technology experts has launched a global call for ‘Humanity’s Last Exam’, which aims to push AI systems to their limits by posing the most difficult questions possible. The Center for AI Safety (CAIS) and Scale AI are leading the initiative, which seeks to determine when AI achieves expert-level capabilities. Current benchmark tests have become too easy for many AI models, so the new exam will emphasise abstract reasoning, an area in which AI still faces challenges. The organisers hope the exam will remain relevant as AI technology evolves.
The demand for more rigorous tests comes after OpenAI released its newest model, OpenAI o1, which has shown strong performance on traditional reasoning benchmarks. Dan Hendrycks, executive director of CAIS, stated that AI systems like Anthropic’s Claude model had significantly improved their results on standard tests, rendering these benchmarks less valuable. However, AI has struggled with more intricate tasks like planning and visual pattern recognition, which highlights the need for more advanced assessments.
The exam will include over 1,000 crowd-sourced questions that are difficult for non-experts to answer. Some questions will be kept private to prevent AI systems from simply memorising the answers. Participants have until 1 November to submit questions, and there will be rewards for the best contributions. While the exam is designed to test AI thoroughly, questions about weapons will be excluded to avoid potential risks.