Google tests Gemini AI against Anthropic’s Claude
The practice raises ethical and legal questions, along with concerns over accuracy.

Contractors working to improve Google's Gemini AI model have been tasked with comparing its responses against those of Anthropic's Claude, according to internal documents reviewed by TechCrunch. The evaluation process involves scoring responses on criteria such as truthfulness and verbosity, with contractors given up to 30 minutes per prompt to determine which model performs better. Notably, some of the outputs identified themselves as Claude, raising questions about Google's use of its competitor's model.
Claude, known for emphasising safety, has sometimes refused to answer prompts it deemed unsafe, unlike Gemini, which has faced criticism for safety violations; in one instance, Gemini generated responses flagged for inappropriate content. Although Google is a significant investor in Anthropic, Claude's terms of service prohibit using the model to train or build competing AI models without prior approval.
A Google DeepMind spokesperson said that while the company compares model outputs for evaluation purposes, it does not train Gemini on Anthropic models. Anthropic, for its part, declined to comment on whether Google had obtained permission to use Claude in these tests. Recent reports have also highlighted contractor concerns over Gemini producing potentially inaccurate information on sensitive topics, including healthcare.