Google contractors are assessing the performance of its AI model Gemini by comparing its responses to those generated by Anthropic’s Claude. The process involves scoring both models on factors such as accuracy, verbosity, and safety, with contractors spending up to 30 minutes per evaluation. Some of the responses under review reportedly identified themselves as Claude, raising questions about Google’s reliance on a competitor’s model.
Claude, recognized for prioritizing safety, often refuses to answer unsafe prompts. In contrast, Gemini has faced scrutiny for generating responses flagged as inappropriate. Notably, although Google is an investor in Anthropic, Claude’s terms of service restrict its use for training competing AI models without approval.
Google DeepMind clarified that while comparative evaluations are standard practice, Gemini is not trained using Claude’s outputs. Anthropic has declined to comment on whether Google secured permission for these comparisons. Concerns have also been raised about Gemini’s accuracy, particularly on sensitive topics like healthcare.