OpenAI’s ChatGPT, the chatbot that ignited the global boom in generative AI when it launched in late 2022, has long dominated public awareness despite the rise of strong rivals including Google’s Gemini suite, xAI’s Grok, Anthropic’s Claude, Qwen, DeepSeek and Mistral. But a new study suggests that the landscape has shifted considerably.

A benchmarking assessment conducted by UK-based company Prolific has ranked ChatGPT-4.1 only eighth among leading AI models. The study uses a proprietary benchmark known as “Humaine”, which the company describes as a framework designed to evaluate AI systems through the lens of natural human interaction, rather than the highly technical datasets and reasoning tasks favoured by researchers.

According to Prolific, traditional evaluation methods often fail to reflect real-world user needs. In a blog post, the company argued that this mismatch has created a “disconnect between what gets optimised for and what people actually value.” It also highlighted that current evaluation is heavily skewed towards metrics that are meaningful to researchers but opaque to everyday users.

According to a Mint report, the company also criticised other preference-based rankings, saying platforms that rely on open voting can be subject to sample bias and disproportionately attract tech-savvy users. To counter this, Humaine incorporates automated quality monitoring to ensure participants provide thoughtful, consistent assessments.

Top 10 AI Models According to the Humaine Benchmark

Prolific’s study, published in September, produced the following ranking:

1. Gemini 2.5 Pro (Google)

2. DeepSeek v3 (DeepSeek)

3. Magistral Medium (Mistral)

4. Grok 4 (xAI)

5. Grok 3 (xAI)

6. Gemini 2.5 Flash (Google)

7. DeepSeek R1 (DeepSeek)

8. ChatGPT-4.1 (OpenAI)

9. Gemma (Google)

10. Gemini 2.0 Flash (Google)

The timing of the study is notable: it predates the release of Google’s Gemini 3 Pro and xAI’s Grok 4.1 and Grok 4.1 Thinking, meaning the leaderboard may look different if reassessed today.

What the Results Suggest

Gemini 2.5 Pro topping the list is unsurprising, given its strong performance across multiple benchmarks since launch. However, the absence of an OpenAI model from the top five — and ChatGPT ranking below competitors such as DeepSeek, Grok and Mistral — marks a striking shift in perceived capability.

Prolific did not offer specific explanations for ChatGPT’s comparatively lower ranking. However, it emphasised that Gemini 2.5 Pro consistently emerged as the strongest system across the “Overall Winner” metric, a key indicator in its evaluation framework.

As AI competition accelerates, the Humaine rankings reflect a rapidly evolving market in which user-centric performance — rather than technical supremacy alone — may increasingly shape which models lead the field.

First Published on Nov 25, 2025 5:11 PM