2026.01.09 – AI Rankings in January 2026: Why ChatGPT Can Be First and Eighth at the Same Time – My Diary

Key Takeaways

One clear winner, on one clear board
In the Text Arena “Overall” leaderboard, the top-ranked text model is gemini-3-pro, with a model score of 1490 ± 5.

The “middle” and the “bottom” are real, too
With 293 total models on that same board, the middle entry is rank 147 (nvidia-nemotron-3-nano-30b-a3b-bf16) and the bottom entry is rank 293 (stablelm-tuned-alpha-7b).

The exact “5.2” placement
The entry labeled gpt-5.2 sits at rank 14 with a score of 1443 ± 12, and that wide interval makes its rank less certain than those of many neighbors.

ChatGPT can look “low” in one ranking and “top” in another
By website traffic in the United States (North America), chatgpt.com is ranked number 1 in the AI Chatbots and Tools category, while quality leaderboards can rank individual model versions differently.

Story & Details

The one-winner question
A common demand sounds simple: name the best Artificial Intelligence system in the world, full stop. No “it depends,” no nuance, just first place, last place, and the one in the middle. The emotion behind that demand is easy to recognize, and the answer can sting when the most familiar name does not sit at the top.

A quality leaderboard that picks a single first place
In the Text Arena “Overall” ranking (last updated December 30, 2025), the number 1 slot goes to gemini-3-pro with a score of 1490 ± 5. Close behind sit gemini-3-flash and grok-4.1-thinking, followed by two Anthropic entries, one xAI entry, and one more Google entry before OpenAI’s highest entry at rank 8. The board also shows a “rank spread,” which is a simple warning label: a model can slide up or down because the score is an estimate, not a fixed fact.

Where “ChatGPT is eighth” can come from
One OpenAI entry, gpt-5.1-high, sits at rank 8. That can make it feel like “ChatGPT is eighth,” because everyday speech often uses “ChatGPT” to mean “whatever OpenAI model feels best right now.”
But the same board also lists a specific entry named chatgpt-4o-latest-20250326, and it sits at rank 17. On this leaderboard, “ChatGPT” is treated as one specific version label, not as a whole product family.
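
The naming mix-up can be sketched in a few lines of Python. This is only an illustration: the three ranks are the ones quoted above, while the dictionary RANKS and the helper names ranks_matching and best_rank are made up for this example and are not part of any leaderboard API.

```python
# Ranks quoted in this post for three OpenAI entries on the
# Text Arena "Overall" board. The dictionary itself is a hypothetical
# stand-in for illustration, not data pulled from an API.
RANKS = {
    "gpt-5.1-high": 8,
    "gpt-5.2": 14,
    "chatgpt-4o-latest-20250326": 17,
}


def ranks_matching(label_fragment, ranks=RANKS):
    """Return (rank, name) pairs whose name contains the fragment, best rank first."""
    fragment = label_fragment.lower()
    return sorted(
        (rank, name) for name, rank in ranks.items() if fragment in name.lower()
    )


def best_rank(label_fragment, ranks=RANKS):
    """Return the best (lowest) rank among matching entries, or None if nothing matches."""
    matches = ranks_matching(label_fragment, ranks)
    return matches[0][0] if matches else None


if __name__ == "__main__":
    print(best_rank("chatgpt"))  # literal label on the board
    print(best_rank("gpt"))      # loose "any GPT-named entry" reading
```

Read literally, “chatgpt” matches only the rank 17 entry. Read loosely as “any GPT-named entry,” the best match is rank 8. Same board, two answers.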

The exact “5.2” placement, and why it can look harsh
The entry labeled gpt-5.2 is at rank 14, with a score of 1443 ± 12. That “± 12” is a wide confidence interval for a top entry (more than twice the ± 5 shown for gemini-3-pro), and it comes with a big clue: the board shows far fewer votes for gpt-5.2 than for many nearby models. Fewer votes mean more uncertainty, and more uncertainty means the rank can swing. In plain words, rank 14 here can be more “blurry” than it looks.
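
A small sketch can make the “±” concrete. It turns a score and its half-width into a plain low-to-high range, checks whether two ranges overlap, and estimates how many more votes a tighter range might need. The last part leans on a common statistical rule of thumb (the half-width shrinks roughly like 1 / sqrt(votes)); that is an assumption for illustration, not the board's documented method, and the function names are invented here.

```python
def interval(score, half_width):
    """Return the (low, high) bounds of a score reported as score ± half_width."""
    return (score - half_width, score + half_width)


def overlaps(a, b):
    """Return True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def vote_multiplier(current_half_width, target_half_width):
    """Rough factor by which the vote count would need to grow.

    Assumption (rule of thumb, not the board's method):
    half-width is proportional to 1 / sqrt(votes).
    """
    return (current_half_width / target_half_width) ** 2


# Scores quoted in this post.
gpt_52 = interval(1443, 12)
gemini_3_pro = interval(1490, 5)

if __name__ == "__main__":
    print(gpt_52, gemini_3_pro, overlaps(gpt_52, gemini_3_pro))
    print(vote_multiplier(12, 5))
```

The two ranges, 1431 to 1455 and 1485 to 1495, do not touch, so first place is not in doubt here. Under the stated rule of thumb, tightening ± 12 to ± 5 would take roughly 5.76 times as many votes, which fits the point that a thinly voted entry looks blurrier.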

Who is ahead of rank 8 on this board
The models placed above gpt-5.1-high are, in order:
gemini-3-pro, gemini-3-flash, grok-4.1-thinking, claude-opus-4-5-20251101-thinking-32k, claude-opus-4-5-20251101, grok-4.1, and gemini-3-flash (thinking-minimal).

The middle and the bottom, without guessing
With 293 total models listed, the “middle” is not a vibe. It is arithmetic. The board’s rank 147 entry is nvidia-nemotron-3-nano-30b-a3b-bf16, scored at 1318 ± 8.
At the bottom, rank 293 is stablelm-tuned-alpha-7b, scored at 953 ± 13.
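
The “middle is arithmetic” point fits in a few lines. This sketch assumes ranks run from 1 to the total with no gaps or ties; the function name middle_ranks is made up for the example.

```python
def middle_ranks(total):
    """Return the middle rank(s) of a list ranked 1..total.

    An odd total has exactly one middle rank; an even total has two.
    """
    if total < 1:
        raise ValueError("total must be at least 1")
    if total % 2 == 1:
        return ((total + 1) // 2,)
    return (total // 2, total // 2 + 1)


if __name__ == "__main__":
    print(middle_ranks(293))
```

With 293 entries the answer is rank 147, which leaves 146 models above it and 146 below it.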

A different kind of ranking where ChatGPT is number 1
Quality is not the only thing people mean when they say “best.” Some mean “the one most people reach for.” In traffic rankings for the United States (North America), chatgpt.com sits at number 1 in the AI Chatbots and Tools category for December 2025, with gemini.google.com at number 2.
On a broader worldwide traffic list (not limited to AI sites), chatgpt.com appears at position 4 for November 2025, sitting among the largest internet destinations on Earth. In that sense (reach, habit, daily use), ChatGPT can be described as number 1 inside its category in the United States and near the very top of the wider web worldwide.

A tiny Dutch mini-lesson, built for quick reuse
Sometimes a technical topic lands better with a small human anchor. Here is a short, practical Dutch micro-pack:

Dank je wel
Dank = thanks; je = you; wel = well/indeed.
Natural use: friendly and standard “thank you.”

Alstublieft
From “als het u belieft”: als = if; het = it; u = you (formal); belieft = pleases.
Natural use: polite “please” and also “here you go” in shops.

Hoe gaat het?
Hoe = how; gaat = goes; het = it.
Natural use: everyday “How are you?” with a neutral tone.

Conclusions

One board, one champion
On the Text Arena “Overall” board updated in late December 2025, gemini-3-pro is the single first-place model.

One name, many meanings
The shock about “ChatGPT being eighth” often comes from mixing a product name with a specific model label. One OpenAI entry is rank 8, another entry carrying the “ChatGPT” label is rank 17, and the “5.2” label sits at rank 14 with a wide uncertainty band.

A simple way to hold both truths
ChatGPT can be “number 1” in popularity rankings and still land lower than the top model in a quality leaderboard, because those rankings are measuring different things with different yardsticks.

Selected References

[1] LMArena Text Arena Leaderboard (Text). https://lmarena.ai/leaderboard/text
[2] Similarweb: Top AI Chatbots and Tools Websites Ranking in the United States (North America), December 2025. https://www.similarweb.com/top-websites/united-states/ai-chatbots-and-tools/
[3] Semrush: Most Visited Websites in the World, Updated November 2025. https://www.semrush.com/website/top/
[4] Chatbot Arena paper (method background). https://arxiv.org/abs/2306.05685
[5] Stanford CS224U lecture on evaluation metrics (YouTube). https://youtu.be/YygGzfkhtJc

Appendix

Artificial Intelligence
Computer systems that can perform tasks that usually need human thinking, such as writing, planning, and answering questions.

Confidence Interval
A range around a score that shows how uncertain the score is. A larger range means more uncertainty.

Elo Rating
A scoring method first used in games like chess, where results from many pairwise matches update a rating over time.

Leaderboard
A public ranking list that orders systems by a chosen score.

Model
A trained system that takes an input (like text) and produces an output (like an answer).

Model Score
A single number used to compare models on a specific test or voting setup.

Popularity Ranking
A ranking based on how many people visit or use something, often measured through web traffic.

Rank Spread
A shown range for where a model could land if uncertainty is taken seriously; it is a quick picture of how stable the rank is.

Text Arena
A leaderboard setting focused on text tasks, where models are compared on text-only performance.

Votes
Counted comparisons used to build the ranking; more votes usually make the score more stable.
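
To make the Elo Rating entry a little more concrete, here is a minimal sketch of the classic Elo expected-score formula. It assumes the board's scores can be read on the usual Elo scale (base 10, divisor 400), which is an assumption for illustration and not a description of how Text Arena computes its numbers. The function name expected_score is invented for this example.

```python
def expected_score(rating_a, rating_b):
    """Classic Elo expected score for A against B (between 0 and 1).

    Assumption: both ratings live on the standard Elo scale,
    where a 400-point gap means 10-to-1 odds.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    # Scores quoted in this post: gemini-3-pro (1490) vs gpt-5.2 (1443).
    print(round(expected_score(1490, 1443), 3))
```

Under that assumption, the 47-point gap between gemini-3-pro (1490) and gpt-5.2 (1443) corresponds to the higher-rated model being preferred in roughly 57% of head-to-head votes: a real edge, but far from a sweep.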

Published by Leonardo Tomás Cardillo
