Did xAI lie about Grok 3's benchmarks?
Technology


Vantage Feed
Last updated: February 23, 2025 3:22 am
Vantage Feed Published February 23, 2025

Debates over AI benchmarks, and over how AI labs report them, are spilling into public view.

This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. Igor Babuschkin, one of xAI's co-founders, insisted that the company was in the right.

The truth lies somewhere in between.

In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.

xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. However, the graph did not include o3-mini-high's AIME 2025 score at "cons@64."

What is cons@64? It is short for "consensus@64": the model gets 64 tries at each question in the benchmark, and the answer it generates most frequently is taken as its final answer. As you can imagine, cons@64 tends to boost a model's benchmark score considerably, and omitting it from a graph can make it appear that one model surpasses another when that is not actually the case.
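To make the difference between the two scoring modes concrete, here is a minimal Python sketch. It is illustrative only: the function names, the toy answers, and the use of 4 attempts instead of 64 are assumptions made for this example, not anything published by xAI or OpenAI.

```python
from collections import Counter


def consensus_answer(attempts):
    """Return the answer that appears most often among the attempts."""
    return Counter(attempts).most_common(1)[0][0]


def score_at_1(attempts_per_question, correct_answers):
    """Fraction of questions where the first attempt alone is correct."""
    hits = sum(
        attempts[0] == correct
        for attempts, correct in zip(attempts_per_question, correct_answers)
    )
    return hits / len(correct_answers)


def score_cons_at_k(attempts_per_question, correct_answers, k=64):
    """Fraction of questions where the majority vote over the first k attempts is correct."""
    hits = sum(
        consensus_answer(attempts[:k]) == correct
        for attempts, correct in zip(attempts_per_question, correct_answers)
    )
    return hits / len(correct_answers)


# Hypothetical toy data: 3 questions, 4 attempts each (a real run would use 64).
attempts = [
    ["7", "12", "12", "12"],  # first attempt wrong, majority right
    ["3", "3", "5", "3"],     # first attempt right, majority right
    ["9", "9", "9", "40"],    # first attempt wrong, majority wrong
]
answers = ["12", "3", "40"]

print("@1:    ", score_at_1(attempts, answers))
print("cons@4:", score_cons_at_k(attempts, answers, k=4))
```

In this toy data the first attempt is correct on 1 of 3 questions, while the majority vote is correct on 2 of 3, so the same model looks noticeably better under consensus scoring. Comparing one model's consensus score with another model's "@1" score is therefore not a like-for-like comparison.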

The AIME 2025 scores of Grok 3 Reasoning Beta and Grok 3 mini Reasoning at "@1" (meaning the first score the models got on the benchmark) fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails slightly behind OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI."

Babuschkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A participant in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:

Hilarious how some people see my plot as an attack on OpenAI and others see it as an attack on Grok
(I actually believe Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/djqljpcjh8 pic.twitter.com/3wh8foufic

– teortaxes▶️ (deepseek special🐋Kiro 2023–∞) (@teortaxestex) February 20, 2025

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations and their strengths.
