On the last day of Openai’s “SHIPMAS”, the company announced the latest model O3 and O3-MINI. This exceeded O1 with a series of benchmarks, including mathematics and science, and further exceeded O1. At the beginning, Openai CEO’s Sam Altman said that O3 was scheduled to fall at the end of January, and today the company had done a good promise.
O3-mini
Friday, Openai release The O3-Mini model, the most cost-effective model in the Openai inference series, is open to the public. Until now, the series was composed of O1 and O1-mini. Like the company’s model, this model is particularly powerful in science, mathematics and coding.
When O3-mini is selected, a moderate inference effort to balance speed and accuracy is used. The original O1 model still has a wider range of knowledge than O3-mini, but the main advantage of the new model is higher and higher than O1-MINI.
Benchmark performance
Comparing O3-mini’s performance with O1-MINI, expert testers have discovered that O3-mini provides a more clear and more reasonable response than O1-mini. According to the post, they prefer 56 % of the O3-mini response and observed that the major errors had decreased by 39 %.
Beyond the evaluation of human preference, some STEM benchmarks, including competitive mathematics (Aime 2024), PHD-level scientific questions (GPQA diamonds), competitive code (CodeForces), O3-mini, including medium-level inference. Obtained by default -O1 -MINI outpoint.
Also noteworthy is that, as you can see on the above AIME 2024 and software engineering (SWE bench verification) benchmarks, high inference efforts on benchmarks approach the O1 performance, sometimes exceeding it. The O3-mini model with moderate inference efforts matched the O1 performance on the Codeforces benchmark.
Safety
OPENAI evaluated the safety of O3-mini through public release through the approved content evaluation permitted to jailbreak. The company discovered that the model greatly exceeded the GPT-4O for the evaluation. Openai posted the evaluation results below and released the O3-mini system card. Page 37 PDF This includes detailed results of the evaluation.
How to access
All Openai paid tier, including Chatgpt Plus, Team, and Pro, can access Openai O3-MINI from today. Plus and team users will tripled the rate restrictions, 150 messages per day from 50 messages per day using O1-mini. Chatgpt Enterprise access is within a week.
Also: Copilot’s powerful new “Think Deeper” function is free for all users -how it works
The O3-mini model replaces the model picker’s O1-mini to help the same task. At the time of writing, as a paid user, I still have not yet access to O3-mini, and the O1-mini option is still displayed instead.
If you don’t have a subscription, don’t worry. You can see if O3-mini is worth the hype from a free account. What a free Chatgpt user has to do is click the “reason” in the message text box or play the response. Openai CEO Sam Altman confirmed free access in A Post to X。 Until now, all inference models have been retained behind the paywall. Openai did not specify a new model restriction for free users.