OpenAI’s new reasoning models o3 and o4-mini: Key highlights

18/04/2025

On April 16, 2025, OpenAI unveiled its latest advanced reasoning models: o3 and o4-mini. These models integrate a wide range of tools within ChatGPT, including web browsing, Python, image analysis and generation, file interpretation, canvas, automation, and memory. These capabilities allow them to tackle complex problems in math, coding, science, and visual perception more effectively.

OpenAI introduces new reasoning models: o3 and o4-mini

OpenAI has officially launched its latest reasoning models: o3 and o4-mini. These models are designed to enhance performance across various tasks, from math and coding to visual tasks, all while maintaining efficiency and cost-effectiveness.

o3 is OpenAI’s most powerful reasoning model, setting new standards for complex tasks like coding, math, science, and visual perception. Aimed at users requiring high performance across multi-disciplinary tasks that demand deep reasoning and accuracy, o3 is the top choice for demanding applications.

On the other hand, o4-mini is a compact, speed-optimized model that still delivers impressive results, particularly in math, coding, and visual tasks. With its smaller footprint, o4-mini provides an excellent option for users who need cost-effective solutions for fast-paced tasks.

According to OpenAI, here are some additional standout features of these models:

  • o3 includes built-in web browsing and image processing, allowing it to solve complex, multi-step problems more autonomously across various tasks.

  • Both o3 and o4-mini can now integrate visual data into their reasoning, enabling them to tackle problems that combine images with complex decision-making.

  • OpenAI also introduced Codex CLI, a lightweight, open-source tool that enables running coding tasks directly in your terminal using o3 and o4-mini.

These models are rolling out to ChatGPT Plus, Pro, and Team users, with the o3-pro version expected in the next few weeks.

The o3 and o4-mini benchmarks

Math and logic

Both o3 and o4-mini are highly effective in handling math and logic tasks, which require precision and problem-solving abilities. Notably, in the AIME 2024 and 2025 tests, these models demonstrated impressive results:

  • AIME 2024: o4-mini (no tools) scored 93.4%, outperforming o3 (91.6%) and o3-mini (87.3%).

  • AIME 2025: o4-mini (no tools) scored 92.7%, again beating o3 (88.9%) and o3-mini (86.5%).

However, o3 outperforms o4-mini when it comes to more advanced, multi-step logic problems or higher-level mathematical tasks. Despite being more compact, o4-mini excels in simpler tasks, offering faster and more cost-effective solutions for users with less complex requirements.

Figure: Math and logic benchmarks. Source: OpenAI

Multimodal capabilities

One of the standout features of both o3 and o4-mini is their multimodal capabilities – they can process both text and images. For example:

  • MMMU: o4-mini scores 81.6%, close to o3 (82.9%) and surpassing o1 (77.6%).

  • MathVista: o4-mini scores 84.3%, trailing o3 (86.8%) but far above o1 (71.8%).

  • CharXiv (scientific figures): o4-mini scores 72.0%, behind o3 (78.6%) but better than o1 (55.1%).

These models not only “see” images but integrate visual information into their reasoning, making them capable of solving problems that require both text and visual data. This is particularly useful for analyzing charts, diagrams, or scientific images, allowing for more accurate decisions when working with complex visual data.

Figure: Multimodal benchmarks. Source: OpenAI
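As a rough sketch of how an image reaches these models through the API, the snippet below builds a multimodal Chat Completions request that pairs a chart image with a text question. The image URL and prompt are placeholders, and only the request payload is constructed here; sending it requires an authenticated `openai` client.

```python
# Build a multimodal request: one user message carrying both text and an image,
# so the model can fold the visual data into its reasoning.
payload = {
    "model": "o4-mini",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
            ],
        }
    ],
}

# With the official client, this payload would be sent as:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.chat.completions.create(**payload)
#   print(response.choices[0].message.content)
```

Keeping text and image in a single `content` list is what lets the model reason over both jointly, rather than treating the image as a separate attachment.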

Coding

Both o3 and o4-mini excel in coding tasks, with o3 optimized for handling more complex programming problems requiring high precision. It’s perfect for software developers or AI projects that demand sophisticated coding solutions. Meanwhile, o4-mini is designed for faster results but still handles simpler coding tasks, such as bug fixes or quick code generation, offering a more cost-effective approach for less complex projects.

The benchmarks conducted by OpenAI assess the models' ability to write, debug, and understand code, offering a comprehensive evaluation of their coding proficiency.

  • SWE-Bench Verified: o4-mini scores 68.1%, just behind o3 at 69.1%, and ahead of o1 (48.9%) and o3-mini (49.3%).

  • Aider Polyglot (code editing): o4-mini-high scores 68.9% (whole) and 58.2% (diff). In comparison, o3-high scores 81.3% (whole) and 79.6% (diff).

Figure: SWE-Lancer tests. Source: OpenAI

Figure: Aider test. Source: OpenAI

Instruction following and agentic tool use

Both o3 and o4-mini can effectively follow multi-step instructions and use tools independently, streamlining workflows and saving time on tasks that require multiple stages of action. This is particularly useful in automating processes and interacting with various tools, such as web browsing, Python scripting, and image analysis.

  • Scale MultiChallenge (multi-turn instruction following): o3 leads with 56.51%, significantly outperforming o4-mini (42.99%) as well as earlier models such as o1 (44.93%) and o3-mini (39.89%).

  • BrowseComp (agentic browsing): o4-mini scores 28.3% using basic tools, improving to 49.7% when integrated with browsing, Python, and other tools; o3 leads with a score of 51.5%.

This feature is extremely valuable for automating workflows and interacting with tools that help manage complex tasks, allowing users to delegate tasks to the models for processing and receiving results with minimal user intervention.

Figure: Instruction following and agentic tool use. Source: OpenAI

When to use o3 and o4-mini?

Use o3 when:

  • You need maximum performance for complex, multi-disciplinary tasks.

  • The task requires deep reasoning, precision, and the integration of multiple tools.

  • You’re working on scientific research, advanced coding, or image analysis that demands high accuracy and detail.

Use o4-mini when:

  • You need a faster, more affordable option without compromising too much on performance.

  • Your tasks involve coding, math, or image-based work but don’t require the full power of o3.

  • You’re working on everyday tasks, quick solutions, or when budget or time constraints are a priority.

How to access o3 and o4-mini?

Both o3 and o4-mini are now available on ChatGPT Plus, Pro, and Team plans. The o3-pro version is expected to be released in the coming weeks. To use them, simply select your desired model from the model selector in ChatGPT.

o3 and o4-mini are now available in ChatGPT

These two models will replace the older o3-mini and o1 options. Free users can also try o4-mini in a limited capacity by choosing "Think" mode in the composer before submitting a prompt.

Additionally, the o3 and o4-mini models are available through the OpenAI API, specifically under the Chat Completions and Responses endpoints. Both the standard and high versions of o4-mini support tool use, including browsing, Python, and image inputs, depending on your toolset configuration.
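A minimal sketch of what a Responses-endpoint call looks like for these models. The prompt is a placeholder, and the actual network call is left commented out since it requires the `openai` package and an `OPENAI_API_KEY` in your environment:

```python
# Minimal request for the Responses endpoint: just a model name and an input prompt.
# Swap "o4-mini" for "o3" when the task demands deeper reasoning.
request = {
    "model": "o4-mini",
    "input": "Walk me through solving x^2 - 5x + 6 = 0 step by step.",
}

# To send it with the official Python client:
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   response = client.responses.create(**request)
#   print(response.output_text)
```

Because both models share the same endpoints, switching between them is a one-line change to the `model` field, which makes it easy to route simple tasks to o4-mini and harder ones to o3.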

Conclusion 

With the introduction of o3 and o4-mini, OpenAI continues to push the boundaries of AI capabilities. Whether you need cutting-edge performance for complex tasks or a more affordable, fast solution for simpler ones, these new models are tailored to meet a wide range of needs. Their powerful multimodal capabilities, strong coding performance, and efficiency in processing large tasks make them indispensable tools for developers, researchers, and businesses alike.

Vy Nguyen
I am a contributing writer skilled in simplifying complex business services into clear, accessible content. My interests also extend to exploring and writing about diverse topics in software development, such as artificial intelligence, outsourcing, and innovative retail solutions.