In its latest effort to redefine the AI landscape, Google announced Gemini 2.0 Flash Thinking, a multimodal reasoning model that can tackle complex problems with both speed and transparency.
In a post on the social network X, Google CEO Sundar Pichai wrote that it is "the most thought-out model ever :)".
In its developer documentation, Google explains that Thinking Mode "allows for more powerful reasoning capabilities in its responses than the base Gemini 2.0 Flash model." That base model, previously Google's latest and greatest, was released just eight days earlier.
The new model supports inputs of up to 32,000 tokens (approximately 50-60 pages worth of text) and can generate up to 8,000 tokens per output response. In Google AI Studio's side panel, the company says it is best suited for "multimodal understanding, reasoning" and "coding."
Details of the model's training process, architecture, licensing, and costs have not yet been made public. Currently, Google AI Studio lists the cost per token as zero.
More accessible and transparent reasoning
Unlike OpenAI's competing reasoning models o1 and o1-mini, Gemini 2.0 lets users access its step-by-step reasoning through a drop-down menu, providing clearer and more transparent insight into how the model reaches its conclusions.
Gemini 2.0 addresses long-standing concerns about AI acting as a "black box" by allowing users to see how decisions are made, putting it on par with other open-source models offered by competitors.
An early, simple test showed that the model accurately and quickly (within one to three seconds) answered several questions that are notoriously difficult for other AI models, such as counting the number of R's in the word "strawberry" (see screenshot above).
In another test, when comparing two decimal numbers (9.9 and 9.11), the model systematically broke the problem into smaller steps, from comparing the whole-number parts to comparing the digits to the right of the decimal point.
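The step-by-step comparison the model walked through can be sketched in a few lines of Python, using `Decimal` to avoid floating-point rounding surprises (9.9 vs. 9.11 is a classic trap, since 11 > 9 as integers but 0.11 < 0.9 as fractions):

```python
from decimal import Decimal

# The two numbers from the test, as exact decimals.
a, b = Decimal("9.9"), Decimal("9.11")

# Step 1: compare the whole-number parts (both are 9, so no decision yet).
assert int(a) == int(b) == 9

# Step 2: compare the fractional parts: 0.9 vs 0.11.
assert a - int(a) > b - int(b)  # 0.9 > 0.11

print(a > b)  # 9.9 is the larger number
```

Running this prints `True`, matching the model's conclusion that 9.9 is greater than 9.11.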
These results are backed by independent third-party analysis: LM Arena named Gemini 2.0 Flash Thinking the top-performing model across all LLM categories.
Native support for image upload and analysis
Further improving on its rival, OpenAI's o1 family, Gemini 2.0 Flash Thinking is designed to process images from the jump.
o1 started as a text-only model, but has since expanded to include image and file upload analysis. Currently, both models can only return text.
According to the developer documentation, Gemini 2.0 Flash Thinking does not currently support integration with Google Search, other Google apps, or external third-party tools.
The multimodal capabilities of Gemini 2.0 Flash Thinking expand its potential use cases, allowing it to tackle scenarios that combine different types of data.
For example, in one test, the model solved a puzzle that required analysis of textual and visual elements, demonstrating its versatility in cross-format integration and reasoning.
Developers can take advantage of these capabilities through Google AI Studio and Vertex AI, where the model is available for experimentation.
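For developers who want to try the model programmatically rather than through the AI Studio UI, a minimal sketch of a request against the Gemini REST API might look like the following. Note that the model identifier `gemini-2.0-flash-thinking-exp` is an assumption based on Google's experimental naming conventions; check Google AI Studio for the current name, and supply your own API key via the `GOOGLE_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# Assumed model identifier; verify the current name in Google AI Studio.
MODEL = "gemini-2.0-flash-thinking-exp"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_payload(prompt: str) -> dict:
    """Build a generateContent request body per the Gemini REST API schema."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def ask(prompt: str, api_key: str) -> str:
    """POST the prompt and return the first candidate's text."""
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GOOGLE_API_KEY")  # your key from Google AI Studio
    if key:
        print(ask("How many R's are in the word 'strawberry'?", key))
```

The same model can also be reached through Google's official client SDKs and Vertex AI; the raw REST call above is just the most dependency-free way to experiment.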
As the AI landscape becomes more competitive, Gemini 2.0 Flash Thinking could mark the beginning of a new era in problem-solving models. Its ability to process diverse data types, expose its reasoning, and execute at scale positions it as a strong contender in the reasoning-AI market, rivaling OpenAI's o1 family and beyond.