Google’s latest open source AI model, Gemma 3, is not the only big news from the Alphabet subsidiary today.
In fact, the spotlight may have been stolen by Gemini 2.0 Flash with native image generation, a new experimental model available for free to Google AI Studio users and to developers via the Gemini API.
This marks the first time a major US tech company has shipped multimodal image generation directly to consumers inside a single model. Most other AI image generation tools pair a diffusion model (an image-specific model) with a large language model (LLM), which requires a layer of interpretation between the two as the LLM translates the user’s text prompt into something the diffusion model can render. That was true both of Google’s previous setup, which connected a Gemini LLM to the Imagen diffusion model, and of OpenAI’s current setup, which connects ChatGPT and various underlying LLMs to the DALL-E 3 diffusion model.
In contrast, Gemini 2.0 Flash can natively generate images within the same model the user types text prompts into, in theory allowing for higher accuracy and more functionality. Early indications suggest this is largely true.
Gemini 2.0 Flash was first announced in December 2024, but its native image generation capability was not initially turned on for users. Now enabled, it integrates multimodal input, reasoning, and natural language understanding to generate images alongside text.
The newly available experimental version, gemini-2.0-flash-exp, lets developers create illustrations, refine images through conversation, and generate detailed visuals grounded in world knowledge.
How Gemini 2.0 Flash enhances AI-generated images
In a developer blog post published earlier today, Google highlights some key capabilities of Gemini 2.0 Flash’s native image generation:
• Text and image storytelling: Developers can use Gemini 2.0 Flash to generate illustrated stories while maintaining consistency in characters and settings. The model also responds to feedback, allowing users to adjust the story or change the art style.
• Conversational image editing: The model supports multi-turn editing, meaning users can iteratively refine an image by providing instructions through natural language prompts. This feature allows real-time collaboration and creative exploration.
• World knowledge-based image generation: Unlike many other image generation models, Gemini 2.0 Flash draws on broader reasoning capabilities to create more contextually relevant images. For example, it can illustrate a recipe with detailed visuals that reflect real-world ingredients and cooking methods.
• Improved text rendering: Many AI image models struggle to generate legible text within images, often producing misspelled or distorted characters. Google reports that Gemini 2.0 Flash outperforms leading competitors at text rendering, which is especially useful for advertising, social media posts, and invitations.
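Under the hood, the multi-turn editing described above maps naturally onto the Gemini API’s chat interface, where each new message carries the full conversation history. Here is a minimal sketch of such a loop, assuming the google-genai Python SDK and a valid API key; the `refine_image` helper is hypothetical, not part of the SDK:

```python
def refine_image(chat, instructions):
    """Send each natural-language edit as a new chat turn.

    Because the chat carries the full history, each instruction refines
    the image produced by the previous turn rather than regenerating
    from scratch. Returns the final response.
    """
    response = None
    for instruction in instructions:
        response = chat.send_message(instruction)
    return response

if __name__ == "__main__":
    # Live usage requires the google-genai SDK and a real API key.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    chat = client.chats.create(
        model="gemini-2.0-flash-exp",
        config=types.GenerateContentConfig(
            response_modalities=["Text", "Image"]
        ),
    )
    final = refine_image(chat, [
        "Generate an image of a red bicycle leaning against a brick wall.",
        "Make the bicycle blue.",
        "Add a wicker basket to the handlebars.",
    ])
```

The loop itself is model-agnostic; any object exposing a `send_message` method in the chat’s style would work.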
Early examples show incredible possibility and promise
Googlers and AI power users have shared examples of the new image generation and editing capabilities offered by Gemini 2.0 Flash Experimental, and they are undoubtedly impressive.
AI and tech educator Paul Coopert wrote: “You can basically edit images in natural language [fire emoji]. Not only the ones you generate with Gemini 2.0 Flash but also existing ones,” showing how he uploaded photos and altered them using only text prompts.
Users @apolinario and @fofr showed how you could upload a headshot and modify it into totally different takes with new props like a bowl of spaghetti, or change the direction the subject was looking in while preserving their likeness with incredible accuracy, or even zoom out and generate a full body image based on nothing other than a headshot.
Google DeepMind researcher Robert Riachi showcased how the model can generate images in a pixel-art style and then create new ones in the same style based on text prompts.


AI news account TestingCatalog News reported on the rollout of Gemini 2.0 Flash Experimental’s multimodal capabilities, noting that Google is the first major lab to deploy this feature.

User @Angaisb_ aka “Angel” showed in a compelling example how a prompt to “add chocolate drizzle” modified an existing image of croissants in seconds — revealing Gemini 2.0 Flash’s fast and accurate image editing capabilities via simply chatting back and forth with the model.
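An edit like this amounts to sending the existing image’s bytes alongside the text instruction in a single request. Here is a minimal sketch, assuming the google-genai Python SDK and a valid API key; the `build_edit_request` helper is hypothetical (not part of the SDK) and `croissants.png` is a placeholder filename:

```python
import pathlib

def build_edit_request(image_path, instruction):
    """Hypothetical helper: pair an image file's raw bytes and MIME type
    with a natural-language edit instruction, ready to be wrapped into
    the contents of a Gemini API edit request."""
    path = pathlib.Path(image_path)
    suffix = path.suffix.lower().lstrip(".")
    mime = "image/jpeg" if suffix in ("jpg", "jpeg") else f"image/{suffix}"
    return {
        "mime_type": mime,
        "data": path.read_bytes(),
        "instruction": instruction,
    }

if __name__ == "__main__":
    # Live usage requires the google-genai SDK and a real API key.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    req = build_edit_request("croissants.png", "Add a chocolate drizzle.")
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents=[
            req["instruction"],
            types.Part.from_bytes(
                data=req["data"], mime_type=req["mime_type"]
            ),
        ],
        config=types.GenerateContentConfig(
            response_modalities=["Text", "Image"]
        ),
    )
```

Passing the image and the instruction together in `contents` is what lets the model edit the supplied image rather than generating a fresh one.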

YouTuber Theoretically Media pointed out that this incremental image editing without full regeneration is something the AI industry has long anticipated, demonstrating how it was easy to ask Gemini 2.0 Flash to edit an image to raise a character’s arm while preserving the entire rest of the image.

Former Googler turned AI YouTuber Bilawal Sidhu showed how the model colorizes black-and-white images, hinting at potential historical restoration or creative enhancement applications.

These early reactions suggest that developers and AI enthusiasts see Gemini 2.0 Flash as a highly flexible tool for iterative design, creative storytelling, and AI-assisted visual editing.
The swift rollout also contrasts with OpenAI’s GPT-4o, which previewed native image generation capabilities in May 2024, nearly a year ago, but has yet to release the feature publicly, allowing Google to seize an opportunity to lead in multimodal AI deployment.
As user @chatgpt21 aka “Chris” pointed out on X, OpenAI has in this case lost the year-plus lead it had on this feature, for reasons unknown; users invited someone from OpenAI to comment on why.

My own testing revealed some limitations around aspect ratio: the output seemed stuck at 1:1 even when I asked via text prompts to change it, though the model was able to switch the direction of text within the image in seconds.

While much of the early discussion of Gemini 2.0 Flash’s native image generation has focused on individual users and creative applications, its implications for enterprise teams, developers, and software architects are significant.
AI-powered design and marketing at scale: For marketing teams and content creators, Gemini 2.0 Flash could serve as a cost-effective alternative to traditional graphic design workflows, automating the creation of branded content, advertising, and social media visuals. Its support for text rendering within images could streamline ad creation, packaging design, and promotional graphics, reducing reliance on manual editing.
Improved developer tools and AI workflows: For CTOs, CIOs, and software engineers, native image generation can simplify AI integration into applications and services. By combining text and image output in a single model, Gemini 2.0 Flash allows developers to build:
- An AI-powered design assistant that generates UI/UX mockups or app assets.
- An automated documentation tool that illustrates concepts in real time.
- A dynamic, AI-driven storytelling platform for media and education.
The model also supports conversational image editing, allowing teams to build AI-driven interfaces where users refine designs through natural interaction, lowering the barrier to entry for non-technical users.
New possibilities for AI-driven productivity software: For enterprise teams building AI-powered productivity tools, Gemini 2.0 Flash could support applications such as:
- Automated presentation generation using slides and visuals created by AI.
- Legal and business document annotation with AI-generated infographics.
- Dynamic product mockup generation for e-commerce, based on product descriptions.
How to deploy and experiment with this feature
Developers can start testing Gemini 2.0 Flash’s image generation capabilities via the Gemini API. Google has shared a sample API request showing how to generate an illustrated story with interleaved text and images in a single response:
from google import genai
from google.genai import types

# Create a client with your Gemini API key
client = genai.Client(api_key="GEMINI_API_KEY")

# Ask for an illustrated story; response_modalities requests both
# text and images back in a single response
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3D digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"]
    ),
)
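A response to a request like this interleaves text parts and image parts. Here is a minimal sketch of pulling them apart, assuming the google-genai SDK’s response shape (candidates → content → parts, with images carried as `inline_data`); the `split_parts` helper is hypothetical, not part of the SDK:

```python
def split_parts(parts):
    """Hypothetical helper: separate a Gemini response's interleaved
    text and inline-image parts so the story text can be printed and
    the images written to disk."""
    texts, images = [], []
    for part in parts:
        if getattr(part, "text", None):
            texts.append(part.text)
        elif getattr(part, "inline_data", None):
            images.append((part.inline_data.mime_type, part.inline_data.data))
    return texts, images

if __name__ == "__main__":
    # Live usage requires the google-genai SDK and a real API key.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="GEMINI_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents="Generate a short illustrated story about a baby turtle.",
        config=types.GenerateContentConfig(
            response_modalities=["Text", "Image"]
        ),
    )
    texts, images = split_parts(response.candidates[0].content.parts)
    for paragraph in texts:
        print(paragraph)
    for i, (mime, data) in enumerate(images):
        ext = mime.split("/")[-1]  # e.g. "png" from "image/png"
        with open(f"scene_{i}.{ext}", "wb") as f:
            f.write(data)
```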
By simplifying AI-powered image generation, Gemini 2.0 Flash offers developers new ways to create illustrated content, design AI-assisted applications, and experiment with visual storytelling.