Imagen is a series of advanced image generation models developed by Google, capable of creating high-quality, realistic images from text prompts. This guide covers how to use the Imagen API to generate images, including parameter settings, model selection, and code examples.
Available models:
- `imagen-4.0-generate-preview-05-20`
- `imagen-4.0-ultra-generate-exp-05-20`
- `imagen-3.0-generate-002`
Currently, Imagen only supports English prompts. When integrating, it's recommended to add an automatic translation step so users can work in their own language without barriers (see the sketch below).
Performance is unstable when prompts contain large amounts of text; it's recommended to include only key phrases.
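For example, one way to add that translation step is to route the user's prompt through a chat model on the same gateway before calling Imagen. The sketch below is a minimal illustration, not part of the Imagen API; the model ID `gemini-2.0-flash` and the `to_english` helper are assumptions to adapt to your setup:

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-***",  # 🔑 Replace with your key generated on AiHubMix
    base_url="https://aihubmix.com/v1",
)

def to_english(prompt: str) -> str:
    """Translate a prompt in any language to English before passing it to Imagen."""
    resp = client.chat.completions.create(
        model="gemini-2.0-flash",  # assumption: any capable chat model on the gateway works
        messages=[
            {"role": "system", "content": "Translate the user's text into English. Reply with the translation only."},
            {"role": "user", "content": prompt},
        ],
    )
    return resp.choices[0].message.content.strip()
```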
The Imagen API provides the following parameters:
- `numberOfImages`: The number of images to generate, from 1 to 4 (inclusive). The default is 4. Note that `imagen-4.0-ultra-generate-exp-05-20` can only generate 1 image at a time.
- `aspectRatio`: The aspect ratio of the generated images. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
- `personGeneration`: Controls whether the model may generate images of people. Supported values:
  - `"DONT_ALLOW"`: Prevents the generation of images containing people.
  - `"ALLOW_ADULT"`: Generates images of adults but not children. This is the default value.
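Taken together, these parameters map onto the SDK's image generation config. A minimal sketch, assuming the `google-genai` Python SDK, where field names use snake_case (verify against the SDK version you use):

```python
from google.genai import types

# All three documented parameters in one config (snake_case in the Python SDK)
config = types.GenerateImagesConfig(
    number_of_images=2,              # 1-4; default is 4
    aspect_ratio="16:9",             # "1:1", "3:4", "4:3", "9:16", or "16:9"
    person_generation="DONT_ALLOW",  # or "ALLOW_ADULT" (default)
)
```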
Generating images with the Imagen API costs $0.03 per image. Each API call can generate 1-4 images, and you are charged for the actual number of images generated; for example, a call that returns 4 images costs $0.12.
Here’s a Python example of generating images using Imagen 3.0:
```python
import os
import time
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(
    api_key="sk-***",  # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

# Currently only supports English prompts; performance is poor with large amounts of text
response = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt="A minimalist logo for a LLM router market company on a solid white background. trident in a circle as the main symbol, with ONLY text 'InferEra' below.",
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1",  # supports "1:1", "9:16", "16:9", "3:4", or "4:3"
    ),
)

# Save outputs next to this script, using a timestamp prefix to avoid filename conflicts
script_dir = os.path.dirname(os.path.abspath(__file__))
output_dir = os.path.join(script_dir, "output")
os.makedirs(output_dir, exist_ok=True)
timestamp = int(time.time())

# Save and display the generated images
for i, generated_image in enumerate(response.generated_images):
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()
    file_name = f"imagen3_{timestamp}_{i+1}.png"
    file_path = os.path.join(output_dir, file_name)
    image.save(file_path)
    print(f"Image saved to: {file_path}")
```
Gemini also offers image generation capabilities as an alternative. Compared with Imagen, Gemini's image generation is better suited to scenarios that require contextual understanding and reasoning, rather than the highest artistic expression and visual quality.
Instructions:
- Model ID: `gemini-2.0-flash-preview-image-generation`
- Input/Output pricing: $0.10 / $0.40 per M tokens
- Requests must include the parameter `"modalities": ["text", "image"]` to enable image output
- Images are passed in and returned as Base64-encoded data
- As an experimental model, it's recommended to explicitly request "output image" in the prompt; otherwise it may return only text
- The default height of output images is 1024 px
- Python calls require the latest OpenAI SDK; run `pip install -U openai` first
Python example:

```python
import base64
import os
from io import BytesIO

from openai import OpenAI
from PIL import Image

client = OpenAI(
    api_key="sk-***",  # 🔑 Replace with your key generated on AiHubMix
    base_url="https://aihubmix.com/v1",
)

# Locate the input image relative to this script
image_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources", "filled.jpg")
if not os.path.exists(image_path):
    raise FileNotFoundError(f"image {image_path} does not exist")

def encode_image(image_path):
    """Read an image file and return its Base64-encoded contents."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="gemini-2.0-flash-preview-image-generation",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "describe the image with a concise and engaging paragraph, then fill color as children's crayon style",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        },
    ],
    modalities=["text", "image"],
    temperature=0.7,
)

try:
    # Print basic response information without the Base64 data
    print(f"Creation time: {response.created}")
    print(f"Token usage: {response.usage.total_tokens}")

    # Check whether the multi_mod_content field exists
    if (
        hasattr(response.choices[0].message, "multi_mod_content")
        and response.choices[0].message.multi_mod_content is not None
    ):
        print("\nResponse content:")
        for part in response.choices[0].message.multi_mod_content:
            if "text" in part and part["text"] is not None:
                print(part["text"])
            # Process image content
            elif "inline_data" in part and part["inline_data"] is not None:
                print("\n🖼️ [Image content received]")
                image_data = base64.b64decode(part["inline_data"]["data"])
                mime_type = part["inline_data"].get("mime_type", "image/png")
                print(f"Image type: {mime_type}")

                image = Image.open(BytesIO(image_data))
                image.show()

                # Save the image next to the input file
                output_dir = os.path.join(os.path.dirname(image_path), "output")
                os.makedirs(output_dir, exist_ok=True)
                output_path = os.path.join(output_dir, "edited_image.jpg")
                image.save(output_path)
                print(f"✅ Image saved to: {output_path}")
    else:
        print("No valid multimodal response received, check the response structure")
except Exception as e:
    print(f"Error processing response: {str(e)}")
```
AiHubMix offers a reverse-engineered access method that delivers the same output quality as the official API at a lower rate of just $0.41 per request. Note, however, that no reverse method can guarantee stable generation; it is recommended only for early experimentation in development environments or for personal exploration.
Known limitations are consistent with the official API. See the “Veo 3.0 Video Generation” section above.
The VEO 3.0 reverse API is OpenAI-compatible. Simply specify the model ID `veo-3` along with your video prompt.
```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-***",  # 🔑 Replace with your API key from AiHubMix
    base_url="https://aihubmix.com/v1",
)

completion = client.chat.completions.create(
    model="veo-3",
    messages=[
        {
            "role": "user",
            "content": "a mechanical butterfly flying in the futuristic garden",
        }
    ],
    stream=False,
)

print(completion.choices[0].message.content)
```
The output is a video URL; please save it locally promptly.
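Since the reverse endpoint returns the link inside the message text, you may want to extract and download it programmatically. Below is a minimal sketch, assuming the reply contains a direct `.mp4` link; the regex and filename are illustrative, not part of the API:

```python
import re
import requests

def save_video(message_text: str, filename: str = "veo3_output.mp4") -> None:
    # Assumption: the reply text contains at least one direct .mp4 URL
    match = re.search(r"https?://\S+\.mp4", message_text)
    if not match:
        raise ValueError("No video URL found in the response text")

    # Stream to disk so large videos aren't held fully in memory
    with requests.get(match.group(0), stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Video saved to: {filename}")

# Usage: save_video(completion.choices[0].message.content)
```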
Prompt example:

```json
{
  "prompt": "A sleek, metallic mechanical butterfly with intricate, glowing blue circuitry patterns on its wings flies gracefully through a futuristic garden. The garden is filled with bioluminescent plants, floating orbs of light, and holographic flowers that change colors. The butterfly's wings reflect the ambient light, creating a mesmerizing shimmer as it moves. The background features a sleek, minimalist cityscape with towering glass structures and hovering drones. The scene is bathed in a soft, ethereal glow from a setting sun, casting long shadows and enhancing the futuristic ambiance. The camera follows the butterfly in a smooth, cinematic motion, capturing the delicate movements of its wings and the vibrant, otherworldly beauty of the garden."
}
```
Sample task output:

> Video generation task created
> Task ID: `8167db37-2b7c-4794-9232-891d02ca7fa3`
> To prevent task interruption, you can continuously track progress from the following links:
> [Data Preview](https://asyncdata.net/web/8167db37-2b7c-4794-9232-891d02ca7fa3) | [Source Data](https://asyncdata.net/source/8167db37-2b7c-4794-9232-891d02ca7fa3)
> Waiting for processing
> Type: Text-to-video generation
> 🎬 Starting video generation...................
> ⚠️ Retrying (0/3)
> Type: Text-to-video generation
> 🎬 Starting video generation.....................
> 🔄 Optimizing video quality.................
> 🎉 High-quality video generated

[▶️ Watch Online](https://filesystem.site/cdn/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4) | [⏬ Download Video](https://filesystem.site/cdn/download/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4)
VEO 2.0 is an advanced video generation AI model launched by Google, capable of creating high-quality, realistic short videos from text prompts. This section will help you understand how to use the VEO 2.0 API to generate videos, including parameter settings, model selection, and code examples.
Currently, VEO 2.0 only supports English prompts. Video generation takes approximately 2-3 minutes; please be patient.
Here’s a Python example of using VEO 2.0 to generate videos:
```python
import time

from google import genai
from google.genai import types

client = genai.Client(
    api_key="sk-***",  # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="dont_allow",  # "dont_allow" or "allow_adult"
        aspect_ratio="16:9",             # "16:9" or "9:16"
        number_of_videos=1,              # Integer, 1 or 2; default is 2
        duration_seconds=5,              # Integer, 5 or 8; default is 8
    ),
)

# Generation takes 2-3 minutes; each video is 5-8 s long
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save each generated video
for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")
```