Gemini Imagine
Gemini image/video generation
Imagen Guide
Imagen is an advanced series of image generation AI models developed by Google, capable of creating high-quality, realistic images based on text prompts. This guide will help you understand how to use the Imagen API to generate images, including parameter settings, model selection, and code examples.
Available Models:
- imagen-4.0-generate-preview-05-20
- imagen-4.0-ultra-generate-exp-05-20
- imagen-3.0-generate-002
- Currently, Imagen only supports English prompts. When integrating, it’s recommended to add automatic translation to allow users to use it without language barriers.
- Performance is unstable when rendering large amounts of text. It’s recommended to only include key keywords.
Model Parameters
Imagen currently only supports English prompts and provides the following parameters:
- numberOfImages: The number of images to generate, ranging from 1 to 4 (inclusive). The default value is 4.
imagen-4.0-ultra-generate-exp-05-20
can only generate 1 image at a time.- aspectRatio: Changes the aspect ratio of the generated images. Supported values are “1:1”, “3:4”, “4:3”, “9:16”, and “16:9”. The default value is “1:1”.
- personGeneration: Allows the model to generate images of people. Supports the following values:
- “DONT_ALLOW”: Prevents the generation of images containing people.
- “ALLOW_ADULT”: Generates images of adults but not children. This is the default value.
Usage Pricing
The cost of using the Imagen API to generate images is $0.03 per image. Please note that each API call can generate 1-4 images, and you will be charged based on the actual number of images generated.
API Call Example
Here’s a Python example of generating images using Imagen 3.0:
Prompt Tips
Creating effective prompts is crucial for obtaining desired images:
- Use detailed descriptions, including subject, style, lighting, angle, etc.
- Specify artistic styles (such as cinematic, photorealistic, anime style, etc.).
- Include technical details (such as DSLR, high-definition, rich in detail, etc.).
- Avoid negative or prohibited content.
- Avoid including large amounts of text in prompts, only use key keywords for more stable results.
Gemini Image Generation
Gemini also offers image generation capabilities as an alternative. Compared to Imagen, Gemini’s image generation is better suited for scenarios that require contextual understanding and reasoning, rather than pursuing ultimate artistic expression and visual quality.
Instructions:
- Model id:
gemini-2.0-flash-preview-image-generation
- Input/Outpt pricing: 0.4/M tokens
- Need to add parameters to experience new features:
"modalities":["text","image"]
- Images are passed and output in Base64 encoding
- As an experimental model, it’s recommended to explicitly specify “output image”, otherwise it might only output text
- Default height for output images is 1024px
- Python calls require the latest OpenAI SDK, run
pip install -U openai
first - For more information, visit the Gemini official documentation
Input Reference Structure:
Output Reference Structure:
Text-to-Image Generation
Input: text Output: text + image
Output Example:
Edit Image
Input: text + image
Output: text + image
Output Example:
Choosing the Right Model
When to Choose Gemini:
- When you need to leverage world knowledge and reasoning abilities to generate contextually relevant images.
- When you need seamless integration of text and images.
- When you want to embed accurate visual content in long text sequences.
- When you want to edit images conversationally while maintaining context.
When to Choose Imagen:
- When image quality, photorealism, artistic detail, or specific styles (such as impressionism, anime) are the primary considerations.
- When performing professional editing tasks, such as product background updates or image enlargement.
- When injecting branding, style, or generating logos and product designs.
Best Practices
- Optimize prompts: Carefully crafting prompts is key to obtaining high-quality output.
- Experiment with parameters: Try different aspect ratios and settings to find the configuration that best suits your needs.
- Batch generation: Generate multiple images to increase the chance of getting ideal results.
- Save metadata: Save prompts and timestamps along with images to track and replicate successful results.
- Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of service.
Veo 2.0 Video Generation
VEO 2.0 is an advanced video generation AI model launched by Google, capable of creating high-quality, realistic short videos based on text prompts. This part will help you understand how to use the VEO 2.0 API to generate videos, including parameter settings, model selection, and code examples.
- Currently, VEO 2.0 only supports English prompts
- Video generation takes approximately 2-3 minutes, please be patient
Model Parameters
VEO 2.0 provides the following parameters:
- numberOfVideos: The number of videos to generate, options are 1 or 2. Default is 2.
- aspectRatio: The aspect ratio of the generated videos. Supported values are “16:9” and “9:16”.
- durationSeconds: Video duration, options are 5 seconds or 8 seconds. Default is 8 seconds.
- personGeneration: Controls whether to allow videos containing people. Supports the following values:
- “dont_allow”: Prevents generation of videos containing people.
- “allow_adult”: Allows generation of videos containing adults, but not children.
Pricing
The cost of the VEO 2.0 API is $0.35/s
Usage Example
Here’s a Python example of using VEO 2.0 to generate videos:
Prompt Tips
Creating effective prompts is crucial for obtaining desired videos:
- Describe clear scenes, actions, and atmosphere
- Specify filming styles (such as panoramic, close-up, tracking shots, etc.)
- Describe lighting conditions (such as sunny, dusk, indoor lighting, etc.)
- Specify the main subject and its actions (e.g., “a kitten sleeping in the sunshine”)
- Avoid overly complex narratives or rapidly changing scenes
- Avoid negative or prohibited content
Best Practices
- Clear and concise prompts: Use clear, specific descriptions to guide video generation.
- Patience is key: Video generation takes 2-3 minutes, please wait for completion.
- Test different parameters: Try different aspect ratios and durations to find the settings that best suit your needs.
- Save generation records: Record prompts along with generated videos to track successful results.
- Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of use.