AiHubMix Documentation Hub

Imagen Guide

Imagen is an advanced series of image generation AI models developed by Google, capable of creating high-quality, realistic images based on text prompts. This guide will help you understand how to use the Imagen API to generate images, including parameter settings, model selection, and code examples.

Available Models:

imagen-4.0-ultra-generate-001
imagen-4.0-generate-001
imagen-4.0-fast-generate-001
imagen-4.0-fast-generate-preview-06-06
imagen-3.0-generate-002

Currently, Imagen only supports English prompts. When integrating, consider adding automatic translation so that users can write prompts in any language.
Rendering large amounts of text is unreliable. Include only the essential keywords.

Model Parameters

Imagen provides the following parameters:

numberOfImages: The number of images to generate, ranging from 1 to 4 (inclusive). The default value is 4.
imagen-4.0-ultra-generate-001 can only generate 1 image at a time.
aspectRatio: Changes the aspect ratio of the generated images. Supported values are “1:1”, “3:4”, “4:3”, “9:16”, and “16:9”. The default value is “1:1”.
personGeneration: Allows the model to generate images of people. Supports the following values:
- “DONT_ALLOW”: Prevents the generation of images containing people.
- “ALLOW_ADULT”: Generates images of adults but not children. This is the default value.
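
The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Imagen API: the function name `check_imagen_params` and the constant names are hypothetical, and the rules are taken only from the parameter list above.

```python
# Hypothetical client-side check; the limits mirror the parameter list above.
ALLOWED_ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
ALLOWED_PERSON_GENERATION = {"DONT_ALLOW", "ALLOW_ADULT"}
SINGLE_IMAGE_MODELS = {"imagen-4.0-ultra-generate-001"}


def check_imagen_params(model, number_of_images=4, aspect_ratio="1:1",
                        person_generation="ALLOW_ADULT"):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if not isinstance(number_of_images, int) or not 1 <= number_of_images <= 4:
        problems.append("numberOfImages must be an integer from 1 to 4")
    elif model in SINGLE_IMAGE_MODELS and number_of_images != 1:
        problems.append(f"{model} can only generate 1 image at a time")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspectRatio: {aspect_ratio}")
    if person_generation not in ALLOWED_PERSON_GENERATION:
        problems.append(f"unsupported personGeneration: {person_generation}")
    return problems


if __name__ == "__main__":
    # The default of 4 images conflicts with the ultra model's 1-image limit.
    print(check_imagen_params("imagen-4.0-ultra-generate-001"))
```

Note that the default value of 4 is rejected for imagen-4.0-ultra-generate-001, so with that model it is safer to set the image count to 1 explicitly.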

Usage Pricing

The cost of using the Imagen API to generate images:

imagen-4-ultra: $0.06/image
imagen-4: $0.04/image
imagen-4-fast: $0.02/image
imagen-3: $0.03/image

Please note that each API call can generate 1-4 images, and you will be charged based on the actual number of images generated.
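
As a rough illustration of per-image billing, the sketch below estimates the cost of a call from the number of images actually returned. The mapping from the price labels above to full model IDs is an assumption (in particular, that the preview fast model is billed at the fast rate), and `estimate_imagen_cost` is a hypothetical helper, not an AiHubMix function.

```python
from decimal import Decimal

# Assumed mapping of the price list above onto the model IDs listed earlier.
PRICE_PER_IMAGE = {
    "imagen-4.0-ultra-generate-001": Decimal("0.06"),
    "imagen-4.0-generate-001": Decimal("0.04"),
    "imagen-4.0-fast-generate-001": Decimal("0.02"),
    "imagen-4.0-fast-generate-preview-06-06": Decimal("0.02"),  # assumption
    "imagen-3.0-generate-002": Decimal("0.03"),
}


def estimate_imagen_cost(model, images_generated):
    """Cost in USD for the images actually generated by one or more calls."""
    if images_generated < 0:
        raise ValueError("images_generated must not be negative")
    return PRICE_PER_IMAGE[model] * images_generated


if __name__ == "__main__":
    # Three images from imagen-4.0-generate-001: 3 x $0.04
    print(estimate_imagen_cost("imagen-4.0-generate-001", 3))  # 0.12
```

Decimal is used instead of float so that sums of small prices stay exact.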

API Call Example

Here’s a Python example of generating images using Imagen (the fast Imagen 4.0 model in this case):

import os
import time
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

# Currently only supports English prompts, performance is poor with large amounts of text
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='A minimalist logo for a LLM router market company on a solid white background. trident in a circle as the main symbol, with ONLY text \'InferEra\' below.',
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1", # supports "1:1", "9:16", "16:9", "3:4", or "4:3".
    )
)

script_dir = os.path.dirname(os.path.abspath(__file__))
output_dir = os.path.join(script_dir, "output")

os.makedirs(output_dir, exist_ok=True)

# Generate timestamp as filename prefix to avoid filename conflicts
timestamp = int(time.time())

# Save and display the generated images
if response and hasattr(response, 'generated_images') and response.generated_images:
    for i, generated_image in enumerate(response.generated_images):
        try:
            image = Image.open(BytesIO(generated_image.image.image_bytes))
            image.show()
            
            file_name = f"imagen_{timestamp}_{i+1}.png"
            file_path = os.path.join(output_dir, file_name)
            image.save(file_path)
            
            print(f"Image saved to: {file_path}")
        except Exception as e:
            print(f"Error processing image {i+1}: {e}")
else:
    print("Error: No valid image response received")
    print(f"Response type: {type(response)}")
    if response:
        print(f"Response attributes: {dir(response)}")
        if hasattr(response, 'generated_images'):
            print(f"generated_images value: {response.generated_images}")
    else:
        print("Response is empty, please check API key and network connection")

Prompt Tips

Creating effective prompts is crucial for obtaining desired images:

Use detailed descriptions, including subject, style, lighting, angle, etc.
Specify artistic styles (such as cinematic, photorealistic, anime style, etc.).
Include technical details (such as DSLR, high-definition, rich in detail, etc.).
Avoid negative or prohibited content.
Avoid including large amounts of text in prompts; use only the essential keywords for more stable results.
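
One way to apply these tips consistently is to assemble prompts from named pieces. This is only a sketch: `build_image_prompt` and its parameters are hypothetical, and the comma-separated layout is one possible convention rather than anything Imagen requires.

```python
def build_image_prompt(subject, style=None, lighting=None, angle=None, details=()):
    """Join the subject with optional style, lighting, angle and detail keywords."""
    parts = [subject.strip()]
    for value in (style, lighting, angle, *details):
        # Skip pieces that were not provided or are blank.
        if value and value.strip():
            parts.append(value.strip())
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_image_prompt(
        "a lighthouse on a cliff",
        style="photorealistic",
        lighting="golden hour",
        details=["DSLR", "rich in detail"],
    ))
```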

Gemini Image Generation

Gemini also offers image generation capabilities as an alternative. Compared to Imagen, Gemini’s image generation is better suited for scenarios that require contextual understanding and reasoning, rather than pursuing ultimate artistic expression and visual quality.

Instructions:

Model id: gemini-2.5-flash-image-preview
Input/Output pricing: Text: $0.3→$2.5/M tokens; Image: $0.3→$30/M tokens
To enable image output, add the parameter "modalities":["text","image"] to the request
Images are passed and output in Base64 encoding
Default height for output images is 1024px
Python calls require the latest OpenAI SDK; run pip install -U openai first
For more information, visit the Gemini official documentation

Input Reference Structure:

{
    "model": "gemini-2.5-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate a landscape painting and provide a poem to describe it"
      }
    ],
    "modalities": ["text","image"], // "image" must be included
    "temperature": 0.7
}

Output Reference Structure:

"choices":
    [
        {
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "multi_mod_content": //📍 New addition
                [
                    {
                        "text": "",
                        "inlineData":
                        {
                          "data":"base64 str",
                          "mimeType":"png"
                        }
                    },
                    {
                        "text": "hello",
                        "inlineData":
                        {
                        }
                    }
                ],
                "annotations":
                []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
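
The structure above can be unpacked into plain text and decoded image bytes. The sketch below is an illustration under assumptions: `split_multi_mod_content` is a hypothetical helper, it takes the message as a plain dict, and it accepts both the camelCase keys shown above (inlineData, mimeType) and snake_case variants (inline_data, mime_type), since it is not certain which spelling a given response uses.

```python
import base64


def split_multi_mod_content(message):
    """Return (texts, images) from a message dict shaped like the reference above.

    images is a list of (mime_type, raw_bytes) tuples.
    """
    texts, images = [], []
    for part in message.get("multi_mod_content") or []:
        text = part.get("text")
        if text:  # skips both None and the empty string
            texts.append(text)
        # Accept either key spelling (assumption, see lead-in).
        inline = part.get("inlineData") or part.get("inline_data") or {}
        data = inline.get("data")
        if data:
            mime = inline.get("mimeType") or inline.get("mime_type") or "image/png"
            images.append((mime, base64.b64decode(data)))
    return texts, images


if __name__ == "__main__":
    sample = {
        "multi_mod_content": [
            {"text": "", "inlineData": {"data": base64.b64encode(b"img").decode(), "mimeType": "png"}},
            {"text": "hello", "inlineData": {}},
        ]
    }
    texts, images = split_multi_mod_content(sample)
    print(texts, [(mime, len(raw)) for mime, raw in images])
```

Text and image data are checked independently because, as the reference shows, an image part can also carry an empty "text" field.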

Text-to-Image Generation

Input: text
Output: text + image

IMG_PATH="/your_path/image.jpg"

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl https://aihubmix.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-***" \
  -d '{
    "model": "gemini-2.5-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type":"text",
            "text":"describe the image with a concise and engaging paragraph, then fill color as children's crayon style"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,'$IMG_BASE64'"
            }
          }
        ]
      }
    ],
    "modalities": ["text","image"],
    "temperature": 0.7
}' \
  | grep -o '"data":"[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > /your_path/imageGen.jpg

Edit Image

Input: text + image
Output: text + image

import os
from openai import OpenAI
from PIL import Image
from io import BytesIO
import base64

client = OpenAI(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    base_url="https://aihubmix.com/v1",
)

project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

image_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources", "filled.jpg")
if not os.path.exists(image_path):
    raise FileNotFoundError(f"image {image_path} not exists")

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="gemini-2.5-flash-image-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "describe the image with a concise and engaging paragraph, then fill color as children's crayon style",
                },
                {
                    "type": "image_url", 
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },     
            ],
        },
    ],
    modalities=["text", "image"],
    temperature=0.7,
)
try:
    # Print basic response information without base64 data
    print(f"Creation time: {response.created}")
    print(f"Token usage: {response.usage.total_tokens}")
    
    # Check if multi_mod_content field exists
    if (
        hasattr(response.choices[0].message, "multi_mod_content")
        and response.choices[0].message.multi_mod_content is not None
    ):
        print("\nResponse content:")
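        for part in response.choices[0].message.multi_mod_content:
            # A part may carry text, an image, or both, so check each independently
            if part.get("text"):
                print(part["text"])

            # Process image content; the reference structure spells the key
            # "inlineData", so accept either spelling
            inline_data = part.get("inline_data") or part.get("inlineData")
            if inline_data and inline_data.get("data"):
                print("\n🖼️ [Image content received]")
                image_data = base64.b64decode(inline_data["data"])
                mime_type = inline_data.get("mime_type") or inline_data.get("mimeType") or "image/png"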
                print(f"Image type: {mime_type}")
                
                image = Image.open(BytesIO(image_data))
                image.show()
                
                # Save image
                output_dir = os.path.join(os.path.dirname(image_path), "output")
                os.makedirs(output_dir, exist_ok=True)
                output_path = os.path.join(output_dir, "edited_image.jpg")
                image.save(output_path)
                print(f"✅ Image saved to: {output_path}")
            
    else:
        print("No valid multimodal response received, check response structure")
except Exception as e:
    print(f"Error processing response: {str(e)}")

Choosing the Right Model

When to Choose Gemini:

When you need to leverage world knowledge and reasoning abilities to generate contextually relevant images.
When you need seamless integration of text and images.
When you want to embed accurate visual content in long text sequences.
When you want to edit images conversationally while maintaining context.

When to Choose Imagen:

When image quality, photorealism, artistic detail, or specific styles (such as impressionism, anime) are the primary considerations.
When performing professional editing tasks, such as product background updates or image enlargement.
When injecting branding, style, or generating logos and product designs.

Best Practices

Optimize prompts: Carefully crafting prompts is key to obtaining high-quality output.
Experiment with parameters: Try different aspect ratios and settings to find the configuration that best suits your needs.
Batch generation: Generate multiple images to increase the chance of getting ideal results.
Save metadata: Save prompts and timestamps along with images to track and replicate successful results.
Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of service.
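
For the "Save metadata" practice, one simple approach is to write a small JSON file next to each image. This is a sketch under assumptions: `save_generation_record` and the record fields are hypothetical, and the sidecar-file layout is just one possible convention.

```python
import json
import time
from pathlib import Path


def save_generation_record(image_path, prompt, model, params=None, timestamp=None):
    """Write <image name>.json beside the image and return the record path."""
    image_path = Path(image_path)
    record = {
        "image": image_path.name,
        "prompt": prompt,
        "model": model,
        "params": params or {},
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }
    record_path = image_path.with_suffix(".json")
    record_path.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return record_path


if __name__ == "__main__":
    import tempfile

    out_dir = tempfile.mkdtemp()
    path = save_generation_record(
        Path(out_dir) / "imagen_1.png",
        prompt="a red fox in the snow",
        model="imagen-4.0-fast-generate-001",
        params={"aspect_ratio": "1:1"},
    )
    print(path.read_text(encoding="utf-8"))
```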

Veo 3.0 Video Generation

VEO 3.0 is the latest advanced video generation model developed by Google DeepMind. With VEO 3.0, you can generate videos with the following features:

Enhanced quality from text and image prompts
Speech, such as dialogue and voiceovers
Audio, such as music and sound effects

Currently, VEO 3.0 only supports English prompts; automatic translation is recommended when integrating.
Videos are usually generated within a few minutes but may take longer during peak times.
Generating video from image-based conversations is not currently supported.

Known Limitations

Currently, VEO 3.0 parameters are fixed and cannot be changed:

Resolution: 720p (landscape)
Frame Rate: 24fps
Video Length: 8 seconds

Pricing

The cost of the VEO 3.0 API is $0.675/second (AiHubMix offers a 10% limited-time discount)
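
Because the video length is fixed at 8 seconds, the price of one video follows directly from the per-second rate: 0.675 × 8 = $5.40. The sketch below shows that arithmetic. It assumes the 10% discount is applied on top of the listed $0.675/second (the text does not say whether the listed price already includes it), and `veo3_cost` is a hypothetical helper.

```python
from decimal import Decimal

VEO3_PRICE_PER_SECOND = Decimal("0.675")  # from the pricing line above
VEO3_FIXED_SECONDS = 8                    # fixed video length


def veo3_cost(discount="0"):
    """Cost in USD of one 8-second video; discount is a fraction such as "0.10"."""
    full_price = VEO3_PRICE_PER_SECOND * VEO3_FIXED_SECONDS
    return full_price * (Decimal("1") - Decimal(discount))


if __name__ == "__main__":
    print(float(veo3_cost()))        # 5.4
    print(float(veo3_cost("0.10")))  # 4.86, assuming the discount is not yet included
```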

Usage Example

VEO 3.0 currently supports only curl calls and uses a two-step process: submit the generation request (Step 1), then query the operation for the result (Step 2).

Note: sk-*** is your key generated on AiHubMix.

curl "https://aihubmix.com/gemini/v1beta/models/veo-3.0-generate-preview:predictLongRunning?key=sk-***" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances":
    [
        {
            "prompt": "A cat playing with a ball"
        }
    ],
    "parameters":
    {
        "numberOfVideos": 1,
        "durationSeconds": 8,
        "aspectRatio": "16:9",
        "personGeneration": "dont_allow"
    }
}'

Response Examples

Step 1 Response:

{
  "name": "models/veo-3.0-generate-preview/operations/ff5***"
}

Step 2 Response (Generation Complete):

{
  "name": "projects/ahm-gemini-03/locations/us-central1/publishers/google/models/veo-3.0-generate-preview/operations/ff5***",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/cloud.ai.large_models.vision.GenerateVideoResponse",
    "raiMediaFilteredCount": 0,
    "videos": [
      {
        "bytesBase64Encoded": "AAA...2xl",
        "mimeType": "video/mp4"
      }
    ]
  }
}

Step 2 Response (Still Processing):

{
  "name": "projects/ahm-gemini-03/locations/us-central1/publishers/google/models/veo-3.0-generate-preview/operations/777***"
}

If you receive a processing response, please wait a few minutes and resend the Step 2 request.
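
Once a Step 2 response reports done: true, the video arrives as Base64 text in bytesBase64Encoded and has to be decoded before it can be played. The following sketch works on the parsed JSON of a Step 2 response as shown above; `save_veo_videos` and the output file names are hypothetical, and how the response is fetched is left out.

```python
import base64
from pathlib import Path


def save_veo_videos(operation, out_dir="."):
    """Decode videos from a completed Step 2 response dict; return the saved paths.

    Returns an empty list while the operation is still processing.
    """
    if not operation.get("done"):
        return []
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, video in enumerate(operation.get("response", {}).get("videos", [])):
        data = video.get("bytesBase64Encoded")
        if not data:
            continue
        path = out / f"veo3_{i}.mp4"  # hypothetical naming scheme
        path.write_bytes(base64.b64decode(data))
        saved.append(path)
    return saved


if __name__ == "__main__":
    still_processing = {"name": "models/veo-3.0-generate-preview/operations/ff5***"}
    print(save_veo_videos(still_processing))  # []
```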

Best Practices

Be Patient: Video generation usually takes a few minutes, longer during peak times
Check Status: If the response doesn’t contain done: true, it’s still processing
Save Operation ID: Make sure to save the operation ID returned from Step 1 for subsequent queries
Comply with Usage Policies: Ensure your usage complies with Google’s content policies and terms of use

For more information, please refer to the Vertex AI Official Documentation

Veo 3.0 Reverse API Access

AiHubMix offers a reverse access method that delivers the same output quality as the official API at a lower rate: just $0.41 per request. Note, however, that no reverse method can guarantee stable generation. It is recommended only for early experimentation in development environments or for personal exploration. Known limitations are the same as for the official API; see the “Veo 3.0 Video Generation” section above.

Response Examples

The VEO 3.0 reverse API is OpenAI-compatible. Simply specify the model ID veo-3 along with your video prompt.

from openai import OpenAI

client = OpenAI(
    api_key="sk-***", # 🔑 Replace with your API key from AiHubMix
    base_url="https://aihubmix.com/v1",
)

completion = client.chat.completions.create(
    model="veo-3",
    messages=[
        {
            "role": "user",
            "content": "a mechanical butterfly flying in the futuristic garden"
        }
    ],
    stream=False
)

print(completion.choices[0].message.content)

Example Response

The output is a video URL. Please save it locally in time.

{
  "prompt": "A sleek, metallic mechanical butterfly with intricate, glowing blue circuitry patterns on its wings flies gracefully through a futuristic garden. The garden is filled with bioluminescent plants, floating orbs of light, and holographic flowers that change colors. The butterfly's wings reflect the ambient light, creating a mesmerizing shimmer as it moves. The background features a sleek, minimalist cityscape with towering glass structures and hovering drones. The scene is bathed in a soft, ethereal glow from a setting sun, casting long shadows and enhancing the futuristic ambiance. The camera follows the butterfly in a smooth, cinematic motion, capturing the delicate movements of its wings and the vibrant, otherworldly beauty of the garden."
}

> Video generation task created
> Task ID: `8167db37-2b7c-4794-9232-891d02ca7fa3`
> To prevent task interruption, you can continuously track progress from the following links:
> [Data Preview](https://asyncdata.net/web/8167db37-2b7c-4794-9232-891d02ca7fa3) | [Source Data](https://asyncdata.net/source/8167db37-2b7c-4794-9232-891d02ca7fa3)
> Waiting for processing

> Type: Text-to-video generation
> 🎬 Starting video generation...................

> ⚠️ Retrying (0/3)

> Type: Text-to-video generation
> 🎬 Starting video generation.....................

> 🔄 Optimizing video quality.................

> 🎉 High-quality video generated

[▶️ Watch Online](https://filesystem.site/cdn/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4) | [⏬ Download Video](https://filesystem.site/cdn/download/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4)
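
Since the video URL is embedded in Markdown-style links inside the message text, it can be pulled out with a regular expression before downloading. This is a sketch that assumes the output keeps the link format shown in the example above; `extract_video_urls` is a hypothetical helper.

```python
import re

# Matches the target of a Markdown link when it ends in .mp4
MP4_LINK = re.compile(r"\((https?://[^\s)]+\.mp4)\)")


def extract_video_urls(content):
    """Return the distinct .mp4 URLs found in the message text, in order."""
    urls = []
    for url in MP4_LINK.findall(content):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = (
        "[Data Preview](https://asyncdata.net/web/8167db37) \n"
        "[Watch Online](https://filesystem.site/cdn/20250615/example.mp4) | "
        "[Download Video](https://filesystem.site/cdn/download/20250615/example.mp4)"
    )
    print(extract_video_urls(sample))
```

The progress-tracking links are skipped because they do not end in .mp4.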

Veo 2.0 Video Generation

VEO 2.0 is an advanced video generation AI model launched by Google, capable of creating high-quality, realistic short videos based on text prompts. This part will help you understand how to use the VEO 2.0 API to generate videos, including parameter settings, model selection, and code examples.

Currently, VEO 2.0 only supports English prompts.
Video generation takes approximately 2-3 minutes; please be patient.

Model Parameters

VEO 2.0 provides the following parameters:

numberOfVideos: The number of videos to generate, options are 1 or 2. Default is 2.
aspectRatio: The aspect ratio of the generated videos. Supported values are “16:9” and “9:16”.
durationSeconds: Video duration, options are 5 seconds or 8 seconds. Default is 8 seconds.
personGeneration: Controls whether to allow videos containing people. Supports the following values:
- “dont_allow”: Prevents generation of videos containing people.
- “allow_adult”: Allows generation of videos containing adults, but not children.

Pricing

The cost of the VEO 2.0 API is $0.35/s

Usage Example

Here’s a Python example of using VEO 2.0 to generate videos:

import os
import time
from google import genai
from google.genai import types

client = genai.Client(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="dont_allow",  # "dont_allow" or "allow_adult"
        aspect_ratio="16:9",  # "16:9" or "9:16"
        number_of_videos=1, # Integer, options are 1 or 2, default is 2
        duration_seconds=5, # Integer, options are 5 or 8, default is 8
    ),
)

# Takes 2-3 minutes, video duration is 5-8s
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")  # Save the video
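
The polling loop above waits indefinitely if the operation never completes. As a hedged sketch, the generic helper below adds a timeout. `wait_until_done` is a hypothetical function that uses only the standard library; the fetch and is_done callables stand in for whatever status call is being polled.

```python
import time


def wait_until_done(fetch, is_done, interval=20.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() every `interval` seconds until is_done(state) is true.

    Raises TimeoutError if `timeout` seconds pass without completion.
    """
    deadline = clock() + timeout
    state = fetch()
    while not is_done(state):
        if clock() >= deadline:
            raise TimeoutError(f"operation not finished after {timeout} seconds")
        sleep(interval)
        state = fetch()
    return state


if __name__ == "__main__":
    # Simulated status responses: two "still processing", then done.
    states = iter([{"done": False}, {"done": False}, {"done": True}])
    result = wait_until_done(lambda: next(states), lambda s: s["done"], interval=0)
    print(result)  # {'done': True}
```

With the example above, fetch could wrap client.operations.get(operation) and is_done could read the operation's done attribute; a timeout of about 10 minutes is an arbitrary choice given the 2-3 minute estimate.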

Prompt Tips

Creating effective prompts is crucial for obtaining desired videos:

Describe clear scenes, actions, and atmosphere
Specify filming styles (such as panoramic, close-up, tracking shots, etc.)
Describe lighting conditions (such as sunny, dusk, indoor lighting, etc.)
Specify the main subject and its actions (e.g., “a kitten sleeping in the sunshine”)
Avoid overly complex narratives or rapidly changing scenes
Avoid negative or prohibited content

Best Practices

Clear and concise prompts: Use clear, specific descriptions to guide video generation.
Patience is key: Video generation takes 2-3 minutes, please wait for completion.
Test different parameters: Try different aspect ratios and durations to find the settings that best suit your needs.
Save generation records: Record prompts along with generated videos to track successful results.
Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of use.
