Imagen Guide

Imagen is a series of image generation models developed by Google that creates high-quality, realistic images from text prompts. This guide explains how to use the Imagen API to generate images, covering parameter settings, model selection, and code examples.
Available Models:
  • imagen-4.0-ultra-generate-001
  • imagen-4.0-generate-001
  • imagen-4.0-fast-generate-001
  • imagen-4.0-fast-generate-preview-06-06
  • imagen-3.0-generate-002
  1. Imagen currently supports only English prompts. When integrating, consider adding automatic translation so that users can write prompts in any language.
  2. Performance is unstable when rendering large amounts of text. Include only the essential keywords.

Model Parameters

Imagen currently supports only English prompts and provides the following parameters:
  • numberOfImages: The number of images to generate, ranging from 1 to 4 (inclusive). The default value is 4.
    • imagen-4.0-ultra-generate-001 can only generate 1 image at a time.
  • aspectRatio: Changes the aspect ratio of the generated images. Supported values are “1:1”, “3:4”, “4:3”, “9:16”, and “16:9”. The default value is “1:1”.
  • personGeneration: Allows the model to generate images of people. Supports the following values:
    • “DONT_ALLOW”: Prevents the generation of images containing people.
    • “ALLOW_ADULT”: Generates images of adults but not children. This is the default value.
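The constraints listed above can be checked before a request is sent. The following is a minimal sketch, not part of the Imagen API: the helper name validate_imagen_params and its return shape are hypothetical, and the allowed values are copied from the parameter list above.

```python
# Allowed values taken from the parameter list above.
ALLOWED_ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}
ALLOWED_PERSON_GENERATION = {"DONT_ALLOW", "ALLOW_ADULT"}
SINGLE_IMAGE_MODELS = {"imagen-4.0-ultra-generate-001"}


def validate_imagen_params(model, number_of_images=4, aspect_ratio="1:1",
                           person_generation="ALLOW_ADULT"):
    """Hypothetical helper: return the checked parameters or raise ValueError."""
    if not 1 <= number_of_images <= 4:
        raise ValueError("numberOfImages must be between 1 and 4")
    if model in SINGLE_IMAGE_MODELS and number_of_images != 1:
        raise ValueError(f"{model} can only generate 1 image at a time")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"Unsupported aspectRatio: {aspect_ratio}")
    if person_generation not in ALLOWED_PERSON_GENERATION:
        raise ValueError(f"Unsupported personGeneration: {person_generation}")
    return {
        "numberOfImages": number_of_images,
        "aspectRatio": aspect_ratio,
        "personGeneration": person_generation,
    }


print(validate_imagen_params("imagen-4.0-generate-001", 2, "16:9"))
```

Failing early on the client side avoids spending an API call on a request that the parameter rules already exclude.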

Usage Pricing

The cost of using the Imagen API to generate images:
  • imagen-4-ultra: $0.06/image
  • imagen-4: $0.04/image
  • imagen-4-fast: $0.02/image
  • imagen-3: $0.03/image
Note that each API call can generate 1 to 4 images, and you are charged based on the actual number of images generated.
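As a worked example of the per-image pricing, the sketch below multiplies the listed price by the number of images actually generated. The function name estimate_cost and the use of the price-tier labels as dictionary keys are illustrative assumptions.

```python
from decimal import Decimal

# Prices per image in USD, copied from the list above.
PRICE_PER_IMAGE_USD = {
    "imagen-4-ultra": Decimal("0.06"),
    "imagen-4": Decimal("0.04"),
    "imagen-4-fast": Decimal("0.02"),
    "imagen-3": Decimal("0.03"),
}


def estimate_cost(price_tier, images_generated):
    """Hypothetical helper: cost of one call, billed per generated image."""
    if images_generated < 0:
        raise ValueError("images_generated must not be negative")
    return PRICE_PER_IMAGE_USD[price_tier] * images_generated


print(estimate_cost("imagen-4", 4))  # 0.16
```

Decimal is used instead of float so that small per-image amounts add up without rounding noise.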

API Call Example

Here’s a Python example of generating images with Imagen (the model used is imagen-4.0-fast-generate-001):
import os
import time
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

# Currently only supports English prompts, performance is poor with large amounts of text
response = client.models.generate_images(
    model='imagen-4.0-fast-generate-001',
    prompt='A minimalist logo for a LLM router market company on a solid white background. trident in a circle as the main symbol, with ONLY text \'InferEra\' below.',
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1", # supports "1:1", "9:16", "16:9", "3:4", or "4:3".
    )
)

script_dir = os.path.dirname(os.path.abspath(__file__))
output_dir = os.path.join(script_dir, "output")

os.makedirs(output_dir, exist_ok=True)

# Generate timestamp as filename prefix to avoid filename conflicts
timestamp = int(time.time())

# Save and display the generated images
if response and hasattr(response, 'generated_images') and response.generated_images:
    for i, generated_image in enumerate(response.generated_images):
        try:
            image = Image.open(BytesIO(generated_image.image.image_bytes))
            image.show()
            
            file_name = f"imagen_{timestamp}_{i+1}.png"
            file_path = os.path.join(output_dir, file_name)
            image.save(file_path)
            
            print(f"Image saved to: {file_path}")
        except Exception as e:
            print(f"Error processing image {i+1}: {e}")
else:
    print("Error: No valid image response received")
    print(f"Response type: {type(response)}")
    if response:
        print(f"Response attributes: {dir(response)}")
        if hasattr(response, 'generated_images'):
            print(f"generated_images value: {response.generated_images}")
    else:
        print("Response is empty, please check API key and network connection")

Prompt Tips

Creating effective prompts is crucial for obtaining desired images:
  • Use detailed descriptions, including subject, style, lighting, angle, etc.
  • Specify artistic styles (such as cinematic, photorealistic, anime style, etc.).
  • Include technical details (such as DSLR, high-definition, rich in detail, etc.).
  • Avoid negative or prohibited content.
  • Avoid including large amounts of text in prompts; use only the essential keywords for more stable results.
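One way to apply these tips consistently is to assemble prompts from named parts. This is only an illustrative sketch; build_prompt and its parameters are hypothetical and not part of any SDK.

```python
def build_prompt(subject, style=None, lighting=None, angle=None, details=()):
    """Hypothetical helper: join the subject with optional descriptors."""
    parts = [subject.strip()]
    for extra in (style, lighting, angle, *details):
        if extra:  # skip descriptors that were not provided
            parts.append(extra.strip())
    return ", ".join(parts)


prompt = build_prompt(
    "A lighthouse on a rocky coast",
    style="photorealistic",
    lighting="golden hour light",
    angle="low-angle shot",
    details=["rich in detail"],
)
print(prompt)
```

Keeping the subject first and the descriptors short follows the advice above to stay specific without packing long text into the prompt.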

Gemini Image Generation

Gemini also offers image generation as an alternative. Compared to Imagen, Gemini’s image generation is better suited to scenarios that require contextual understanding and reasoning than to those that prioritize artistic expression and visual quality.
Instructions:
  • Model id: gemini-2.5-flash-image-preview
  • Input/Output pricing: Text: $0.3→$2.5/M tokens; Image: $0.3→$30/M tokens
  • To enable image output, add the parameter "modalities":["text","image"] to the request
  • Images are passed and output in Base64 encoding
  • Default height for output images is 1024px
  • Python calls require the latest OpenAI SDK; run pip install -U openai first
  • For more information, visit the official Gemini documentation
Input Reference Structure:
"modalities": ["text","image"]
{
    "model": "gemini-2.5-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": "Generate a landscape painting and provide a poem to describe it"
      }
    ],
    "modalities": ["text","image"], // required to enable image output
    "temperature": 0.7
}
Output Reference Structure:
"choices":
    [
        {
            "index": 0,
            "message":
            {
                "role": "assistant",
                "content": "Hello! How can I assist you today?",
                "refusal": null,
                "multi_mod_content": //📍 New addition
                [
                    {
                        "text": "",
                        "inlineData":
                        {
                          "data":"base64 str",
                          "mimeType":"png"
                        }
                    },
                    {
                        "text": "hello",
                        "inlineData":
                        {
                        }
                    }
                ],
                "annotations":
                []
            },
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
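The output structure above can be unpacked with a few lines of Python. The sketch below assumes the message is available as a plain dict with the camelCase keys shown in the reference structure ("inlineData", "mimeType"); the helper name split_multimodal_parts is hypothetical.

```python
import base64


def split_multimodal_parts(message):
    """Hypothetical helper: separate text parts from decoded image parts."""
    texts, images = [], []
    for part in message.get("multi_mod_content") or []:
        if part.get("text"):  # ignore empty strings
            texts.append(part["text"])
        inline = part.get("inlineData") or {}
        if inline.get("data"):  # Base64 payload, as shown in the structure above
            images.append((inline.get("mimeType"), base64.b64decode(inline["data"])))
    return texts, images


# Stand-in message that mirrors the reference structure (not real image data).
message = {
    "multi_mod_content": [
        {"text": "", "inlineData": {
            "data": base64.b64encode(b"fake-image-bytes").decode("ascii"),
            "mimeType": "png"}},
        {"text": "hello", "inlineData": {}},
    ]
}
texts, images = split_multimodal_parts(message)
print(texts)        # ['hello']
print(len(images))  # 1
```

Text and image data are checked independently because, in the structure above, an image part also carries an empty "text" field.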

Text-to-Image Generation

Input: text
Output: text + image
IMG_PATH="/your_path/image.jpg"

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl https://aihubmix.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-***" \
  -d '{
    "model": "gemini-2.5-flash-image-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type":"text",
            "text":"describe the image with a concise and engaging paragraph, then fill color as children's crayon style"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,'$IMG_BASE64'"
            }
          }
        ]
      }
    ],
    "modalities": ["text","image"],
    "temperature": 0.7
}' \
  | grep -o '"data":"[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > /your_path/imageGen.jpg
Output Example: Image

Edit Image

Input: text + image
Output: text + image
import os
from openai import OpenAI
from PIL import Image
from io import BytesIO
import base64

client = OpenAI(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    base_url="https://aihubmix.com/v1",
)

# The input image is expected at ./resources/filled.jpg next to this script
image_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "resources", "filled.jpg")
if not os.path.exists(image_path):
    raise FileNotFoundError(f"Image {image_path} does not exist")

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="gemini-2.5-flash-image-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "describe the image with a concise and engaging paragraph, then fill color as children's crayon style",
                },
                {
                    "type": "image_url", 
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },     
            ],
        },
    ],
    modalities=["text", "image"],
    temperature=0.7,
)
try:
    # Print basic response information without base64 data
    print(f"Creation time: {response.created}")
    print(f"Token usage: {response.usage.total_tokens}")
    
    # Check if multi_mod_content field exists
    if (
        hasattr(response.choices[0].message, "multi_mod_content")
        and response.choices[0].message.multi_mod_content is not None
    ):
        print("\nResponse content:")
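        for part in response.choices[0].message.multi_mod_content:
            # A part may carry text, image data, or both; empty text is skipped
            if part.get("text"):
                print(part["text"])

            # Process image content. The output reference structure shows camelCase
            # keys ("inlineData"), so both spellings are accepted here.
            inline_data = part.get("inline_data") or part.get("inlineData")
            if inline_data and inline_data.get("data"):
                print("\n🖼️ [Image content received]")
                image_data = base64.b64decode(inline_data["data"])
                mime_type = inline_data.get("mime_type") or inline_data.get("mimeType", "image/png")
                print(f"Image type: {mime_type}")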
                
                image = Image.open(BytesIO(image_data))
                image.show()
                
                # Save image
                output_dir = os.path.join(os.path.dirname(image_path), "output")
                os.makedirs(output_dir, exist_ok=True)
                output_path = os.path.join(output_dir, "edited_image.jpg")
                image.save(output_path)
                print(f"✅ Image saved to: {output_path}")
            
    else:
        print("No valid multimodal response received, check response structure")
except Exception as e:
    print(f"Error processing response: {str(e)}")
Output Example: Image

Choosing the Right Model

When to Choose Gemini:

  • When you need to leverage world knowledge and reasoning abilities to generate contextually relevant images.
  • When you need seamless integration of text and images.
  • When you want to embed accurate visual content in long text sequences.
  • When you want to edit images conversationally while maintaining context.

When to Choose Imagen:

  • When image quality, photorealism, artistic detail, or specific styles (such as impressionism or anime) are the primary considerations.
  • When performing professional editing tasks, such as product background updates or image enlargement.
  • When injecting branding, style, or generating logos and product designs.

Best Practices

  1. Optimize prompts: Carefully crafting prompts is key to obtaining high-quality output.
  2. Experiment with parameters: Try different aspect ratios and settings to find the configuration that best suits your needs.
  3. Batch generation: Generate multiple images to increase the chance of getting ideal results.
  4. Save metadata: Save prompts and timestamps along with images to track and replicate successful results.
  5. Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of service.
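The "Save metadata" practice can be sketched as a JSON file written next to each image. The helper name save_with_metadata, the file naming scheme, and the metadata fields are illustrative assumptions, not a prescribed format.

```python
import json
import time
from pathlib import Path


def save_with_metadata(image_bytes, prompt, model, output_dir="output"):
    """Hypothetical helper: write the image plus a JSON sidecar with its prompt."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    timestamp = int(time.time())
    image_path = out / f"image_{timestamp}.png"
    image_path.write_bytes(image_bytes)

    # Sidecar file with the same name and a .json suffix
    meta_path = image_path.with_suffix(".json")
    metadata = {"prompt": prompt, "model": model, "timestamp": timestamp}
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return image_path, meta_path


# Usage, assuming `image_bytes` holds the bytes of a generated image:
# save_with_metadata(image_bytes, "A minimalist logo", "imagen-4.0-fast-generate-001")
```

Storing the prompt and model alongside the image makes it possible to reproduce a successful result later.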

Veo 3.0 Video Generation

VEO 3.0 is the latest advanced video generation model developed by Google DeepMind. With VEO 3.0, you can generate videos with the following features:
  • Enhanced quality from text and image prompts
  • Speech, such as dialogue and voiceovers
  • Audio, such as music and sound effects
  1. VEO 3.0 currently supports only English prompts; automatic translation is recommended when integrating.
  2. Videos are usually generated within a few minutes but may take longer during peak times.
  3. Video generation from image-based conversations is not currently supported.

Known Limitations

VEO 3.0 parameters are currently fixed and cannot be changed:
  • Resolution: 720p (landscape)
  • Frame Rate: 24fps
  • Video Length: 8 seconds

Pricing

The cost of the VEO 3.0 API is $0.675/second (AiHubMix offers a limited-time 10% discount).
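As a worked example, the per-second price can be combined with the fixed 8-second video length listed under Known Limitations. The sketch assumes that the 10% discount is applied on top of the listed price; the page does not say so explicitly, and veo3_clip_cost is a hypothetical helper.

```python
from decimal import Decimal

VEO3_PRICE_PER_SECOND = Decimal("0.675")  # listed price in USD
VEO3_CLIP_SECONDS = 8                     # fixed video length


def veo3_clip_cost(discount=Decimal("0")):
    """Hypothetical helper: cost of one 8-second clip, optionally discounted."""
    list_price = VEO3_PRICE_PER_SECOND * VEO3_CLIP_SECONDS
    # Assumption: the discount is a fraction taken off the listed price.
    return (list_price * (Decimal("1") - discount)).quantize(Decimal("0.01"))


print(veo3_clip_cost())                 # 5.40
print(veo3_clip_cost(Decimal("0.10")))  # 4.86
```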

Usage Example

VEO 3.0 currently supports only curl command calls and uses a two-step process.
Note: sk-*** is your key generated on AiHubMix.
curl "https://aihubmix.com/gemini/v1beta/models/veo-3.0-generate-preview:predictLongRunning?key=sk-***" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances":
    [
        {
            "prompt": "A cat playing with a ball"
        }
    ],
    "parameters":
    {
        "numberOfVideos": 1,
        "durationSeconds": 8,
        "aspectRatio": "16:9",
        "personGeneration": "dont_allow"
    }
}'
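For readers who script their calls, the same request can be assembled with Python's standard library. This is a sketch under an assumption: that the endpoint accepts any HTTP client sending the same URL, header, and body as the curl command above. The helper name build_veo3_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint copied from the curl command above.
VEO3_ENDPOINT = (
    "https://aihubmix.com/gemini/v1beta/models/"
    "veo-3.0-generate-preview:predictLongRunning"
)


def build_veo3_request(prompt, api_key):
    """Hypothetical helper: build the POST request shown in the curl example."""
    payload = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "numberOfVideos": 1,
            "durationSeconds": 8,
            "aspectRatio": "16:9",
            "personGeneration": "dont_allow",
        },
    }
    return urllib.request.Request(
        f"{VEO3_ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_veo3_request("A cat playing with a ball", "sk-***")
# To send it (untested assumption that this behaves like the curl call):
# with urllib.request.urlopen(request) as resp:
#     operation = json.load(resp)
```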

Response Examples

Step 1 Response:
{
  "name": "models/veo-3.0-generate-preview/operations/ff5***"
}
Step 2 Response (Generation Complete):
{
  "name": "projects/ahm-gemini-03/locations/us-central1/publishers/google/models/veo-3.0-generate-preview/operations/ff5***",
  "done": true,
  "response": {
    "@type": "type.googleapis.com/cloud.ai.large_models.vision.GenerateVideoResponse",
    "raiMediaFilteredCount": 0,
    "videos": [
      {
        "bytesBase64Encoded": "AAA...2xl",
        "mimeType": "video/mp4"
      }
    ]
  }
}
Step 2 Response (Still Processing):
{
  "name": "projects/ahm-gemini-03/locations/us-central1/publishers/google/models/veo-3.0-generate-preview/operations/777***"
}
If you receive a processing response, wait a few minutes and resend the Step 2 request.
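Once a completed response arrives, the video is contained in the bytesBase64Encoded field and has to be decoded before it can be played. The sketch below works on the response body as text; the helper name save_videos_if_done and the output file naming are assumptions.

```python
import base64
import json


def save_videos_if_done(response_text, prefix="veo3_video"):
    """Hypothetical helper: decode videos from a completed response body.

    Returns the list of saved file paths, or an empty list if the
    operation is still processing (no "done": true in the response).
    """
    operation = json.loads(response_text)
    if not operation.get("done"):
        return []

    saved = []
    videos = operation.get("response", {}).get("videos", [])
    for index, video in enumerate(videos):
        path = f"{prefix}_{index}.mp4"
        with open(path, "wb") as f:
            f.write(base64.b64decode(video["bytesBase64Encoded"]))
        saved.append(path)
    return saved
```

An empty return value corresponds to the "Still Processing" case above, so the caller can wait and query again.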

Best Practices

  1. Be Patient: Video generation usually takes a few minutes, longer during peak times
  2. Check Status: If the response doesn’t contain done: true, it’s still processing
  3. Save Operation ID: Be sure to save the operation ID returned from Step 1 for subsequent queries
  4. Comply with Usage Policies: Ensure your usage complies with Google’s content policies and terms of use
For more information, see the Vertex AI Official Documentation

Veo 3.0 Reverse API Access

AiHubMix offers a reverse access method that delivers the same output quality as the official API at a lower rate: just $0.41 per request. Note, however, that no reverse method can guarantee stable generation. It is recommended only for early experimentation in development environments or for personal exploration. Known limitations are the same as for the official API; see the “Veo 3.0 Video Generation” section above.

Request Example

The VEO 3.0 reverse API is OpenAI-compatible. Simply specify the model ID veo-3 along with your video prompt.
from openai import OpenAI

client = OpenAI(
    api_key="sk-***", # 🔑 Replace with your API key from AiHubMix
    base_url="https://aihubmix.com/v1",
)

completion = client.chat.completions.create(
    model="veo-3",
    messages=[
        {
            "role": "user",
            "content": "a mechanical butterfly flying in the futuristic garden"
        }
    ],
    stream=False
)

print(completion.choices[0].message.content)

Example Response

The output contains a video URL. Download the video and save it locally promptly.
{
  "prompt": "A sleek, metallic mechanical butterfly with intricate, glowing blue circuitry patterns on its wings flies gracefully through a futuristic garden. The garden is filled with bioluminescent plants, floating orbs of light, and holographic flowers that change colors. The butterfly's wings reflect the ambient light, creating a mesmerizing shimmer as it moves. The background features a sleek, minimalist cityscape with towering glass structures and hovering drones. The scene is bathed in a soft, ethereal glow from a setting sun, casting long shadows and enhancing the futuristic ambiance. The camera follows the butterfly in a smooth, cinematic motion, capturing the delicate movements of its wings and the vibrant, otherworldly beauty of the garden."
}
> Video generation task created
> Task ID: `8167db37-2b7c-4794-9232-891d02ca7fa3`
> To prevent task interruption, you can continuously track progress from the following links:
> [Data Preview](https://asyncdata.net/web/8167db37-2b7c-4794-9232-891d02ca7fa3) | [Source Data](https://asyncdata.net/source/8167db37-2b7c-4794-9232-891d02ca7fa3)
> Waiting for processing

> Type: Text-to-video generation
> 🎬 Starting video generation...................

> ⚠️ Retrying (0/3)

> Type: Text-to-video generation
> 🎬 Starting video generation.....................

> 🔄 Optimizing video quality.................

> 🎉 High-quality video generated

[▶️ Watch Online](https://filesystem.site/cdn/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4) | [⏬ Download Video](https://filesystem.site/cdn/download/20250615/T7yfqW229fox4gJA1ys0eMAGLkcSfd.mp4)
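Because the response content is Markdown-style text, the video links can be pulled out with a regular expression. The sketch below assumes the links keep the [label](url) form shown above and end in .mp4; the helper name extract_video_links is hypothetical.

```python
import re

# Matches Markdown links of the form [label](http...url)
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def extract_video_links(content):
    """Hypothetical helper: return {label: url} for links that end in .mp4."""
    return {
        label: url
        for label, url in LINK_PATTERN.findall(content)
        if url.lower().endswith(".mp4")
    }


# Stand-in content with placeholder URLs in the same shape as the response above
sample = (
    "[Data Preview](https://example.com/web/123)\n"
    "[Watch Online](https://example.com/a.mp4) | "
    "[Download Video](https://example.com/download/a.mp4)"
)
print(extract_video_links(sample))
```

In the request example above, the text to pass in would be completion.choices[0].message.content.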

Veo 2.0 Video Generation

VEO 2.0 is a video generation model from Google that creates high-quality, realistic short videos from text prompts. This section explains how to use the VEO 2.0 API to generate videos, covering parameter settings, model selection, and code examples.
  1. VEO 2.0 currently supports only English prompts.
  2. Video generation takes approximately 2-3 minutes; please be patient.

Model Parameters

VEO 2.0 provides the following parameters:
  • numberOfVideos: The number of videos to generate, options are 1 or 2. Default is 2.
  • aspectRatio: The aspect ratio of the generated videos. Supported values are “16:9” and “9:16”.
  • durationSeconds: Video duration, options are 5 seconds or 8 seconds. Default is 8 seconds.
  • personGeneration: Controls whether to allow videos containing people. Supports the following values:
    • “dont_allow”: Prevents generation of videos containing people.
    • “allow_adult”: Allows generation of videos containing adults, but not children.
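The allowed values above can be checked before a generation job is started, which matters more for video because each attempt takes minutes. This is a minimal sketch: veo2_config_kwargs is a hypothetical helper, and the snake_case keyword names are an assumption based on the naming style of the Python SDK.

```python
def veo2_config_kwargs(*, aspect_ratio, person_generation,
                       number_of_videos=2, duration_seconds=8):
    """Hypothetical helper: validate VEO 2.0 options and return them as kwargs."""
    if number_of_videos not in (1, 2):
        raise ValueError("numberOfVideos must be 1 or 2")
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError(f"Unsupported aspectRatio: {aspect_ratio}")
    if duration_seconds not in (5, 8):
        raise ValueError("durationSeconds must be 5 or 8")
    if person_generation not in ("dont_allow", "allow_adult"):
        raise ValueError(f"Unsupported personGeneration: {person_generation}")
    return {
        "number_of_videos": number_of_videos,
        "aspect_ratio": aspect_ratio,
        "duration_seconds": duration_seconds,
        "person_generation": person_generation,
    }


kwargs = veo2_config_kwargs(aspect_ratio="16:9", person_generation="dont_allow",
                            number_of_videos=1, duration_seconds=5)
print(kwargs)
```

The defaults for numberOfVideos and durationSeconds follow the list above; the other two options are required because no default is stated for them.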

Pricing

The cost of the VEO 2.0 API is $0.35/second.

Usage Example

Here’s a Python example of using VEO 2.0 to generate videos:
import os
import time
from google import genai
from google.genai import types

client = genai.Client(
    api_key="sk-***", # 🔑 Replace with your key generated on AiHubMix
    http_options={"base_url": "https://aihubmix.com/gemini"},
)

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="dont_allow",  # "dont_allow" or "allow_adult"
        aspect_ratio="16:9",  # "16:9" or "9:16"
        number_of_videos=1, # Integer, options are 1 or 2, default is 2
        duration_seconds=5, # Integer, options are 5 or 8, default is 8
    ),
)

# Takes 2-3 minutes, video duration is 5-8s
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")  # Save the video
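The polling loop in the example above runs until the operation finishes, with no upper bound. A variant with a timeout might look like the sketch below; wait_until_done and its parameters are hypothetical, and the function is kept independent of the SDK so it can wrap any status call.

```python
import time


def wait_until_done(fetch, is_done, interval=20.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: poll fetch() until is_done(state) or the timeout passes."""
    deadline = clock() + timeout
    state = fetch()
    while not is_done(state):
        if clock() >= deadline:
            raise TimeoutError(f"Operation not finished after {timeout} seconds")
        sleep(interval)
        state = fetch()
    return state


# Usage with the example above (assumes `client` and `operation` already exist):
# operation = wait_until_done(
#     lambda: client.operations.get(operation),
#     lambda op: op.done,
# )
```

The 20-second interval matches the example; the 10-minute timeout is an arbitrary choice that leaves room beyond the stated 2-3 minutes.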

Prompt Tips

Creating effective prompts is crucial for obtaining desired videos:
  • Describe clear scenes, actions, and atmosphere
  • Specify filming styles (such as panoramic, close-up, tracking shots, etc.)
  • Describe lighting conditions (such as sunny, dusk, indoor lighting, etc.)
  • Specify the main subject and its actions (e.g., “a kitten sleeping in the sunshine”)
  • Avoid overly complex narratives or rapidly changing scenes
  • Avoid negative or prohibited content

Best Practices

  1. Clear and concise prompts: Use clear, specific descriptions to guide video generation.
  2. Patience is key: Video generation takes 2-3 minutes; please wait for completion.
  3. Test different parameters: Try different aspect ratios and durations to find the settings that best suit your needs.
  4. Save generation records: Record prompts along with generated videos to track successful results.
  5. Comply with usage policies: Ensure your usage complies with Google’s content policies and terms of use.

Last updated: 2026-06-01