A comprehensive guide to Gemini API calls on our platform.
Install the SDK:

```shell
pip install google-genai
# or, to upgrade an existing installation:
pip install -U google-genai
```
1️⃣ For native integration, Gemini takes care of routing traffic between AI Studio and Vertex AI automatically. Just supply your AiHubMix API key and the appropriate request URL. Remember, this URL differs from the usual `base_url` and uses the `v1` endpoint; follow the example below to ensure proper setup.
To enable thinking, use `gemini-2.5-flash-preview-04-17`. Set `include_thoughts=True` to return thought summaries along with the answer. In OpenAI-compatible calls, control reasoning with the `reasoning_effort` parameter; in native calls, use the `thinking_budget` parameter for optimal control. When you pass `reasoning_effort`, `thinking_budget` should not be explicitly set. The budget controls the depth of thinking and ranges from 0 to 16K; the default budget is 1024, and the marginal benefit peaks around 16K.

Images can be embedded directly in the request via `inline_data`. For files exceeding 20MB, a File API will be required. This functionality is not yet available; progress tracking and upload_url retrieval are under development.
With the `media_resolution` parameter (for example, `MEDIA_RESOLUTION_MEDIUM`), you can adjust the image resolution, which significantly reduces input costs and minimizes the risk of errors with large images.

Supported media resolution values:

| Name | Description |
|---|---|
| MEDIA_RESOLUTION_UNSPECIFIED | Media resolution has not been set. |
| MEDIA_RESOLUTION_LOW | Media resolution set to low (64 tokens). |
| MEDIA_RESOLUTION_MEDIUM | Media resolution set to medium (256 tokens). |
| MEDIA_RESOLUTION_HIGH | Media resolution set to high (zoomed reframing with 256 tokens). |
When you send a `generate_content` request, the system automatically caches the input content. If a subsequent request uses the exact same content, model, and parameters, the system will instantly return the previous result, dramatically speeding up response time and potentially reducing input token costs.

When a cache hit occurs, `response.usage_metadata` will include the `cache_tokens_details` field and a `cached_content_token_count`; you can use these to determine cache usage. Core conclusion: implicit caching is automatic and provides clear cache-hit feedback. Developers can check `usage_metadata` for cache status. Cost savings are not guaranteed; the actual benefit depends on request structure and cache hit rates.
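Checking for a cache hit can be sketched as a small helper (hypothetical name) that reads the fields above; it assumes `cached_content_token_count` is `None` or `0` when there is no hit:

```python
def report_cache_usage(response) -> int:
    """Inspect usage_metadata for implicit-cache hits.

    Returns the number of cached input tokens (0 when no cache hit).
    """
    usage = response.usage_metadata
    cached = usage.cached_content_token_count or 0  # None when no hit
    if cached:
        print(f"Cache hit: {cached} of {usage.prompt_token_count} input tokens were cached")
        print(f"Details: {usage.cache_tokens_details}")
    else:
        print("No cache hit for this request")
    return cached
```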
When using function calling, you must include `tool_choice="auto"` in the request body, otherwise the request will report an error.
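As a sketch, an OpenAI-compatible function-calling request body might look like this; the `get_weather` tool and model name are illustrative, and the key point is the explicit `tool_choice` entry:

```python
import json

def build_tool_request(prompt: str) -> str:
    """Build a JSON request body for a function-calling request.

    The get_weather tool is hypothetical; tool_choice must be present,
    otherwise the request is rejected with an error.
    """
    body = {
        "model": "gemini-2.5-flash-preview-04-17",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",  # required; omitting it causes an error
    }
    return json.dumps(body)
```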
Each response includes `usage_metadata`. Here's what each field means:

- `prompt_token_count`: number of input tokens
- `candidates_token_count`: number of output tokens
- `thoughts_token_count`: tokens used during reasoning (also counted as output)
- `total_token_count`: total tokens used (input + output)

OpenAI-compatible responses instead report `usage` with the following fields:
- `usage.prompt_tokens`: number of input tokens
- `usage.completion_tokens`: number of output tokens (including reasoning)
- `usage.total_tokens`: total token usage
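As a sketch, these fields can be read off the response's `usage` object; `summarize_usage` is a hypothetical helper:

```python
def summarize_usage(usage) -> dict:
    """Normalize an OpenAI-compatible usage object into a plain dict,
    mirroring the field meanings listed above."""
    return {
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,  # includes reasoning tokens
        "total_tokens": usage.total_tokens,
    }
```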