2025
Sep 22
- Added support for Qwen series, Doubao Seedream 4, and Baidu image generation models
Aug 10
- Released the Aihubmix Image Generation MCP, making it easier for developers to integrate image generation services
Aug 1
- Use any large language model on the AiHubMix platform directly in Claude Code
Jul 26
- Added support for the Flux image generation API, enabling high-quality images in seconds
Jul 23
- Added support for Qwen Code, leveraging all large language models available on the Aihubmix platform
Jul 4
- Added support for llms.txt: get standardized model navigation with one click so your LLM assistant can quickly understand the entire model ecosystem
Jun 29
- Added forwarding support for Gemini CLI, with multiple flexible usage modes
- Added code interpreter and Remote MCP invocation to the OpenAI Responses API
Jun 26
- Added a Unified Image Generation API, supporting major models including OpenAI, Ideogram, Stability, and Google Imagen
Jun 18
- Added HTTP Status Code documentation to help users better understand error messages
Jun 15
- Added reverse-engineered Veo 3.0 access, with a total cost of only $0.41 per video generation
Jun 13
- Added support for Veo 3.0 video generation to expand creative formats
Jun 12
- Integrated Claude Code for stable usage within mainland China
Jun 9
- Added support for OpenAI Reasoning Summaries in the Responses API
Jun 5
- Added implicit caching for Gemini, with automatic cache hits and hit feedback
Developers can use `usage_metadata` to determine cache hits
Cost savings are not guaranteed and depend on request structure and usage scenarios
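As a rough illustration, a client can compare cached tokens against total prompt tokens in the returned usage metadata. The field names below follow Gemini's usage metadata (`cached_content_token_count`, `prompt_token_count`), but the response here is a fabricated sample dict rather than a live API call:

```python
# Hedged sketch: measuring implicit-cache hits from Gemini usage metadata.
# The response object is a stand-in dict; no request is made.

def cache_hit_ratio(usage_metadata: dict) -> float:
    """Return the fraction of prompt tokens served from the implicit cache."""
    cached = usage_metadata.get("cached_content_token_count", 0) or 0
    total = usage_metadata.get("prompt_token_count", 0) or 0
    return cached / total if total else 0.0

# Example with a fabricated usage_metadata payload:
sample = {"prompt_token_count": 2048, "cached_content_token_count": 1536}
print(f"cache hit ratio: {cache_hit_ratio(sample):.0%}")  # cache hit ratio: 75%
```

Because savings depend on how much of the prompt prefix repeats between requests, logging this ratio per request is one way to verify whether caching is actually helping.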
May 31
Full Support for New Claude 4 Features
- ⏳ New cache TTL: 1-hour cache support (Beta)
- 🎉 New text editing tools: Claude 4 now supports `text_editor_20250429` and `str_replace_based_edit_tool`
- 🚫 New `refusal` stop reason for safety-based rejections
- 🧠 Extended Thinking: Claude 4 now returns full summaries of its reasoning process
- 🔄 Interleaved Thinking: Tool usage can now interleave with extended thinking for more natural conversations (Beta)
- ⚠️ Deprecated Features:
  - `undo_edit` is no longer supported
  - `token-efficient-tools-2025-02-19` removed (Claude 3.7 only)
  - `output-128k-2025-02-19` removed (Claude 3.7 only)
- 📚 Full migration guides and code examples have been updated to help users smoothly transition from Claude 3.7 to Claude 4
May 22
- Added support for the Dify plugin, enabling seamless integration of Aihubmix models into Dify
Extend and manage over 200 models with a single API key
May 17
- Added support for `codex-mini-latest`, optimized for programming tasks, accessible via the Responses API or Codex CLI
- Added support for Google Imagen 3.0 image generation and Veo 2.0 video generation
- `gemini-2.0-flash-exp` upgraded to the official preview version `gemini-2.0-flash-preview-image-generation`
May 9
- Added the Ideogram AI V3 API — Ideogram’s most advanced image generation model
May 6
- Added Utility Management Scripts for managing API keys, viewing accounts, and listing available models via CLI
Apr 26
- The highly anticipated OpenAI image generation API `gpt-image-1` is now live, supporting both text-to-image and image-to-image
- Added native Gemini API support with precise reasoning budget control for Flash 2.5
Apr 24
- Integrated three core Jina AI APIs to help build powerful agents: Embeddings, Rerank, and DeepSearch
Apr 22
- Early access (reverse-engineered) to the GPT-4o image generation API
Apr 20
- Added support for the OpenAI Responses API endpoint with expanded tool capabilities
Apr 17
- Added OpenAI Codex CLI support: program with natural language directly in the terminal
Apr 12
- By appending `:surfing` to a model ID, any model can gain search capabilities (Beta)
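A minimal sketch of the convention, assuming the OpenAI-compatible Chat Completions payload shape the platform exposes; only the payload is built here, and no request is sent:

```python
# Illustrative helper for the ":surfing" suffix convention described above.
# The payload mirrors the OpenAI-compatible Chat Completions format.

def with_surfing(model_id: str) -> str:
    """Append the :surfing suffix unless it is already present."""
    return model_id if model_id.endswith(":surfing") else f"{model_id}:surfing"

payload = {
    "model": with_surfing("gpt-4o-mini"),  # -> "gpt-4o-mini:surfing"
    "messages": [{"role": "user", "content": "What happened in AI news today?"}],
}
print(payload["model"])  # gpt-4o-mini:surfing
```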
Apr 9
- Added Claude prompt caching, saving up to 76% in cost for repeated high-frequency prompts
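Anthropic's prompt caching works by marking a stable prefix with a `cache_control` breakpoint so repeated requests reuse it. A minimal payload sketch, where field names follow Anthropic's Messages API, the system-prompt text is a placeholder, and no request is sent:

```python
# Hedged sketch of Claude prompt caching: the cache_control breakpoint marks a
# large, stable prefix (e.g. a long system prompt) as cacheable.

payload = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a contract-review assistant. <long style guide here>",
            "cache_control": {"type": "ephemeral"},  # prefix up to here is cached
        },
    ],
    "messages": [{"role": "user", "content": "Summarize clause 4."}],
}
print(payload["system"][0]["cache_control"])
```

Savings accrue only when the cached prefix is long and repeated at high frequency, which matches the "repeated high-frequency prompts" caveat above.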
Apr 7
- Added Ideogram AI image generation support with strong text rendering, hybrid generation, local edits, and upscaling
Apr 5
- Brand-new documentation experience launched
Mar 30
- Added support for the Claude Text Edit Tool
Mar 24
- Launched the brand-new Trident logo
Mar 16
- Added native search support for OpenAI and Google Gemini models
- Third-party search integration will be added in future updates
Mar 15
- Added models: `gpt-4o-mini-search-preview` and `gpt-4o-search-preview`
Mar 7
- Prices for o1 and o3-mini reduced by 10%, in line with official pricing
Mar 6
- Due to a 7× upstream price increase from Microsoft, the price of `aihubmix-DeepSeek-R1` also increased 7×
Recommended alternative: Volcano Engine’s DeepSeek-R1 (more stable and cost-effective)
- Added models: `qwen-qwq-32b` and `qwen2.5-vl-72b-instruct`
Feb 28
- All Claude models received a 15% price reduction
- Added model `gpt-4.5-preview` (extremely expensive — use with caution)
Feb 26
- Improved DeepSeek stability
- ByteDance versions of DeepSeek are currently the most stable
Recommended models: `DeepSeek-R1` and `DeepSeek-V3`
Feb 25
- Added model `claude-3-7-sonnet-20250219`
Feb 24
- The gpt-4o model may occasionally respond very slowly due to upstream provider issues
It is recommended to temporarily switch to `gpt-4o-2024-11-20`
- The Perplexity API is temporarily offline
Due to Perplexity’s complex billing model and costs that exceed this platform’s pricing structure, the service will be relaunched after pricing adjustments
- The temporary ByteDance official discount has ended and prices have returned to normal
The price of `DeepSeek-R1` has been increased accordingly
- Added a new model details page with full parameter information
Feb 23
- The temporary ByteDance official discount has ended and prices have returned to normal
The price of `DeepSeek-V3` has been increased
ByteDance’s R1 model is also expected to return to normal pricing soon, and this platform will adjust prices in step
Feb 18
- Added model: `kimi-latest`
(Official billing is tiered by input length at 8k, 32k, and 128k. This platform does not support tiered pricing and uses the mid-tier 32k rate as the pricing standard. If you are price-sensitive, use with caution.)
- Optimized overall website layout
- Merged the Changelog page into the Usage Statistics page
- Moved Announcements to the Model Marketplace page
- Moved Settings under the user avatar menu
- Reduced the price of `aihubmix-DeepSeek-R1` by 50%
- Added models: `gemini-2.0-pro-exp-02-05-search`, `gemini-2.0-flash-exp-search`
(Integrated with Google’s official online search)
- Added models: `gemini-2.0-flash`, `gemini-2.0-pro-exp-02-05`, `gemini-2.0-flash-lite-preview-02-05`
- Added models: `o3-mini`, `o1`
(These two models are billed about 10% higher than official pricing due to limited account resources)
Feb 4
- The `o1` model does not support the `stream` parameter in the official OpenAI API
- The `o3-mini` model does not support the `temperature` parameter
A new parameter, `reasoning_effort`, is available with values `"low"`, `"medium"`, and `"high"`
The default is `"medium"` if not specified
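A small sketch of building a compliant `o3-mini` request under these constraints. The helper name and validation are illustrative, not part of any SDK, and only the payload is constructed:

```python
# Illustrative builder for an o3-mini request: temperature is omitted (it is
# unsupported), and reasoning_effort defaults to "medium" per the entry above.

ALLOWED_EFFORT = {"low", "medium", "high"}

def o3_mini_params(messages: list, reasoning_effort: str = "medium") -> dict:
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")
    # Deliberately no "temperature" key: o3-mini rejects it.
    return {
        "model": "o3-mini",
        "messages": messages,
        "reasoning_effort": reasoning_effort,
    }

params = o3_mini_params([{"role": "user", "content": "Prove 2+2=4."}], "high")
print(params["reasoning_effort"])  # high
```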
Feb 1
Feature Update:
- Added support for OpenAI audio model input and output
The `api.aihubmix.com` preview server is now available
After one week of stable operation, the main site will be updated
Backend billing is fully consistent with official pricing
Currently, usage logs only display text token usage
Audio token usage is not yet shown in logs but does not affect normal usage
New models added:
- `o3-mini`, `o1` (billed about 10% higher than official pricing due to limited account availability)
- `aihubmix-DeepSeek-R1` (recommended, highly stable)
- `qwen-max-0125` (Qwen2.5-Max), `sonar-reasoning`
- `deepseek-ai/DeepSeek-R1-Zero`, `deepseek-ai/DeepSeek-R1`, `deepseek-r1-distill-llama-70b`
- `aihub-Phi-4`
- `Doubao-1.5-pro-256k`, `Doubao-1.5-pro-32k`, `Doubao-1.5-lite-32k`, `Doubao-1.5-vision-pro-32k`
- `sonar`, `sonar-pro` (latest from Perplexity AI)
- `gemini-2.0-flash-thinking-exp-01-21`
- `deepseek-reasoner` (aka DeepSeek-R1)
- `MiniMax-Text-01`
- `codestral-latest` (Mistral’s new code model — Codestral 25.01)
Jan 23
New Models Added:
- `aihub-Phi-4`
- `Doubao-1.5-pro-256k`, `Doubao-1.5-pro-32k`, `Doubao-1.5-lite-32k`, `Doubao-1.5-vision-pro-32k`
- `sonar`, `sonar-pro` (latest from Perplexity AI)
- `gemini-2.0-flash-thinking-exp-01-21`
- `deepseek-reasoner` (aka DeepSeek-R1)
Jan 19
- Added Perplexity AI API models
Currently supported only on the preview server `api.aihubmix.com`
After stable testing, they will be rolled out to the main server `aihubmix.com`
- `api.aihubmix.com` is the preview server
New features will be deployed there first and promoted to the main server after ~1 week of stability testing
- Added models: `MiniMax-Text-01`, `codestral-latest` (Mistral Codestral 25.01)
- Added model: `gpt-4o-zh`
Automatically translates any input language to English before inference, then translates the model output back to Chinese
(This feature is in testing and only supports `gpt-4o`; high concurrency is not supported)
Jan 6
- Added `gemini-2.0-flash-exp-search`, supporting native Google online search
The official Gemini 2.0 Flash model requires additional parameters to enable online search
Aihubmix has integrated this functionality — simply append `-search` to the model name
- Added model: `deepseek-ai/DeepSeek-V3`
Jan 1
- Launched the new Model Marketplace page to replace the old Model & Pricing page
2024
Dec 30
- Fixed the issue where `gemini-2.0-flash-thinking-exp-1219` only returned reasoning without final answers
- Fixed the issue of balance reminder emails not being delivered
Dec 22
- Added Usage Statistics page
- Added Recharge History page
- Added Doubao series models: `Doubao-lite-128k`, `Doubao-lite-32k`, `Doubao-lite-4k`, `Doubao-pro-128k`, `Doubao-pro-256k`, `Doubao-pro-32k`, `Doubao-pro-4k`
- Added model: `gemini-2.0-flash-thinking-exp-1219`
- Added models: `gemini-2.0-flash-exp`, `aihubmix-Mistral-Large-2411`, `aihubmix-Llama-3-3-70B-Instruct`, `grok-2-1212`, `grok-2-vision-1212`
- Added models: `gemini-exp-1206`, `llama-3.3-70b-versatile`, `learnlm-1.5-pro-experimental`
Dec 14
- Added models: `gemini-2.0-flash-exp`, `aihubmix-Mistral-Large-2411`, `aihubmix-Llama-3-3-70B-Instruct`
Dec 8
- Added models: `gemini-exp-1206`, `llama-3.3-70b-versatile`, `learnlm-1.5-pro-experimental`
- Added Usage Statistics page
Nov 21
- Recently added models: `gpt-4o-2024-11-20`, `step-2-16k`, `grok-vision-beta`
- Qwen 2.5 Turbo million-token-context model: `qwen-turbo-2024-11-01`
Nov 7
- Added compatibility with the native Claude SDK
The `v1/messages` endpoint is now live
- Native Claude prompt caching and computer-use features are not yet supported
These will be completed within the next two weeks
Nov 5
- Added model: `claude-3-5-haiku-20241022`
- Added the latest model from Elon Musk’s xAI: `grok-beta`
Oct 23
- Added model: `claude-3-5-sonnet-20241022`
Oct 10
- OpenAI’s latest caching feature is now live
Currently supported models:
  - GPT-4o
  - GPT-4o-mini
  - o1-preview
  - o1-mini
- Note: `gpt-4o-2024-05-13` is not included in the official supported list
- Cache-hit tokens are visible in backend logs when a request hits the cache
- For full details and usage rules, refer to the official OpenAI documentation
Oct 3
- Backend billing for `gpt-4o` has been reduced to match official pricing
- Added models: `aihubmix-Llama-3-2-90B-Vision`, `aihubmix-Llama-3-70B-Instruct`
- Added Cohere’s latest models: `aihubmix-command-r-08-2024`, `aihubmix-command-r-plus-08-2024`
Sep 19
- Added models: `whisper-large-v3` and `distil-whisper-large-v3-en`
- Note: Whisper model billing is based on seconds of audio input
The current pricing display on the site is incorrect and will be fixed
Backend billing for `whisper-1` fully matches official OpenAI pricing
Sep 13
- Added models: `o1-mini` and `o1-preview`
Note: These models require updated parameters; some client shells may throw errors if defaults are not updated
- The o1 models do NOT support:
  - `system` field → 400 error
  - `tools` field → 400 error
  - Image input → 400 error
  - `json_object` output → 500 error
  - `structured` output → 400 error
  - `logprobs` output → 403 error
  - `stream` output → 400 error
- o1 series rate limits: 20 RPM, 150,000,000 TPM — extremely low, frequent 429 errors possible
- `temperature`, `top_p`, and `n` are fixed at 1; `presence_penalty` and `frequency_penalty` are fixed at 0
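One way to avoid these 400/500 errors is to strip unsupported fields client-side before calling the o1 models. Folding the `system` message into a user turn, as below, is a common workaround rather than an official recommendation, and the helper is purely illustrative:

```python
# Hedged sketch: pre-flight cleanup of a Chat Completions payload for
# o1-preview / o1-mini, based on the restriction list above.

UNSUPPORTED_KEYS = {"tools", "response_format", "logprobs", "stream",
                    "temperature", "top_p", "n",
                    "presence_penalty", "frequency_penalty"}

def sanitize_for_o1(payload: dict) -> dict:
    clean = {k: v for k, v in payload.items() if k not in UNSUPPORTED_KEYS}
    messages = []
    for msg in clean.get("messages", []):
        if msg.get("role") == "system":
            # o1 rejects the system role with a 400 error; demote it to a user turn.
            messages.append({"role": "user", "content": msg["content"]})
        else:
            messages.append(msg)
    clean["messages"] = messages
    return clean

raw = {"model": "o1-mini", "stream": True, "temperature": 0.2,
       "messages": [{"role": "system", "content": "Be terse."},
                    {"role": "user", "content": "Why is the sky blue?"}]}
print(sanitize_for_o1(raw))
```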
Sep 10
- Added model: `mattshumer/Reflection-Llama-3.1-70B`
(Reported to be one of the strongest fine-tuned versions of LLaMA 3.1 70B)
- Claude-3 model prices increased
To ensure stable supply, calls through this platform are currently ~10% more expensive than direct official usage
- Increased concurrency capacity for OpenAI models
The system now theoretically supports near-unlimited concurrency
Aug 11
- Added models: `Phi3medium128k`, `ahm-Phi-3-medium-4k`, `ahm-Phi-3-small-128k`
- Improved stability for LLaMA-related models
- Further optimized compatibility for Claude models
Aug 7
- Added OpenAI’s newly released `gpt-4o-2024-08-06`
See: https://platform.openai.com/docs/guides/structured-outputs
- Added Google’s latest model: `gemini-1.5-pro-exp-0801`
Aug 4
- Added direct online payment for account top-ups
- Fixed the Claude multi-turn conversation format error:
`messages: roles must alternate between "user" and "assistant", but found multiple "user" roles in a row`
- Optimized index handling when using function calling with Claude models
- The backup server https://orisound.cn will be fully decommissioned on Sep 7
Please migrate to the main server https://aihubmix.com or the backup server https://api.aihubmix.com
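The alternating-roles error above can also be avoided client-side by merging consecutive same-role messages before sending. A minimal illustrative sketch of that fix (not the platform's internal implementation):

```python
# Hedged sketch of Claude's alternating-roles requirement: consecutive
# messages with the same role are merged into a single turn.

def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Same role twice in a row: concatenate into the previous turn.
            merged[-1]["content"] += "\n" + msg["content"]
        else:
            merged.append(dict(msg))  # copy so the input list is untouched
    return merged

history = [{"role": "user", "content": "Hi"},
           {"role": "user", "content": "Are you there?"},  # same role twice
           {"role": "assistant", "content": "Hello!"}]
print(merge_consecutive_roles(history))
```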
Jul 27
- Added support for Mistral Large 2
Model name: `Mistral-large-2407` or `aihubmix-Mistral-large-2407`
- System optimizations
Jul 24
- Added the latest LLaMA 3.1 models: `llama-3.1-405b-instruct`, `llama-3.1-70b-versatile`, `llama-3.1-8b-instant`
Jul 20
- Fixed pricing calculation issues for the `gpt-4o-mini` model
  - Text input pricing: 1/33 of official GPT-4o pricing
  - Image input pricing: equal to GPT-4o
  - To align with official pricing, image token counts for `gpt-4o-mini` are multiplied by 33 during billing
  - Refer to official OpenAI pricing for details
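The ×33 rule can be checked with a quick worked example. The helper below is illustrative and computes only billable token counts, not prices:

```python
# Worked example of the billing rule above: image tokens for gpt-4o-mini are
# multiplied by 33 so that image input costs match GPT-4o.

IMAGE_TOKEN_MULTIPLIER = 33

def billable_tokens(text_tokens: int, image_tokens: int) -> int:
    """Tokens actually billed for a gpt-4o-mini request."""
    return text_tokens + image_tokens * IMAGE_TOKEN_MULTIPLIER

# A request with 500 text tokens and 85 image tokens is billed as:
print(billable_tokens(500, 85))  # 500 + 85*33 = 3305
```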
Jul 19
- Added support for the `gpt-4o-mini` model
Backend billing is fully aligned with official pricing
Jul 15 Announcement
- Added support for the official `include_usage` API parameter
This allows returning usage data in stream mode
See the official documentation for details
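In the OpenAI Chat Completions API, `include_usage` is passed under `stream_options`, and the usage object then arrives in a final stream chunk. A minimal payload sketch; only the dict is built here and no request is sent:

```python
# Minimal sketch of requesting usage data in stream mode via stream_options.

payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "stream_options": {"include_usage": True},  # usage arrives in the last chunk
}
print(payload["stream_options"])
```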
Jul 14 Announcement
- The new version of NextWeb now supports calling non-OpenAI models through this platform
- Added backend billing support for Alibaba Qwen models
Calls through this platform cost ~10% more than direct Alibaba Cloud usage
- Improved Azure OpenAI output compatibility with the standard OpenAI API
- Added tool calling support for Claude-3
- Added many new models (see Settings → Available Models)
Jul 3
- Overall backend UI optimization
- Each log entry now displays the model unit price at the time of the request
- Added the Model & Pricing page
Jun 20
- The latest `claude-3-5-sonnet-20240620` is now supported
See the guide for calling non-OpenAI models on this platform
Jun 18
- Backend logs now support downloading historical request records
Jun 16
- The probability of randomly routing requests to Azure OpenAI has been significantly reduced
Jun 13
- Reduced backend costs for Claude-3 models (`Claude 3 Haiku`, `Claude 3 Sonnet`, `Claude 3 Opus`)
Backend billing now matches official pricing
As a result, the effective retail API cost on this site is equivalent to ~86% of official pricing
Jun 10
- Completed a major infrastructure upgrade
All servers and data have been migrated to Microsoft Azure
- Future development will be based on the open-source OneAPI project with extensive secondary optimization
(A commercial license was already obtained via sponsorship)
- Due to the extremely large log volume (over 100 million records), historical logs were not migrated
Please contact support if you need access to legacy logs
- Optimized GPT-4o token billing
The tokenizer changed from `cl100k_base` to `o200k_base`
As a result, streaming token counts for Chinese, Korean, and Japanese text are lower than before
Jun 8
- Added Alibaba’s latest open-source Qwen 2 models: `alibaba/Qwen2-7B-Instruct`, `alibaba/Qwen2-57B-A14B-Instruct`, `alibaba/Qwen2-72B-Instruct`
May 20
- Added model: `gemini-1.5-flash`
- Added model: `gpt-4o`
- Users in Jiangsu may encounter errors on the recharge page due to telecom DNS hijacking
Please contact customer support for assistance
- Added models: `llama3-70b-8192`, `llama3-8b-8192`, `gemini-1.5-pro`, `command-r`, `command-r-plus`
- Claude-3 model supply has been restored
Endpoints are currently deployed across AWS and Google Cloud
- To cover infrastructure and operational costs, Claude-3 backend billing is ~10% higher than official pricing
With increased usage, this will be gradually reduced to ~5% or lower
- Concurrency limits are currently under testing and will be increased as demand grows