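LiteLLM Overview
LiteLLM is an open-source unified AI gateway developed by BerriAI. It provides a single standardized interface for calling nearly every major LLM. Repository: https://github.com/BerriAI/litellm
Every LLM provider has its own SDK and API format: OpenAI, Anthropic, and Google all differ. Switching models, or using several at once, means maintaining separate code paths. LiteLLM solves this: write the call once, change one parameter, and call any model.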
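Two usage modes
| Mode | Description | Best for |
|---|---|---|
| Python SDK | pip install litellm, then call it directly from your code | Personal projects, rapid prototyping |
| Proxy Server | A separately deployed AI gateway | Shared team use, enterprise access control |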
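Core capabilities
- Unified OpenAI format: supports 100+ providers, including OpenAI, Anthropic, Gemini, Bedrock, and Azure
- Virtual key management: manage team API keys centrally without exposing the underlying provider keys
- Cost tracking: monitor token usage and spend per user or project
- Load balancing: distribute traffic across models automatically, with failover
- High performance: roughly 8 ms P95 latency at 1,000 RPS
System requirements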
Python 3.8+
macOS
Install with Homebrew:
brew install python
Verify:
python3 --version
Windows
Download the installer from python.org/downloads. During installation, check "Add Python to PATH".
Verify:
python --version
Linux (Ubuntu/Debian)
sudo apt update
sudo apt install python3 python3-pip
pip
pip normally ships with Python. Check that it is available:
pip --version
# or
pip3 --version
If it is not found, install it manually:
# Universal method
python3 -m ensurepip --upgrade
# Ubuntu/Debian
sudo apt install python3-pip
# Upgrade to latest
python3 -m pip install --upgrade pip
Install LiteLLM
Once the environment is ready:
python3 -m pip install litellm
Verify the installation:
python3 -m pip show litellm
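Optional dependencies
Some providers require extra packages. Quote the brackets so that shells such as zsh do not treat them as a glob pattern:
# AWS Bedrock
pip install 'litellm[bedrock]'
# Google Vertex AI
pip install 'litellm[vertex]'
# All dependencies (not recommended for production)
pip install 'litellm[all]'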
Install the Proxy Server
To deploy a standalone gateway:
pip install 'litellm[proxy]'
Docker (optional)
docker pull ghcr.io/berriai/litellm:main-latest
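Recommendation: use pip install litellm for individual development; choose Proxy + Docker for team deployments.
Set the API key and make your first call
Get your AiHubMix API key
Go to the aihubmix.com dashboard and create an API key.
Set the environment variable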
export AIHUBMIX_API_KEY="your-aihubmix-key"
First call
import os
from litellm import completion

response = completion(
    model="openai/gpt-4o-mini",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello, introduce yourself"}]
)
print(response.choices[0].message.content)
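The calls in this guide repeat the same api_base and api_key arguments. One way to avoid the repetition, sketched below, is a small helper that builds them once and fails early when the key is missing, since os.environ.get silently returns None for an unset variable. The helper name aihubmix_kwargs and the constant AIHUBMIX_BASE are hypothetical and not part of LiteLLM.

```python
import os

AIHUBMIX_BASE = "https://aihubmix.com/v1"  # endpoint used throughout this guide


def aihubmix_kwargs(model: str) -> dict:
    """Build the keyword arguments shared by every completion() call.

    Hypothetical helper for illustration; not a LiteLLM API.
    """
    api_key = os.environ.get("AIHUBMIX_API_KEY")
    if not api_key:
        # Fail with a clear message instead of sending a request without a key.
        raise RuntimeError("AIHUBMIX_API_KEY is not set")
    return {
        "model": f"openai/{model}",  # openai/ prefix selects the OpenAI-compatible route
        "api_base": AIHUBMIX_BASE,
        "api_key": api_key,
    }
```

With this in place, a call could be written as completion(**aihubmix_kwargs("gpt-4o-mini"), messages=[...]).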
Basic usage
1. Switching models
AiHubMix supports all mainstream models. To switch, change only the model parameter. The openai/ prefix tells LiteLLM to use the OpenAI-compatible request format; the rest is the model ID:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",  # change this
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello, introduce yourself"}]
)
print(response.choices[0].message.content)
2. Streaming output
Add stream=True to receive the output chunk by chunk as tokens are generated:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}],
    stream=True
)
for chunk in response:
    # delta.content can be None, so fall back to an empty string
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
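When the full text is needed after streaming (for logging or caching, say), the deltas can be accumulated while they are consumed. The sketch below shows the idea on stand-in chunk objects that mimic the chunk.choices[0].delta.content attribute path used above; collect_stream and fake_chunk are hypothetical names, and no model is called.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a chunk stream into one string."""
    parts = []
    for chunk in chunks:
        # delta.content can be None (for example on the final chunk), so fall back to "".
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute path as a streamed chunk (test data only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
print(collect_stream(demo))  # prints "Hello"
```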
3. Multi-turn conversations
Pass the conversation history in the messages list so the model has the earlier context:
import os
from litellm import completion

messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Hello, Alex!"},
    {"role": "user", "content": "What is my name?"}
]
response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=messages
)
print(response.choices[0].message.content)
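Because the whole history is resent on every call, long conversations grow in token cost. A minimal sketch of one way to manage that, assuming it is acceptable to drop the oldest messages, is a helper that appends a turn and keeps only the most recent entries. The name add_turn and the max_messages default are illustrative choices, not part of LiteLLM.

```python
def add_turn(history, role, content, max_messages=20):
    """Return a new history with one message appended, keeping the newest max_messages."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    updated = history + [{"role": role, "content": content}]
    # Slicing from the end drops the oldest messages first.
    return updated[-max_messages:]


history = []
history = add_turn(history, "user", "My name is Alex")
history = add_turn(history, "assistant", "Hello, Alex!")
history = add_turn(history, "user", "What is my name?")
print(len(history))  # prints 3
```

Dropping old messages also drops the facts they contained, so a small max_messages can make the model forget the name in this example.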
4. Asynchronous calls
Send several requests concurrently instead of waiting for each one in turn:
import os
import asyncio
from litellm import acompletion


async def ask(question):
    response = await acompletion(
        model="openai/claude-sonnet-4-6",
        api_base="https://aihubmix.com/v1",
        api_key=os.environ.get("AIHUBMIX_API_KEY"),
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content


async def main():
    questions = [
        "What color is an apple?",
        "What color is the sky?",
        "What color is grass?"
    ]
    # gather runs the calls concurrently and returns results in input order
    results = await asyncio.gather(*[ask(q) for q in questions])
    for q, r in zip(questions, results):
        print(f"Q: {q}")
        print(f"A: {r}")
        print()


asyncio.run(main())
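Launching every request at once can run into provider rate limits when the question list is long. The sketch below caps the number of in-flight calls with asyncio.Semaphore. It uses a stand-in coroutine, fake_ask, that sleeps instead of calling a model; gather_limited and the limit of 2 are illustrative assumptions.

```python
import asyncio


async def gather_limited(coros, limit=2):
    """Run coroutines concurrently, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        async with semaphore:
            return await coro

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run(c) for c in coros))


async def fake_ask(question):
    """Stand-in for the ask() coroutine above; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


results = asyncio.run(gather_limited([fake_ask(q) for q in ["a", "b", "c"]], limit=2))
print(results)
```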
5. Timeouts and retries
Keep requests from hanging or failing on network problems:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello"}],
    timeout=10,     # raise an error after 10 seconds
    num_retries=3   # retry up to 3 times on failure
)
print(response.choices[0].message.content)
timeout is in seconds. A num_retries of 2-3 is recommended; higher values slow down the response when calls keep failing.
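The cost of a high retry count is easy to put a number on. As a rough sketch, assuming each attempt is allowed the full timeout and ignoring any delay between attempts, the longest a caller can wait is the timeout multiplied by the number of attempts. The function name worst_case_seconds is hypothetical.

```python
def worst_case_seconds(timeout: float, num_retries: int) -> float:
    """Upper bound on waiting time: one initial attempt plus each retry."""
    attempts = 1 + num_retries
    return timeout * attempts


print(worst_case_seconds(10, 3))  # prints 40
```

With the values from the example above, a request that never succeeds could block for about 40 seconds before the error surfaces.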
6. Token usage and cost tracking
Every response includes token usage data:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}]
)
print(response.choices[0].message.content)
print()
print("Token usage:")
print(f"  Input: {response.usage.prompt_tokens}")
print(f"  Output: {response.usage.completion_tokens}")
print(f"  Total: {response.usage.total_tokens}")
Track the cost of each call:
import os
from litellm import completion, completion_cost

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}]
)
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
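completion_cost depends on LiteLLM's built-in price table, which may not have an entry for every model ID served through a gateway. When a price is not available there, the cost can be estimated directly from the usage counts. The sketch below shows the arithmetic; the per-million-token prices are placeholders chosen for the example, not real AiHubMix or provider prices, and estimate_cost is a hypothetical helper.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_per_million, output_per_million):
    """Estimate the cost in dollars from token counts and per-million-token prices."""
    total = prompt_tokens * input_per_million + completion_tokens * output_per_million
    return total / 1_000_000


# Placeholder prices in USD per one million tokens (assumptions for illustration only).
PRICE_IN = 3.00
PRICE_OUT = 15.00

cost = estimate_cost(1200, 300, PRICE_IN, PRICE_OUT)
print(f"Cost: ${cost:.6f}")  # prints "Cost: $0.008100"
```

In real use, the two counts would come from response.usage.prompt_tokens and response.usage.completion_tokens.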
7. Load balancing and failover
Configure multiple models to distribute traffic automatically and fall back to another when one fails:
import os
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/claude-sonnet-4-6",
                "api_base": "https://aihubmix.com/v1",
                "api_key": os.environ.get("AIHUBMIX_API_KEY"),
            }
        },
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_base": "https://aihubmix.com/v1",
                "api_key": os.environ.get("AIHUBMIX_API_KEY"),
            }
        }
    ]
)
response = router.completion(
    model="my-model",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
Both models share the same model_name. LiteLLM distributes requests between them (the default routing strategy is a simple shuffle rather than strict round-robin) and fails over automatically when one of them returns an error.
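The behavior described above can be pictured with a small stand-alone simulation. This is an illustration of the general pattern only, not LiteLLM's Router implementation: deployments are tried in shuffled order and the first success wins. call_with_failover and fake_send are hypothetical names, and fake_send simulates an outage instead of making a network call.

```python
import random


def call_with_failover(deployments, send, rng=random):
    """Try deployments in shuffled order and return the first successful result.

    Illustration only: this is not LiteLLM's Router implementation.
    """
    order = list(deployments)
    rng.shuffle(order)  # spread load by randomizing which deployment goes first
    errors = []
    for name in order:
        try:
            return name, send(name)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all deployments failed: " + "; ".join(errors))


def fake_send(name):
    """Stand-in transport: the first model always fails, the second succeeds."""
    if name == "openai/claude-sonnet-4-6":
        raise ConnectionError("simulated outage")
    return "Hello!"


used, reply = call_with_failover(["openai/claude-sonnet-4-6", "openai/gpt-4o"], fake_send)
print(used, reply)  # prints: openai/gpt-4o Hello!
```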
8. Deploying the Proxy Server
The Proxy Server is a standalone gateway. Team members route all requests through it and do not need their own provider API keys.
Install
python3 -m pip install 'litellm[proxy]'
Create config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
  - model_name: gemini-flash
    litellm_params:
      model: openai/gemini-2.0-flash
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
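Each entry in the config above follows the same shape: a public alias in model_name and the upstream details under litellm_params. When the list grows, it can be less error-prone to generate the file than to edit it by hand. The sketch below renders the same structure with plain string formatting; render_config is a hypothetical helper, not a LiteLLM tool, and it assumes every model goes through the same AiHubMix endpoint and key.

```python
def render_config(models, api_base="https://aihubmix.com/v1"):
    """Render a model_list section; `models` maps alias -> AiHubMix model ID."""
    lines = ["model_list:"]
    for alias, model_id in models.items():
        lines += [
            f"  - model_name: {alias}",
            "    litellm_params:",
            f"      model: openai/{model_id}",
            f"      api_base: {api_base}",
            "      api_key: os.environ/AIHUBMIX_API_KEY",
        ]
    return "\n".join(lines) + "\n"


config_text = render_config({"gpt-4o": "gpt-4o", "claude-sonnet": "claude-sonnet-4-6"})
print(config_text)
```

Writing config_text to config.yaml yields a file with the indentation the YAML parser expects.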
Start the server
litellm --config config.yaml --port 4000
On a successful start it prints:
LiteLLM: Proxy running on http://0.0.0.0:4000
Call the local server
from litellm import completion

response = completion(
    model="gpt-4o",
    api_base="http://localhost:4000",
    api_key="any-string",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
The api_key here can be any string, because no master key is configured yet. The real AiHubMix key is managed by the Proxy.
9. Virtual key management
Virtual keys let you assign separate keys to different team members or projects, controlling access and usage without exposing the real AiHubMix key.
Prerequisite: start a PostgreSQL instance
docker run -d \
  --name litellm-db \
  -e POSTGRES_USER=litellm \
  -e POSTGRES_PASSWORD=litellm \
  -e POSTGRES_DB=litellm \
  -p 5432:5432 \
  postgres
Update config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
general_settings:
  master_key: sk-my-master-key
  database_url: postgresql://litellm:litellm@localhost:5432/litellm
Restart the server
litellm --config config.yaml --port 4000
Create a virtual key
curl -X POST http://localhost:4000/key/generate \
-H "Authorization: Bearer sk-my-master-key" \
-H "Content-Type: application/json" \
-d '{
"key_alias": "team-a",
"max_budget": 10,
"models": ["gpt-4o", "claude-sonnet"]
}'
The key field in the response is the virtual key, for example sk-xxxxxx.
Use the virtual key
from litellm import completion

response = completion(
    model="openai/claude-sonnet",  # openai/ prefix: treat the proxy as an OpenAI-compatible endpoint
    api_base="http://localhost:4000",
    api_key="sk-xxxxxx",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
Check usage
curl "http://localhost:4000/key/info?key=sk-xxxxxx" \
  -H "Authorization: Bearer sk-my-master-key"
Each virtual key supports its own model restrictions, budget cap, and expiry time, which suits team workflows with multiple members.
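The same key-creation request can be prepared from Python with the standard library. The sketch below builds the request without sending it, so it runs without a proxy. It reuses the fields from the curl example and adds a duration field for the expiry time; treat the duration field name and the "30d" value format as assumptions to confirm against the proxy version in use. build_key_request is a hypothetical helper.

```python
import json
import urllib.request


def build_key_request(master_key, alias, models, max_budget, duration=None,
                      proxy_url="http://localhost:4000"):
    """Prepare a POST /key/generate request for the local proxy (not sent here)."""
    payload = {"key_alias": alias, "max_budget": max_budget, "models": models}
    if duration is not None:
        payload["duration"] = duration  # assumed expiry field, e.g. "30d"
    return urllib.request.Request(
        f"{proxy_url}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_key_request("sk-my-master-key", "team-a", ["gpt-4o", "claude-sonnet"], 10,
                        duration="30d")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs without a proxy.
```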
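Worked example: multi-model comparison
Send the same question to several models concurrently and compare output quality, speed, and token usage.
Set the API key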
export AIHUBMIX_API_KEY="your-key"
Run the comparison
import os
import time
import asyncio
from litellm import acompletion
MODELS = [
    "gpt-5.5",
    "claude-opus-4-7",
    "deepseek-v4-flash",
    "coding-glm-5.1-free",
]
QUESTION = "If you could give a programmer only one piece of advice, what would it be?"


async def ask_model(model, question):
    start = time.time()
    try:
        response = await acompletion(
            model=f"openai/{model}",
            api_base="https://aihubmix.com/v1",
            api_key=os.environ.get("AIHUBMIX_API_KEY"),
            messages=[{"role": "user", "content": question}]
        )
        return {
            "model": model,
            "answer": response.choices[0].message.content.strip(),
            "tokens": response.usage.total_tokens,
            "time": round(time.time() - start, 2),
            "error": None
        }
    except Exception as e:
        return {
            "model": model,
            "answer": None,
            "tokens": 0,
            "time": round(time.time() - start, 2),
            "error": str(e)
        }


async def main():
    print(f"Question: {QUESTION}")
    print("=" * 60)
    tasks = [ask_model(m, QUESTION) for m in MODELS]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(f"\nModel: {r['model']}")
        print(f"Time: {r['time']}s | Tokens: {r['tokens']}")
        print("-" * 40)
        if r["error"]:
            print(f"Error: {r['error']}")
        else:
            print(r["answer"])
    print("\n" + "=" * 60)
    print(f"{'Model':<30} {'Time':>8} {'Tokens':>8}")
    print("-" * 50)
    for r in sorted(results, key=lambda x: x["time"]):
        status = f"{r['time']}s" if not r["error"] else "failed"
        print(f"{r['model']:<30} {status:>8} {r['tokens']:>8}")


asyncio.run(main())
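Elapsed time alone favors models that give short answers. A rough complementary measure, sketched below, is tokens per second computed from the same result dictionaries. The sample data is made up for illustration, and because total_tokens includes the prompt, the figure is only an approximation of generation speed. tokens_per_second is a hypothetical helper.

```python
def tokens_per_second(result):
    """Throughput for one result dict; None for failed calls or zero elapsed time."""
    if result["error"] or result["time"] <= 0:
        return None
    return round(result["tokens"] / result["time"], 1)


# Made-up results in the same shape that ask_model() returns.
sample = [
    {"model": "model-a", "tokens": 120, "time": 2.0, "error": None},
    {"model": "model-b", "tokens": 90, "time": 3.0, "error": None},
    {"model": "model-c", "tokens": 0, "time": 0.5, "error": "timeout"},
]

# Keep successful calls only and rank them from fastest to slowest throughput.
ranked = sorted(
    (r for r in sample if tokens_per_second(r) is not None),
    key=tokens_per_second,
    reverse=True,
)
for r in ranked:
    print(r["model"], tokens_per_second(r))
```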
Last updated: April 29, 2026