Documentation Index

Fetch the complete documentation index at: https://docs.aihubmix.com/llms.txt

Use this file to discover all available pages before exploring further.

LiteLLM Overview

LiteLLM is an open-source unified AI gateway developed by BerriAI. It provides a single standardized interface to call almost every major LLM on the market. Repository: https://github.com/BerriAI/litellm
Every LLM provider ships its own SDK and API format — OpenAI, Anthropic, and Google all differ. Switching models or using multiple models at once means maintaining separate codebases. LiteLLM solves this: write once, change one parameter, call any model.
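As a sketch of that "one parameter" idea: every model is addressed by a single provider/model string, so only that string changes between calls. The helper below just assembles the keyword arguments for completion() and makes no network call; the model names are examples.

```python
import os

def completion_kwargs(model: str, prompt: str) -> dict:
    """Assemble the kwargs for litellm.completion(); only the model string varies."""
    return {
        "model": model,  # "provider/model" string, e.g. "openai/gpt-4o-mini"
        "api_base": "https://aihubmix.com/v1",
        "api_key": os.environ.get("AIHUBMIX_API_KEY"),
        "messages": [{"role": "user", "content": prompt}],
    }

# Switching models is a one-argument change; everything else stays identical.
kwargs_a = completion_kwargs("openai/gpt-4o-mini", "Hello")
kwargs_b = completion_kwargs("openai/claude-sonnet-4-6", "Hello")
```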

Two Usage Modes

Mode         | Description                                 | Best For
Python SDK   | pip install litellm, call directly in code  | Personal projects, rapid prototyping
Proxy Server | Standalone deployable AI gateway            | Team sharing, enterprise access control

Core Capabilities

  • Unified OpenAI format: supports 100+ providers including OpenAI, Anthropic, Gemini, Bedrock, Azure, and more
  • Virtual key management: centrally manage team API keys without exposing the originals
  • Cost tracking: monitor token usage and spend per user or project
  • Load balancing: automatic traffic distribution across models with failover support
  • High performance: P95 latency of ~8ms at 1,000 RPS

Installation

Requirements

Python 3.8+ is required.

On macOS, install via Homebrew:
brew install python
Verify:
python3 --version

On Windows, download the installer from python.org/downloads. During installation, check "Add Python to PATH". Verify:
python --version

On Linux (Ubuntu/Debian):
sudo apt update
sudo apt install python3 python3-pip

pip

pip is usually bundled with Python. Verify it is available:
pip --version
# or
pip3 --version
If not found, install manually:
# Universal method
python3 -m ensurepip --upgrade

# Ubuntu/Debian
sudo apt install python3-pip

# Upgrade to latest
pip install --upgrade pip

Install LiteLLM

Once your environment is ready:
python3 -m pip install litellm
Verify the installation:
python3 -m pip show litellm

Optional Dependencies

Some providers require additional packages:
# AWS Bedrock
pip install litellm[bedrock]

# Google Vertex AI
pip install litellm[vertex]

# All dependencies (not recommended for production)
pip install litellm[all]

Install Proxy Server

To deploy a standalone gateway:
pip install 'litellm[proxy]'

Docker (Optional)

docker pull ghcr.io/berriai/litellm:main-latest
Recommendation: use pip install litellm for personal development; choose Proxy + Docker for team deployments.

Configure API Key and Make Your First Call

Get Your AiHubMix API Key

Go to the aihubmix.com dashboard and create an API key.

Set the Environment Variable

export AIHUBMIX_API_KEY="your-aihubmix-key"

First Call

import os
from litellm import completion

response = completion(
    model="openai/gpt-4o-mini",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello, introduce yourself"}]
)

print(response.choices[0].message.content)

Basic Usage

1. Switching Models

AiHubMix supports all major models. Switching only requires changing the model parameter:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",  # change this
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello, introduce yourself"}]
)

print(response.choices[0].message.content)

2. Streaming

Add stream=True to receive output token by token:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
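If you also need the full text after streaming (for logging, say), accumulate the deltas as they arrive. A minimal offline sketch, using stand-in chunk objects in place of a live response:

```python
from types import SimpleNamespace

def make_chunk(text):
    """Build a stand-in for a streaming chunk (delta.content carries the text)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# In real use, `chunks` is the iterator returned by completion(stream=True).
chunks = [make_chunk("Py"), make_chunk("thon"), make_chunk(None)]

parts = []
for chunk in chunks:
    parts.append(chunk.choices[0].delta.content or "")
full_text = "".join(parts)
```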

3. Multi-Turn Conversation

Pass the conversation history in the messages list so the model remembers context:
import os
from litellm import completion

messages = [
    {"role": "user", "content": "My name is Alex"},
    {"role": "assistant", "content": "Hello, Alex!"},
    {"role": "user", "content": "What is my name?"}
]

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=messages
)

print(response.choices[0].message.content)
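For longer sessions, append each completed round-trip to the same list before the next call; the model only "remembers" what you resend. A small history helper (a sketch; in real use, assistant_text would be the content returned by completion()):

```python
def add_turn(messages, user_text, assistant_text):
    """Record one completed user/assistant round-trip in the shared history."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = []
add_turn(history, "My name is Alex", "Hello, Alex!")
# The next request sends the whole history plus the new question:
history.append({"role": "user", "content": "What is my name?"})
```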

4. Async Calls

Send multiple requests concurrently without waiting for each to finish:
import os
import asyncio
from litellm import acompletion

async def ask(question):
    response = await acompletion(
        model="openai/claude-sonnet-4-6",
        api_base="https://aihubmix.com/v1",
        api_key=os.environ.get("AIHUBMIX_API_KEY"),
        messages=[{"role": "user", "content": question}]
    )
    return response.choices[0].message.content

async def main():
    questions = [
        "What color is an apple?",
        "What color is the sky?",
        "What color is grass?"
    ]
    results = await asyncio.gather(*[ask(q) for q in questions])
    for q, r in zip(questions, results):
        print(f"Q: {q}")
        print(f"A: {r}")
        print()

asyncio.run(main())

5. Timeout and Retry

Prevent requests from hanging or failing due to network issues:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Hello"}],
    timeout=10,      # raise an error after 10 seconds
    num_retries=3    # retry up to 3 times on failure
)

print(response.choices[0].message.content)
timeout is in seconds. Keep num_retries at 2-3; each additional retry adds to the worst-case response time.
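The effect of num_retries can be pictured as a plain retry loop around the request. This is an illustration of the idea, not LiteLLM's actual internals:

```python
def call_with_retries(fn, num_retries=3):
    """Call fn(); on an exception, retry up to num_retries additional times."""
    last_error = None
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except Exception as e:
            last_error = e
    raise last_error

# A stand-in for a flaky API call: fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

result = call_with_retries(flaky, num_retries=3)
```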

6. Token Usage and Cost Tracking

Every response includes token usage data:
import os
from litellm import completion

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}]
)

print(response.choices[0].message.content)
print()
print("Token usage:")
print(f"  Input:  {response.usage.prompt_tokens}")
print(f"  Output: {response.usage.completion_tokens}")
print(f"  Total:  {response.usage.total_tokens}")
Track cost per call:
import os
from litellm import completion, completion_cost

response = completion(
    model="openai/claude-sonnet-4-6",
    api_base="https://aihubmix.com/v1",
    api_key=os.environ.get("AIHUBMIX_API_KEY"),
    messages=[{"role": "user", "content": "Explain Python in 100 words"}]
)

cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
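completion_cost() reads LiteLLM's built-in price table. If you track spend yourself, the arithmetic is just tokens times per-token price; the rates below are made-up placeholders, not real AiHubMix prices:

```python
# Hypothetical prices in USD per 1M tokens; substitute your model's real rates.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate one call's cost from the usage counts on the response object."""
    return (prompt_tokens * PRICE_PER_M_INPUT
            + completion_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

estimated = estimate_cost(prompt_tokens=120, completion_tokens=150)
```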

7. Load Balancing and Failover

Configure multiple models to automatically distribute traffic or switch to a backup when one fails:
import os
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/claude-sonnet-4-6",
                "api_base": "https://aihubmix.com/v1",
                "api_key": os.environ.get("AIHUBMIX_API_KEY"),
            }
        },
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_base": "https://aihubmix.com/v1",
                "api_key": os.environ.get("AIHUBMIX_API_KEY"),
            }
        }
    ]
)

response = router.completion(
    model="my-model",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)
Both models share the same model_name. LiteLLM round-robins between them and automatically fails over if one returns an error.
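Conceptually, the Router cycles through the deployments that share a model_name and skips any that fail. A toy simplification of that behavior (not LiteLLM's actual routing code):

```python
import itertools

deployments = ["openai/claude-sonnet-4-6", "openai/gpt-4o"]
rotation = itertools.cycle(deployments)

def pick_deployment(is_healthy):
    """Return the next healthy deployment, trying each one at most once."""
    for _ in range(len(deployments)):
        candidate = next(rotation)
        if is_healthy(candidate):
            return candidate
    raise RuntimeError("all deployments failed")

# Normal traffic alternates between the two models:
first = pick_deployment(lambda d: True)
second = pick_deployment(lambda d: True)
# If one model starts erroring, traffic fails over to the other:
survivor = pick_deployment(lambda d: d != "openai/claude-sonnet-4-6")
```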

8. Deploy Proxy Server

The Proxy Server is a standalone gateway: team members route all requests through it without needing their own API keys.

Install
python3 -m pip install 'litellm[proxy]'
Create config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY

  - model_name: gemini-flash
    litellm_params:
      model: openai/gemini-2.0-flash
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY
Start the server
litellm --config config.yaml --port 4000
A successful start shows:
LiteLLM: Proxy running on http://0.0.0.0:4000
Call the local server
import os
from litellm import completion

response = completion(
    model="gpt-4o",
    api_base="http://localhost:4000",
    api_key="any-string",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)
The api_key here can be any string. The real AiHubMix key is managed by the Proxy.
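Because the Proxy speaks the OpenAI wire format, any OpenAI-compatible client can call it. The request body it expects looks like this; the sketch below only builds the JSON payload and makes no network call:

```python
import json

payload = {
    "model": "gpt-4o",  # a model_name from config.yaml, not a provider model ID
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)
# POST this body to http://localhost:4000/chat/completions with an
# "Authorization: Bearer <any-string>" header.
```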

9. Virtual Key Management

Virtual keys let you assign independent keys to different team members or projects, controlling access and usage without exposing the real AiHubMix key.

Prerequisites: start a PostgreSQL instance
docker run -d \
  --name litellm-db \
  -e POSTGRES_USER=litellm \
  -e POSTGRES_PASSWORD=litellm \
  -e POSTGRES_DB=litellm \
  -p 5432:5432 \
  postgres
Update config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: openai/claude-sonnet-4-6
      api_base: https://aihubmix.com/v1
      api_key: os.environ/AIHUBMIX_API_KEY

general_settings:
  master_key: sk-my-master-key
  database_url: postgresql://litellm:litellm@localhost:5432/litellm
Restart the server
litellm --config config.yaml --port 4000
Create a virtual key
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-my-master-key" \
  -H "Content-Type: application/json" \
  -d '{
    "key_alias": "team-a",
    "max_budget": 10,
    "models": ["gpt-4o", "claude-sonnet"]
  }'
The key field in the response is the virtual key, e.g. sk-xxxxxx.

Use the virtual key
from litellm import completion

response = completion(
    model="claude-sonnet",
    api_base="http://localhost:4000",
    api_key="sk-xxxxxx",
    messages=[{"role": "user", "content": "Hello"}]
)

print(response.choices[0].message.content)
Check usage
curl http://localhost:4000/key/info \
  -H "Authorization: Bearer sk-my-master-key" \
  -H "Content-Type: application/json" \
  -d '{"key": "sk-xxxxxx"}'
Each virtual key supports individual model restrictions, budget limits, and expiry times — ideal for multi-member team workflows.
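Those limits are all fields on the same /key/generate request body. A sketch of a fuller request: max_budget and models appear above, while the duration expiry field is our assumption about LiteLLM's key API, so verify the field name against the LiteLLM docs for your version:

```python
import json

key_request = {
    "key_alias": "team-a",
    "models": ["gpt-4o", "claude-sonnet"],  # model_names this key may call
    "max_budget": 10,                       # USD spend limit for this key
    "duration": "30d",                      # assumed expiry field (30 days)
}
body = json.dumps(key_request)
```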

Practical Example: Multi-Model Comparison

Send the same question to multiple models simultaneously and compare output quality, speed, and token usage.

Set API Key
export AIHUBMIX_API_KEY="your-key"
Run the comparison
import os
import time
import asyncio
from litellm import acompletion

MODELS = [
    "gpt-5.5",
    "claude-opus-4-7",
    "deepseek-v4-flash",
    "coding-glm-5.1-free",
]

QUESTION = "If you could give a programmer only one piece of advice, what would it be?"

async def ask_model(model, question):
    start = time.time()
    try:
        response = await acompletion(
            model=f"openai/{model}",
            api_base="https://aihubmix.com/v1",
            api_key=os.environ.get("AIHUBMIX_API_KEY"),
            messages=[{"role": "user", "content": question}]
        )
        return {
            "model": model,
            "answer": response.choices[0].message.content.strip(),
            "tokens": response.usage.total_tokens,
            "time": round(time.time() - start, 2),
            "error": None
        }
    except Exception as e:
        return {
            "model": model,
            "answer": None,
            "tokens": 0,
            "time": round(time.time() - start, 2),
            "error": str(e)
        }

async def main():
    print(f"Question: {QUESTION}")
    print("=" * 60)
    tasks = [ask_model(m, QUESTION) for m in MODELS]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(f"\nModel: {r['model']}")
        print(f"Time: {r['time']}s  |  Tokens: {r['tokens']}")
        print("-" * 40)
        if r["error"]:
            print(f"Error: {r['error']}")
        else:
            print(r["answer"])
    print("\n" + "=" * 60)
    print(f"{'Model':<30} {'Time':>8} {'Tokens':>8}")
    print("-" * 50)
    for r in sorted(results, key=lambda x: x["time"]):
        status = f"{r['time']}s" if not r["error"] else "failed"
        print(f"{r['model']:<30} {status:>8} {r['tokens']:>8}")

asyncio.run(main())
Last updated: April 29, 2026