Core Mechanism
Automatic and Explicit Caching
Different model providers have varying support for caching:
- Automatic Caching
- No additional configuration required
- The system automatically identifies and caches reusable content
- Applicable to models like OpenAI, DeepSeek, Gemini, etc.
- Explicit Caching
- Requires manual specification of the cache location via cache_control
- Allows for fine-grained control of caching
- Applicable to models like Anthropic, Gemini (in certain scenarios)
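Explicit caching can be sketched as follows. This builds an Anthropic-style request body with a cache_control checkpoint on the system prompt; the model name and prompt texts are placeholders, but the "ephemeral" cache type follows Anthropic's documented syntax.

```python
def build_cached_request(system_text: str, user_text: str) -> dict:
    """Build a request body that marks the system prompt as cacheable."""
    return {
        "model": "claude-sonnet-4",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

request = build_cached_request("You are a helpful assistant.", "Hello!")
```

With automatic caching, the same request is sent without the cache_control field and the provider decides what to cache.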
Comparison of Caching Strategies Across Models
OpenAI
- Automatic caching, no configuration needed
- Minimum prompt length: 1024 tokens
- Pricing: cache writes are free; cache reads cost 0.25x–0.5x the standard input price
Anthropic Claude
Automatic Caching
- Automatically advances cache boundaries
- Suitable for multi-turn conversations
Explicit Caching Checkpoints
- Up to 4 checkpoints
- Fine control over cached content
Cache Duration
- Default: 5 minutes
- Optional: 1 hour ("ttl": "1h")
For more information, please refer to: Claude Prompt Caching
Gemini
Explicit Caching (Recommended)
- Only uses the last checkpoint
- Compatible with Anthropic’s syntax
Implicit Caching
- Automatically effective, no configuration needed
- TTL: Approximately 3-5 minutes
- Minimum tokens: Approximately 4096
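Since Gemini's implicit caching only kicks in above a minimum prompt size, a quick pre-check can tell you whether a prompt is even eligible. This is a rough heuristic: the ~4096-token minimum comes from the notes above, and the 4-characters-per-token estimate is an assumption, not an API guarantee.

```python
IMPLICIT_CACHE_MIN_TOKENS = 4096  # approximate minimum from the notes above

def likely_cacheable(prompt: str, chars_per_token: float = 4.0) -> bool:
    """Rough heuristic: estimate token count from character count."""
    estimated_tokens = len(prompt) / chars_per_token
    return estimated_tokens >= IMPLICIT_CACHE_MIN_TOKENS

print(likely_cacheable("short prompt"))  # False
```

Prompts below the threshold gain nothing from implicit caching, so there is no point restructuring them for prefix stability on that account alone.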
DeepSeek / Grok / Moonshot / Groq
- All support automatic caching
- No additional configuration required
- General rule: writing to the cache is free or costs the same as a normal request; reading from the cache costs less than the standard input price
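The general rule above translates directly into a cost estimate: cached tokens are billed at a fraction of the normal input price. A minimal sketch, where the price and the 0.25x read multiplier are illustrative, not any provider's actual rates:

```python
def input_cost(total_tokens: int, cached_tokens: int,
               price_per_token: float, read_multiplier: float = 0.25) -> float:
    """Uncached tokens at full price + cached tokens at the reduced read price."""
    uncached = total_tokens - cached_tokens
    return uncached * price_per_token + cached_tokens * price_per_token * read_multiplier

# 10k-token prompt, 8k served from cache, at a notional $2 per 1M input tokens:
full = input_cost(10_000, 0, 2e-6)
with_cache = input_cost(10_000, 8_000, 2e-6)
print(f"saving: {1 - with_cache / full:.0%}")  # saving: 60%
```

The saving scales with the cached fraction, which is why the recommendations below focus on keeping as much of the prompt as possible in a stable, reusable prefix.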
Usage Recommendations
- Maintain Stable Prefixes: keep system prompts and shared context byte-identical across requests
- Cache Large Texts
- RAG data
- Long texts
- CSV / JSON data
- Role settings
- Control TTL
- Short sessions → 5 minutes
- Long sessions → 1 hour (more cost-effective)
- Reduce Cache Writes: avoid changing cached content frequently, since each change triggers a new cache write
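The recommendations above boil down to one structural rule: stable content first, variable content last. A sketch of message construction that respects this ordering (the function and message texts are illustrative):

```python
def build_messages(user_question: str, history: list[dict]) -> list[dict]:
    stable_prefix = [
        # Identical on every request -> eligible for prefix caching.
        {"role": "system", "content": "You are a support agent for Acme Corp."},
        {"role": "system", "content": "Reference manual: ..."},  # large, stable text
    ]
    # Anything that changes per request goes after the stable prefix,
    # so it never invalidates the cached portion.
    return stable_prefix + history + [{"role": "user", "content": user_question}]

first = build_messages("How do I reset my password?", [])
second = build_messages("What about 2FA?", first)
# The first two messages are byte-identical across requests,
# so the provider can reuse the cached prefix.
assert first[:2] == second[:2]
```

Putting a timestamp, request ID, or user-specific detail at the top of the system prompt is the most common way to silently defeat every caching strategy described above.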