What Is Prompt Caching
Prompt caching allows API calls to reuse previously processed context, avoiding repeated billing for identical long prompts. Both Anthropic and DeepSeek support this mechanism, but their implementations and billing logic differ.
Anthropic Prompt Caching
Anthropic’s prompt caching is controlled via the cache_control parameter. Writing to cache incurs an additional fee the first time, while subsequent cache hits are billed at a much lower rate.
Billing Structure
| Billing Item | Price (per 1M tokens) |
|---|---|
| Base Input | $3.00 / ¥21.60 |
| Cache Hit Input | $0.30 / ¥2.16 |
| Output | $15.00 / ¥108.00 |
| Cache Creation (one-time) | $3.75 / ¥27.00 |
When It Pays Off
Assume you send a 50K token system prompt with every request, calling 1,000 times per day:
- Without caching: 50K × 1,000 × $3.00/1M = $150/day
- With caching: Cache creation $3.75 + 50K × 1,000 × $0.30/1M = $18.75/day
That’s a 87.5% reduction.
DeepSeek Cache Mechanism
DeepSeek’s caching is simpler: cached and non-cached inputs are billed separately, with no additional creation fee.
| Billing Item | Price (per 1M tokens) |
|---|---|
| Input (miss) | $0.14 / ¥1.01 |
| Input (hit) | $0.014 / ¥0.10 |
| Output | $0.28 / ¥2.02 |
DeepSeek’s cache-hit price is just 1/10 of the miss price—an even larger gap than Anthropic’s.
Real-World Scenario
Using DeepSeek-V4-Pro, 10,000 requests/day at 20K tokens each:
| Scenario | Daily Cost |
|---|---|
| All misses | ¥202 |
| 50% hit rate | ¥106 |
| 90% hit rate | ¥30.2 |
How to Verify with the Calculator
- Open the Text Model Calculator
- Select the model (Claude Sonnet 4.6 for Anthropic, V4 Pro for DeepSeek)
- Enter tokens processed for the first time in “Cache Miss”
- Enter tokens read from cache in “Cache Hit”
- Enter output token count
- Switch to CNY to see costs in RMB
Click ”+ Add Model” to compare cache savings across multiple models side by side. To turn hit rate into a launch budget range, continue with how cache hit rate changes AI API cost.
Tips to Improve Cache Hit Rate
- Keep system prompts static — Caching matches prefixes; changes to system prompts invalidate the cache
- Put immutable content first — Cache typically matches continuous segments at the prompt start
- Avoid dynamic timestamps — Per-request timestamp differences prevent cache hits
- Batch similar requests — Identical prompts sent close together are more likely to stay cached
Conclusion
Prompt caching is worth implementing if:
- Your requests carry substantial fixed context (>10K tokens)
- You have high request frequency (hundreds+ per day)
- Your prompt structure is relatively stable
Use the text model calculator with your actual numbers to see how much you could save. For RAG or Agent workloads, combine this with RAG chatbot cost estimation or AI Agent cost planning.