AI Model API Pricing

Compare Claude, GPT, Gemini, DeepSeek, and multimodal AI model API pricing by category, input/output rates, cache costs, and source links.

Prices are planning estimates only. Always verify with official provider dashboards before making budget decisions.

OpenAI

GPT-5.5

Input ¥36.00/1M tokens

Output ¥216.00/1M tokens

Cache read ¥3.60/1M tokens

Standard processing rates for contexts under 270K tokens. Batch: -50%. Data residency: +10%.

2026-05-28
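
A rough sketch of how the rates and modifiers on this card could be combined into a per-request estimate. The function name `estimate_cost` and its parameters are hypothetical, and two things are assumptions rather than facts from the provider: that cached tokens are a subset of the input tokens, and that the batch and data-residency modifiers multiply the whole bill.

```python
# Rates copied from the GPT-5.5 card above (CNY per 1M tokens).
INPUT_RATE = 36.00
OUTPUT_RATE = 216.00
CACHE_READ_RATE = 3.60


def estimate_cost(input_tokens, output_tokens, cached_tokens=0,
                  batch=False, data_residency=False):
    """Estimate one request's cost in CNY (planning figure only).

    Assumption: cached_tokens is the part of input_tokens served from cache.
    Assumption: batch (-50%) and data residency (+10%) apply to the total
    and stack multiplicatively.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    cost = (
        uncached_tokens * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    if batch:
        cost *= 0.5
    if data_residency:
        cost *= 1.10
    return cost
```

For example, `estimate_cost(1_000_000, 100_000)` comes to about ¥57.60, and the same call with `batch=True` to about ¥28.80.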

OpenAI

GPT-5.4

Input ¥18.00/1M tokens

Output ¥108.00/1M tokens

Cache read ¥1.80/1M tokens

Standard processing rates for contexts under 270K tokens. Batch: -50%. Data residency: +10%.

2026-05-28

OpenAI

GPT-5.4 mini

Input ¥5.40/1M tokens

Output ¥32.40/1M tokens

Cache read ¥0.5400/1M tokens

Standard processing rates for contexts under 270K tokens. Batch: -50%. Data residency: +10%.

2026-05-28

Anthropic

Claude Opus 4.7

Input ¥36.00/1M tokens

Output ¥180.00/1M tokens

Cache write ¥45.00/1M tokens

Cache read ¥3.60/1M tokens

In steady state, cached input is billed at the cache-read rate. The first (cold) request pays the cache-write rate. Batch: -50%.

2026-05-28
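
To illustrate the cold-write versus steady-state-read note on this card, the sketch below compares the cost of resending a shared prompt prefix uncached against writing it to cache once and reading it afterwards. The function name `prefix_cost` is hypothetical, and the model assumes the cache entry never expires between requests, which real deployments may not guarantee.

```python
# Rates copied from the Claude Opus 4.7 card above (CNY per 1M tokens).
INPUT_RATE = 36.00
CACHE_WRITE_RATE = 45.00
CACHE_READ_RATE = 3.60


def prefix_cost(prefix_tokens, requests, cached):
    """Cost in CNY of sending the same prefix across `requests` calls.

    Assumption: with caching, the first call pays the cache-write rate and
    every later call pays the cache-read rate (no expiry in between).
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    millions = prefix_tokens / 1_000_000
    if not cached:
        return requests * millions * INPUT_RATE
    return millions * CACHE_WRITE_RATE + (requests - 1) * millions * CACHE_READ_RATE
```

With a 1M-token prefix, a single cached request costs more than an uncached one (¥45.00 versus ¥36.00), but from the second request on caching is cheaper (about ¥48.60 versus ¥72.00 for two requests).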

Anthropic

Claude Sonnet 4.6

Input ¥21.60/1M tokens

Output ¥108.00/1M tokens

Cache write ¥27.00/1M tokens

Cache read ¥2.16/1M tokens

In steady state, cached input is billed at the cache-read rate. The first (cold) request pays the cache-write rate. Batch: -50%.

2026-05-28

Anthropic

Claude Haiku 4.5

Input ¥7.20/1M tokens

Output ¥36.00/1M tokens

Cache write ¥9.00/1M tokens

Cache read ¥0.7200/1M tokens

In steady state, cached input is billed at the cache-read rate. The first (cold) request pays the cache-write rate. Batch: -50%.

2026-05-28

DeepSeek

DeepSeek V4 Pro

Input ¥1.01/1M tokens

Output ¥2.02/1M tokens

Cache read ¥0.0202/1M tokens

Standard pricing, not promotional. The off-peak/75%-off promotional pricing tied to the DeepSeek V3→V4 upgrade is not included.

2026-05-28

Mistral

Mistral Medium 3.5

Input ¥10.80/1M tokens

Output ¥54.00/1M tokens

Official Mistral pricing lists Mistral Medium 3.5 at $1.50 input / $7.50 output per 1M tokens with model id mistral-medium-latest.

2026-05-28
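
The card shows yuan figures while the note quotes US dollars. Dividing one by the other (¥10.80 / $1.50) suggests a conversion rate of about 7.2 CNY per USD. That rate is inferred from the listed numbers, not stated by the source, and the helper name `usd_to_cny` is hypothetical. A minimal sketch of the conversion:

```python
# Inferred from the card, not stated by the source:
# 10.80 CNY / 1.50 USD = 7.2 CNY per USD.
IMPLIED_CNY_PER_USD = 7.2


def usd_to_cny(usd_per_million, rate=IMPLIED_CNY_PER_USD):
    """Convert a USD price per 1M tokens to CNY, rounded to 4 decimals."""
    return round(usd_per_million * rate, 4)
```

Under that assumed rate, `usd_to_cny(1.50)` and `usd_to_cny(7.50)` reproduce the ¥10.80 input and ¥54.00 output figures on this card. Pass a current exchange rate as `rate` when budgeting.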

Mistral

Mistral Large 3

Input ¥3.60/1M tokens

Output ¥10.80/1M tokens

Official Mistral pricing lists Mistral Large 3 at $0.50 input / $1.50 output per 1M tokens with model id mistral-large-latest.

2026-05-28

Mistral

Mistral Small 4

Input ¥0.7200/1M tokens

Output ¥2.16/1M tokens

Official Mistral pricing lists Mistral Small 4 at $0.10 input / $0.30 output per 1M tokens with model id mistral-small-latest.

2026-05-28

Mistral

Magistral Medium

Input ¥14.40/1M tokens

Output ¥36.00/1M tokens

Official Mistral pricing lists Magistral Medium at $2.00 input / $5.00 output per 1M tokens with model id magistral-medium-latest.

2026-05-28

Mistral

Magistral Small

Input ¥3.60/1M tokens

Output ¥10.80/1M tokens

Official Mistral pricing lists Magistral Small at $0.50 input / $1.50 output per 1M tokens with model id magistral-small-latest.

2026-05-28

Mistral

Devstral 2

Input ¥2.88/1M tokens

Output ¥14.40/1M tokens

Official Mistral pricing lists Devstral 2 at $0.40 input / $2.00 output per 1M tokens with model id devstral-medium-latest.

2026-05-28

Mistral

Devstral Small 2

Input ¥0.7200/1M tokens

Output ¥2.16/1M tokens

Official Mistral pricing lists Devstral Small 2 at $0.10 input / $0.30 output per 1M tokens with model id devstral-small-latest.

2026-05-28

Mistral

Codestral

Input ¥2.16/1M tokens

Output ¥6.48/1M tokens

Official Mistral pricing lists Codestral at $0.30 input / $0.90 output per 1M tokens with model id codestral-latest.

2026-05-28

Mistral

Ministral 3 3B

Input ¥0.7200/1M tokens

Output ¥0.7200/1M tokens

Official Mistral pricing lists Ministral 3 - 3B at $0.10 input / $0.10 output per 1M tokens with model id ministral-3b-latest.

2026-05-28

Mistral

Ministral 3 8B

Input ¥1.08/1M tokens

Output ¥1.08/1M tokens

Official Mistral pricing lists Ministral 3 - 8B at $0.15 input / $0.15 output per 1M tokens with model id ministral-8b-latest.

2026-05-28

MiniMax

MiniMax-M2.7

Input ¥2.10/1M tokens

Output ¥8.40/1M tokens

Cache write ¥2.63/1M tokens

Cache read ¥0.4200/1M tokens

Official MiniMax pay-as-you-go pricing lists MiniMax-M2.7 at ¥2.1 input / ¥8.4 output / ¥0.42 cache read / ¥2.625 cache write per 1M tokens.

2026-05-28

MiniMax

MiniMax-M2.7-highspeed

Input ¥4.20/1M tokens

Output ¥16.80/1M tokens

Cache write ¥2.63/1M tokens

Cache read ¥0.4200/1M tokens

Official MiniMax pay-as-you-go pricing lists MiniMax-M2.7-highspeed at ¥4.2 input / ¥16.8 output / ¥0.42 cache read / ¥2.625 cache write per 1M tokens.

2026-05-28

MiniMax

MiniMax-M2.5

Input ¥2.10/1M tokens

Output ¥8.40/1M tokens

Cache write ¥2.63/1M tokens

Cache read ¥0.2100/1M tokens

Official MiniMax pay-as-you-go pricing lists MiniMax-M2.5 at ¥2.1 input / ¥8.4 output / ¥0.21 cache read / ¥2.625 cache write per 1M tokens.

2026-05-28

MiniMax

MiniMax-M2.5-highspeed

Input ¥4.20/1M tokens

Output ¥16.80/1M tokens

Cache write ¥2.63/1M tokens

Cache read ¥0.2100/1M tokens

Official MiniMax pay-as-you-go pricing lists MiniMax-M2.5-highspeed at ¥4.2 input / ¥16.8 output / ¥0.21 cache read / ¥2.625 cache write per 1M tokens.

2026-05-28

MiniMax

M2-her

Input ¥2.10/1M tokens

Output ¥8.40/1M tokens

Official MiniMax pay-as-you-go pricing lists M2-her at ¥2.1 input / ¥8.4 output per 1M tokens; cache read/write fields are not listed for this row.

2026-05-28

Zhipu

GLM-5.1

Input ¥6.00/1M tokens

Output ¥24.00/1M tokens

Cache read ¥1.30/1M tokens

Official Zhipu API pricing lists GLM-5.1 with rates tiered by input length. The first tier is used as the card default, and the tiers field stores the full official schedule.

2026-05-28

Zhipu

GLM-5-Turbo

Input ¥5.00/1M tokens

Output ¥22.00/1M tokens

Cache read ¥1.20/1M tokens

Official Zhipu API pricing lists GLM-5-Turbo with rates tiered by input length. The first tier is used as the card default, and the tiers field stores the full official schedule.

2026-05-28

Zhipu

GLM-5

Input ¥4.00/1M tokens

Output ¥18.00/1M tokens

Cache read ¥1.00/1M tokens

Official Zhipu API pricing lists GLM-5 with rates tiered by input length. The first tier is used as the card default, and the tiers field stores the full official schedule.

2026-05-28

Qwen

qwen3.7-max

Input ¥12.00/1M tokens

Output ¥36.00/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen3.7-max at ¥12 input / ¥36 output per 1M tokens for the 0<Token≤1M tier.

2026-05-28

Qwen

qwen3-max

Input ¥2.50/1M tokens

Output ¥10.00/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen3-max with tiered token pricing. The first tier is used as the card default, and the tiers field stores the full official schedule.

2026-05-28

Qwen

qwen-turbo

Input ¥0.3000/1M tokens

Output ¥0.6000/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen-turbo at ¥0.30 input / ¥0.60 output per 1M tokens for China mainland deployment.

2026-05-28

Qwen

qwen-plus

Input ¥0.8000/1M tokens

Output ¥2.00/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen-plus at ¥0.80 input / ¥2.00 output per 1M tokens for the 0-128K tier in China mainland deployment.

2026-05-28

Qwen

qwen-max

Input ¥2.40/1M tokens

Output ¥9.60/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen-max at ¥2.40 input / ¥9.60 output per 1M tokens in China mainland deployment.

2026-05-28

Moonshot

kimi-k2.6

Input ¥6.50/1M tokens

Output ¥27.00/1M tokens

Cache read ¥1.10/1M tokens

Official Kimi pricing lists kimi-k2.6 at ¥1.10 cache-hit input / ¥6.50 cache-miss input / ¥27.00 output per 1M tokens.

2026-05-28
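
Because Kimi prices input differently on cache hits and misses, the effective input rate depends on how often requests hit the cache. The sketch below computes that blended rate. The function name `blended_input_rate` and the `hit_ratio` parameter are hypothetical, and the hit ratio is something to measure from your own traffic.

```python
# Rates copied from the kimi-k2.6 card above (CNY per 1M input tokens).
CACHE_HIT_RATE_CNY = 1.10
CACHE_MISS_RATE_CNY = 6.50


def blended_input_rate(hit_ratio):
    """Effective input price per 1M tokens for a given cache hit ratio.

    hit_ratio is the fraction of input tokens served from cache (0.0 to 1.0).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    return hit_ratio * CACHE_HIT_RATE_CNY + (1.0 - hit_ratio) * CACHE_MISS_RATE_CNY
```

At a 50% hit ratio the blended input rate is about ¥3.80 per 1M tokens, roughly 58% of the cache-miss price.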

Moonshot

kimi-k2.5

Input ¥4.00/1M tokens

Output ¥21.00/1M tokens

Cache read ¥0.7000/1M tokens

Official Kimi pricing lists kimi-k2.5 at ¥0.70 cache-hit input / ¥4.00 cache-miss input / ¥21.00 output per 1M tokens.

2026-05-28

Moonshot

moonshot-v1-32k

Input ¥5.00/1M tokens

Output ¥20.00/1M tokens

Official Moonshot pricing lists moonshot-v1-32k at ¥5.00 input / ¥20.00 output per 1M tokens.

2026-05-28

OpenAI

GPT-Realtime-2

Input ¥230.40/1M tokens

Output ¥460.80/1M tokens

Cache read ¥2.88/1M tokens

Audio token pricing for voice interactions. Text modality billed at separate text rates.

2026-05-28

Google

Gemini 3 Audio

Input ¥21.60/1M tokens

Output ¥86.40/1M tokens

Paid-tier Gemini 3 audio token pricing. Official page also lists per-minute equivalents for some audio workloads.

2026-05-28

Google

Gemini 2.5 Flash Audio

Input ¥3.60/1M tokens

Output ¥10.80/1M tokens

Cache read ¥0.3600/1M tokens

Gemini 2.5 Flash audio input pricing, with output billed at the model output-token rate.

2026-05-28

Zhipu

GLM-4-Voice

Input ¥80.00/1M tokens

Output ¥80.00/1M tokens

Official Zhipu pricing lists GLM-4-Voice audio at ¥80 per 1M tokens. The card keeps the official token unit and applies no per-minute conversion.

2026-05-28

OpenAI

GPT-Image-2

Input ¥57.60/1M tokens

Output ¥216.00/1M tokens

Cache read ¥14.40/1M tokens

Image token pricing for GPT-Image generation. Text prompts are billed at separate text-token rates.

2026-05-28

Google

Gemini 3 Image/Video

Input ¥7.20/1M tokens

Output ¥32.40/1M tokens

Paid-tier Gemini 3 image/video input pricing with output billed at the text output-token rate.

2026-05-28

Google

Gemini 2.5 Flash Image/Video

Input ¥1.80/1M tokens

Output ¥10.80/1M tokens

Cache read ¥0.1800/1M tokens

Gemini 2.5 Flash image/video input pricing, with output billed at the model output-token rate.

2026-05-28

Qwen

qwen-vl-ocr

Input ¥0.5140/1M tokens

Output ¥1.17/1M tokens

Official Alibaba Cloud Model Studio pricing lists qwen-vl-ocr vision OCR token pricing; per-image generation rows are not converted into token prices.

2026-05-28

Moonshot

moonshot-v1-8k-vision-preview

Input ¥2.00/1M tokens

Output ¥10.00/1M tokens

Official Moonshot V1 vision preview pricing. Image input is billed as token pricing; no per-image conversion is applied.

2026-05-28

Moonshot

moonshot-v1-32k-vision-preview

Input ¥5.00/1M tokens

Output ¥20.00/1M tokens

Official Moonshot V1 vision preview pricing. Image input is billed as token pricing; no per-image conversion is applied.

2026-05-28

Moonshot

moonshot-v1-128k-vision-preview

Input ¥10.00/1M tokens

Output ¥30.00/1M tokens

Official Moonshot V1 vision preview pricing. Image input is billed as token pricing; no per-image conversion is applied.

2026-05-28

Google

Gemini 3 Image/Video

Input ¥7.20/1M tokens

Output ¥32.40/1M tokens

Paid-tier Gemini 3 video input pricing. Output is billed at the model text output-token rate; no seconds or minutes conversion is applied.

2026-05-28

Google

Gemini 2.5 Flash Image/Video

Input ¥1.80/1M tokens

Output ¥10.80/1M tokens

Cache read ¥0.1800/1M tokens

Gemini 2.5 Flash video input pricing with context cache reads where supported. Output is billed at the model output-token rate.

2026-05-28

Moonshot

kimi-k2.6-video-input

Input ¥6.50/1M tokens

Output ¥27.00/1M tokens

Cache read ¥1.10/1M tokens

Official Kimi K2.6 supports video input under token pricing. No seconds or minutes conversion is applied.

2026-05-28

Moonshot

kimi-k2.5-video-input

Input ¥4.00/1M tokens

Output ¥21.00/1M tokens

Cache read ¥0.7000/1M tokens

Official Kimi K2.5 supports video input under token pricing. No seconds or minutes conversion is applied.

2026-05-28
