DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...
這份候選名單適合什麼用途?
Cheap LLM API selection should start with workload shape, not only the lowest posted rate. For classification, summarization, routing, support drafts, and batch transformations, a lower-cost model can reduce monthly spend without changing your application interface. For final answers, complex reasoning, or coding agents, teams should benchmark a low-cost model against a stronger fallback. NextModel keeps price, context, capability, provider source, and code examples in one place so developers can make that tradeoff before deployment.
來源依據: NextModel curated catalog, provider public pricing, and OpenRouter metadata when available.
综合价格
推薦候選 cheap llm api
先從候選名單開始,再以真實提示詞測試,並在接入正式環境路由前比較月度成本。
Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...
GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...
比較表
按價格、提供者、上下文、能力與來源比較這份候選名單。
當你在縮小正式環境候選名單、建立備援策略或比較模型經濟性時,可使用此視圖。
| Model | Provider | Input | Output | Context | Capabilities | Best for | Latency | Status | Source |
|---|---|---|---|---|---|---|---|---|---|
| DeepSeek: DeepSeek V4 Flashdeepseek/deepseek-v4-flash | DeepSeek | $0.112 / 1M tokens | $0.224 / 1M tokens | 1M | Tool callingJSON modeLong contextReasoning | low-cost Chinese tasks, long-context summary | 800-2600ms | Catalog | OpenRouter if available |
| Mistral: Mistral Small 3.2 24Bmistralai/mistral-small-3.2-24b-instruct | Mistral AI | $0.1 / 1M tokens | $0.3 / 1M tokens | 128k | Tool callingJSON modeStreamingLow cost | translation, classification | 700-2300ms | Catalog | OpenRouter if available |
| OpenAI: GPT-4o-miniopenai/gpt-4o-mini | OpenRouter | $0.15 / 1M tokens | $0.6 / 1M tokens | 128k | Tool callingVisionJSON modeLong context | low-cost chat, image understanding | 800-2400ms | Catalog | OpenRouter if available |
| Meta: Llama 4 Maverickmeta-llama/llama-4-maverick | Meta | $0.15 / 1M tokens | $0.6 / 1M tokens | 1M | JSON modeLong contextStreamingLow cost | open-model workflows, cost-sensitive long context | 950-2800ms | Catalog | OpenRouter if available |
| Google: Gemini 2.5 Flashgoogle/gemini-2.5-flash | $0.3 / 1M tokens | $2.50 / 1M tokens | 1M | Tool callingVisionJSON modeLong context | long-document summarization, image Q&A | 900-2800ms | Catalog | OpenRouter if available | |
| MoonshotAI: Kimi K2.6moonshotai/kimi-k2.6 | Moonshot AI | $0.73 / 1M tokens | $3.49 / 1M tokens | 262.1k | JSON modeLong contextStreamingTool calling | long Chinese documents, contract review | 1400-4400ms | Catalog | OpenRouter if available |
常見問題
Cheap LLM API 常見問題
What is the cheapest model in this catalog?
The cheapest option depends on currency conversion and output length. Doubao Seed 2.0 Mini is the lowest-cost CNY production entry in this catalog.
Should teams always pick the cheapest LLM API?
No. Use cheap models for repeatable low-risk work, then compare quality against stronger models for final answers, complex reasoning, and coding agents.