Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...
What is this shortlist for?
Long-context models are useful when prompts include full contracts, knowledge-base exports, support histories, or large code files. The tradeoff is that longer prompts can quickly increase cost, so teams should compare both context window and input price before shipping.
Source basis: NextModel curated catalog, plus OpenRouter context metadata when available.
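The cost tradeoff can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names `input_cost_usd` and `fits_context` are hypothetical, and it assumes flat per-token pricing with no caching discounts or tiered long-prompt rates, which real providers may apply.

```python
def input_cost_usd(prompt_tokens: int, price_per_million_usd: float) -> float:
    """Input-side cost of a single request at a flat per-1M-token price."""
    return prompt_tokens / 1_000_000 * price_per_million_usd


def fits_context(prompt_tokens: int, context_window: int,
                 reserved_output_tokens: int = 0) -> bool:
    """True if the prompt plus the tokens reserved for the answer fit the window."""
    return prompt_tokens + reserved_output_tokens <= context_window


# Hypothetical request: a 200,000-token contract bundle, 8,000 tokens reserved
# for the answer, checked against a window read as 1,000,000 tokens.
prompt_tokens = 200_000
print(fits_context(prompt_tokens, 1_000_000, reserved_output_tokens=8_000))
print(input_cost_usd(prompt_tokens, 1.25))
```

At $1.25 per 1M input tokens, a 200,000-token prompt costs about $0.25 in input alone, before any output tokens are billed. Multiplying that by expected request volume is what makes input price worth checking alongside the context window.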
Recommended long-context models
Start with the shortlist, test it against your real prompts, and compare monthly costs before routing production traffic.
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...
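Comparison table
Compare candidates by price, provider, context, capabilities, and source.
Use it when narrowing down production candidates, drafting a fallback policy, or comparing model economics.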
| Model | Provider | Input | Output | Context | Capabilities | Best for | Latency | Status | Source |
|---|---|---|---|---|---|---|---|---|---|
| Google: Gemini 2.5 Pro (`google/gemini-2.5-pro`) | Google | $1.25 / 1M tokens | $10 / 1M tokens | 1M | Tool calling, Vision, JSON mode, Long context | long-context analysis, vision workflows | 1500-5000 ms | Catalog | OpenRouter if available |
| DeepSeek: DeepSeek V4 Flash (`deepseek/deepseek-v4-flash`) | DeepSeek | $0.112 / 1M tokens | $0.224 / 1M tokens | 1M | Tool calling, JSON mode, Long context, Reasoning | low-cost Chinese tasks, long-context summary | 800-2600 ms | Catalog | OpenRouter if available |
| Google: Gemini 2.5 Flash (`google/gemini-2.5-flash`) | Google | $0.3 / 1M tokens | $2.50 / 1M tokens | 1M | Tool calling, Vision, JSON mode, Long context | long-document summarization, image Q&A | 900-2800 ms | Catalog | OpenRouter if available |
| Meta: Llama 4 Maverick (`meta-llama/llama-4-maverick`) | Meta | $0.15 / 1M tokens | $0.6 / 1M tokens | 1M | JSON mode, Long context, Streaming, Low cost | open-model workflows, cost-sensitive long context | 950-2800 ms | Catalog | OpenRouter if available |
| Anthropic: Claude Opus 4.7 (`anthropic/claude-opus-4.7`) | Anthropic | $5 / 1M tokens | $25 / 1M tokens | 1M | Tool calling, JSON mode, Long context, Reasoning | frontier reasoning, large codebase review | 2300-6800 ms | Catalog | OpenRouter if available |
| Anthropic: Claude Sonnet 4.5 (`anthropic/claude-sonnet-4.5`) | Anthropic | $3 / 1M tokens | $15 / 1M tokens | 1M | Tool calling, JSON mode, Long context, Reasoning | coding agents, code review | 1600-4800 ms | Catalog | OpenRouter if available |
| Qwen: Qwen3 Coder Plus (`qwen/qwen3-coder-plus`) | Alibaba Cloud / Qwen | $0.65 / 1M tokens | $3.25 / 1M tokens | 1M | Tool calling, JSON mode, Long context, Streaming | Chinese engineering workflows, code generation | 1200-3900 ms | Catalog | OpenRouter if available |
| MoonshotAI: Kimi K2.6 (`moonshotai/kimi-k2.6`) | Moonshot AI | $0.73 / 1M tokens | $3.49 / 1M tokens | 262.1k | JSON mode, Long context, Streaming, Tool calling | long Chinese documents, contract review | 1400-4400 ms | Catalog | OpenRouter if available |
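One way to turn the comparison table into a fallback order is to drop models whose context window is too small for the workload and sort the rest by estimated monthly cost. The sketch below is a rough illustration, not a routing implementation: the `Model` class and the `monthly_cost_usd` and `shortlist` helpers are hypothetical names, the prices are copied from four rows of the table, "1M" is read as 1,000,000 tokens and "262.1k" as 262,100 tokens, and billing is assumed to be flat per token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    input_usd_per_m: float   # USD per 1M input tokens
    output_usd_per_m: float  # USD per 1M output tokens
    context_tokens: int      # approximate context window


# Values taken from the comparison table (a subset of rows, for brevity).
CATALOG = [
    Model("google/gemini-2.5-pro", 1.25, 10.0, 1_000_000),
    Model("deepseek/deepseek-v4-flash", 0.112, 0.224, 1_000_000),
    Model("google/gemini-2.5-flash", 0.30, 2.50, 1_000_000),
    Model("moonshotai/kimi-k2.6", 0.73, 3.49, 262_100),
]


def monthly_cost_usd(model: Model, requests: int,
                     input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend, assuming every request has the same size."""
    per_request = (input_tokens * model.input_usd_per_m
                   + output_tokens * model.output_usd_per_m) / 1_000_000
    return requests * per_request


def shortlist(catalog: list[Model], requests: int,
              input_tokens: int, output_tokens: int) -> list[Model]:
    """Models whose window fits the request, ordered cheapest first."""
    fitting = [m for m in catalog
               if input_tokens + output_tokens <= m.context_tokens]
    return sorted(fitting, key=lambda m: monthly_cost_usd(
        m, requests, input_tokens, output_tokens))


# Hypothetical workload: 10,000 requests per month, 300,000 input tokens
# and 2,000 output tokens per request.
for m in shortlist(CATALOG, 10_000, 300_000, 2_000):
    print(m.model_id, round(monthly_cost_usd(m, 10_000, 300_000, 2_000), 2))
```

For this workload the 262.1k-token model is excluded because the request does not fit its window, and the remaining three are ordered by cost. Cost is only one axis: latency, capabilities, and answer quality on real prompts should still decide the final fallback order.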
FAQ
Long-context models FAQ
Is a larger context window always better?
No. Larger context helps with big inputs, but cost, latency, retrieval design, and answer quality still matter.