Opus 4.7 is the next generation of Anthropic's Opus family, built for long-running, asynchronous agents. Building on the coding and agentic strengths of Opus 4.6, it delivers stronger performance on...
這份候選名單適合什麼用途?
Vision model APIs are useful for screenshots, receipts, product images, visual support tickets, and multimodal Q&A. The right choice depends on image input support, context size, price, and whether the same model must also produce structured JSON output. NextModel groups vision-capable candidates with price and capability labels so developers can test a small set of models quickly.
來源依據: NextModel capability mapping and OpenRouter input-modality metadata when available.
匹配分
推薦候選 vision models
先從候選名單開始,再以真實提示詞測試,並在接入正式環境路由前比較月度成本。
Claude Sonnet 4.5 is Anthropic’s most advanced Sonnet model to date, optimized for real-world agents and coding workflows. It delivers state-of-the-art performance on coding benchmarks such as SWE-bench Verified, with...
Gemini 2.5 Pro is Google’s state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. It employs “thinking” capabilities, enabling it to reason through responses with enhanced accuracy...
Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...
比較表
按價格、提供者、上下文、能力與來源比較這份候選名單。
當你在縮小正式環境候選名單、建立備援策略或比較模型經濟性時,可使用此視圖。
| Model | Provider | Input | Output | Context | Capabilities | Best for | Latency | Status | Source |
|---|---|---|---|---|---|---|---|---|---|
| Anthropic: Claude Opus 4.7anthropic/claude-opus-4.7 | Anthropic | $5 / 1M tokens | $25 / 1M tokens | 1M | Tool callingJSON modeLong contextReasoning | frontier reasoning, large codebase review | 2300-6800ms | Catalog | OpenRouter if available |
| Anthropic: Claude Sonnet 4.5anthropic/claude-sonnet-4.5 | Anthropic | $3 / 1M tokens | $15 / 1M tokens | 1M | Tool callingJSON modeLong contextReasoning | coding agents, code review | 1600-4800ms | Catalog | OpenRouter if available |
| Google: Gemini 2.5 Progoogle/gemini-2.5-pro | $1.25 / 1M tokens | $10 / 1M tokens | 1M | Tool callingVisionJSON modeLong context | long-context analysis, vision workflows | 1500-5000ms | Catalog | OpenRouter if available | |
| Google: Gemini 2.5 Flashgoogle/gemini-2.5-flash | $0.3 / 1M tokens | $2.50 / 1M tokens | 1M | Tool callingVisionJSON modeLong context | long-document summarization, image Q&A | 900-2800ms | Catalog | OpenRouter if available | |
| OpenAI: GPT-4o-miniopenai/gpt-4o-mini | OpenRouter | $0.15 / 1M tokens | $0.6 / 1M tokens | 128k | Tool callingVisionJSON modeLong context | low-cost chat, image understanding | 800-2400ms | Catalog | OpenRouter if available |
| MoonshotAI: Kimi K2.6moonshotai/kimi-k2.6 | Moonshot AI | $0.73 / 1M tokens | $3.49 / 1M tokens | 262.1k | JSON modeLong contextStreamingTool calling | long Chinese documents, contract review | 1400-4400ms | Catalog | OpenRouter if available |
| Meta: Llama 4 Maverickmeta-llama/llama-4-maverick | Meta | $0.15 / 1M tokens | $0.6 / 1M tokens | 1M | JSON modeLong contextStreamingLow cost | open-model workflows, cost-sensitive long context | 950-2800ms | Catalog | OpenRouter if available |
| Mistral: Mistral Small 3.2 24Bmistralai/mistral-small-3.2-24b-instruct | Mistral AI | $0.1 / 1M tokens | $0.3 / 1M tokens | 128k | Tool callingJSON modeStreamingLow cost | translation, classification | 700-2300ms | Catalog | OpenRouter if available |
常見問題
Vision models 常見問題
What should I compare before choosing a vision model API?
Compare input support, JSON output, latency, output-token cost, and the quality of answers on your own image samples.
Can low-cost models handle vision tasks?
Some low-cost models can handle lightweight vision tasks, but document-heavy or high-accuracy workflows should be benchmarked carefully.