Model shortlist

Best low-cost LLM API models for cost-sensitive products

Compare low-cost LLM API models by input price, output price, context length, capabilities, source, and production readiness.


What is this shortlist for?

Choosing a low-cost LLM API should start with the shape of the workload, not just the lowest list price. For classification, summarization, routing, support drafts, and batch transformations, a cheaper model can reduce monthly costs without changing the application interface. For final answers, complex reasoning, or coding agents, teams should benchmark a low-cost model against a stronger fallback. NextModel keeps price, context, capabilities, source attribution, and code examples in one place so developers can make this trade-off before deployment.
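To make "shape of the workload" concrete, here is a minimal sketch of a monthly cost estimate. The function name and the two example workloads are hypothetical; the prices are the Mistral Small 3.2 24B list prices shown on this page, and real bills may differ (caching, minimum charges, and retries are not modeled).

```python
def monthly_cost_usd(requests_per_month, input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Estimate monthly spend in USD for one model and one workload shape.

    input_tokens / output_tokens are averages per request;
    prices are USD per 1M tokens, as listed in the catalog.
    """
    total_input = requests_per_month * input_tokens
    total_output = requests_per_month * output_tokens
    return (total_input * input_price_per_m
            + total_output * output_price_per_m) / 1_000_000


# Hypothetical workload A: 2M classification calls, long prompt, tiny answer.
classification = monthly_cost_usd(2_000_000, 600, 20, 0.10, 0.30)

# Hypothetical workload B: 200k summaries, longer prompt, longer answer.
summaries = monthly_cost_usd(200_000, 3_000, 400, 0.10, 0.30)

print(f"classification: ${classification:,.2f} / month")
print(f"summaries:      ${summaries:,.2f} / month")
```

Under these assumptions the classification workload is dominated by input tokens and the summary workload spends a larger share on output tokens, which is why the same model can rank differently for different workloads.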

Source basis: Curated NextModel catalog material, public provider pricing, and OpenRouter metadata where available.


Recommended candidates for a low-cost LLM API

Start with the shortlist, test real prompts, and compare monthly costs before routing production traffic.

Provider: DeepSeek | Status: Catalog

DeepSeek: DeepSeek V4 Flash

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

Input: $0.112 / 1M tokens | Output: $0.224 / 1M tokens | Context: 1M

Best for: low-cost Chinese tasks, long-context summary, batch code assistance

Routing: Configured

Capabilities: Tool calling, JSON mode, Long context, Reasoning, Low cost

Source: OpenRouter if available. OpenRouter public Models API live metadata; public price comes from the registry pricing rule.

Provider: Mistral AI | Status: Catalog

Mistral: Mistral Small 3.2 24B

Mistral-Small-3.2-24B-Instruct-2506 is an updated 24B parameter model from Mistral optimized for instruction following, repetition reduction, and improved function calling. Compared to the 3.1 release, version 3.2 significantly improves accuracy on...

Input: $0.1 / 1M tokens | Output: $0.3 / 1M tokens | Context: 128k

Best for: translation, classification, short-form summarization

Routing: Configured

Capabilities: Tool calling, JSON mode, Streaming, Low cost, Vision, Long context

Source: OpenRouter if available. OpenRouter public Models API live metadata; public price comes from the registry pricing rule.

Provider: OpenRouter | Status: Catalog

OpenAI: GPT-4o-mini

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

Input: $0.15 / 1M tokens | Output: $0.6 / 1M tokens | Context: 128k

Best for: low-cost chat, image understanding, classification

Routing: Configured

Capabilities: Tool calling, Vision, JSON mode, Long context, Streaming, Low cost

Source: OpenRouter if available. OpenRouter public Models API live metadata; public price comes from the registry pricing rule.

Provider: Meta | Status: Catalog

Meta: Llama 4 Maverick

Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward...

Input: $0.15 / 1M tokens | Output: $0.6 / 1M tokens | Context: 1M

Best for: open-model workflows, cost-sensitive long context, classification

Routing: Configured

Capabilities: JSON mode, Long context, Streaming, Low cost, Tool calling, Vision

Source: OpenRouter if available. OpenRouter public Models API live metadata; public price comes from the registry pricing rule.

Comparison table

Compare the shortlist by price, provider, context, capabilities, and source.

Use this view when you are narrowing down a production shortlist, building a fallback strategy, or comparing model economics.

Model	Provider	Input	Output	Context	Capabilities	Best for	Latency	Status	Source
DeepSeek: DeepSeek V4 Flash (deepseek/deepseek-v4-flash)	DeepSeek	$0.112 / 1M tokens	$0.224 / 1M tokens	1M	Tool calling, JSON mode, Long context, Reasoning	low-cost Chinese tasks, long-context summary	800-2600 ms	Catalog	OpenRouter if available
Mistral: Mistral Small 3.2 24B (mistralai/mistral-small-3.2-24b-instruct)	Mistral AI	$0.1 / 1M tokens	$0.3 / 1M tokens	128k	Tool calling, JSON mode, Streaming, Low cost	translation, classification	700-2300 ms	Catalog	OpenRouter if available
OpenAI: GPT-4o-mini (openai/gpt-4o-mini)	OpenRouter	$0.15 / 1M tokens	$0.6 / 1M tokens	128k	Tool calling, Vision, JSON mode, Long context	low-cost chat, image understanding	800-2400 ms	Catalog	OpenRouter if available
Meta: Llama 4 Maverick (meta-llama/llama-4-maverick)	Meta	$0.15 / 1M tokens	$0.6 / 1M tokens	1M	JSON mode, Long context, Streaming, Low cost	open-model workflows, cost-sensitive long context	950-2800 ms	Catalog	OpenRouter if available
Google: Gemini 2.5 Flash (google/gemini-2.5-flash)	Google	$0.3 / 1M tokens	$2.50 / 1M tokens	1M	Tool calling, Vision, JSON mode, Long context	long-document summarization, image Q&A	900-2800 ms	Catalog	OpenRouter if available
MoonshotAI: Kimi K2.6 (moonshotai/kimi-k2.6)	Moonshot AI	$0.73 / 1M tokens	$3.49 / 1M tokens	262.1k	JSON mode, Long context, Streaming, Tool calling	long Chinese documents, contract review	1400-4400 ms	Catalog	OpenRouter if available
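One way to narrow the table programmatically is to filter by required capabilities and context size, then sort the survivors by estimated cost per request. The sketch below copies prices and capability tags from the table rows; the `Model` class, the `rank_candidates` helper, and the numeric context sizes (for example "128k" read as 128,000 tokens) are assumptions made for illustration. The table's capability column may be abbreviated compared with the cards, so treat the tag sets as a starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    input_price: float    # USD per 1M input tokens
    output_price: float   # USD per 1M output tokens
    context_tokens: int   # assumption: "128k" -> 128_000, "1M" -> 1_000_000
    capabilities: frozenset


def caps(*names):
    """Small helper to build a capability set."""
    return frozenset(names)


# Rows copied from the comparison table.
SHORTLIST = [
    Model("deepseek/deepseek-v4-flash", 0.112, 0.224, 1_000_000,
          caps("tool calling", "json mode", "long context", "reasoning")),
    Model("mistralai/mistral-small-3.2-24b-instruct", 0.10, 0.30, 128_000,
          caps("tool calling", "json mode", "streaming", "low cost")),
    Model("openai/gpt-4o-mini", 0.15, 0.60, 128_000,
          caps("tool calling", "vision", "json mode", "long context")),
    Model("meta-llama/llama-4-maverick", 0.15, 0.60, 1_000_000,
          caps("json mode", "long context", "streaming", "low cost")),
    Model("google/gemini-2.5-flash", 0.30, 2.50, 1_000_000,
          caps("tool calling", "vision", "json mode", "long context")),
    Model("moonshotai/kimi-k2.6", 0.73, 3.49, 262_100,
          caps("json mode", "long context", "streaming", "tool calling")),
]


def rank_candidates(models, input_tokens, output_tokens,
                    required=frozenset(), min_context=0):
    """Keep models that meet the requirements, cheapest per request first."""
    def request_cost(m):
        return (input_tokens * m.input_price
                + output_tokens * m.output_price) / 1_000_000

    eligible = [m for m in models
                if required <= m.capabilities and m.context_tokens >= min_context]
    return sorted(eligible, key=request_cost)


# Hypothetical requirement: image input plus tool calling, 2,000 tokens in, 300 out.
for m in rank_candidates(SHORTLIST, 2_000, 300, required={"vision", "tool calling"}):
    print(m.model_id)
```

Filtering before sorting keeps a cheap but unsuitable model from winning the ranking; latency and status from the table could be added as further filters in the same way.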

FAQ

Low-cost LLM API FAQ

Which model in this catalog is the cheapest?

The cheapest option depends on the exchange rate and output length. Doubao Seed 2.0 Mini is the lowest-cost production-ready CNY candidate in this catalog.
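The dependence on output length can be illustrated with two models from the shortlist whose input and output prices cross. The sketch below is a worked example, not a pricing tool: the helper names are hypothetical, the prices are the USD list prices for DeepSeek V4 Flash and Mistral Small 3.2 24B from this page, and exchange rates are not modeled.

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def break_even_output_ratio(a, b):
    """Output/input token ratio at which models a and b cost the same.

    a and b are (input_price, output_price) tuples. Setting
    a_in*i + a_out*o == b_in*i + b_out*o and solving for o/i gives
    (a_in - b_in) / (b_out - a_out). Returns None if the prices never cross.
    """
    (a_in, a_out), (b_in, b_out) = a, b
    if a_out == b_out:
        return None
    ratio = (a_in - b_in) / (b_out - a_out)
    return ratio if ratio > 0 else None


DEEPSEEK_V4_FLASH = (0.112, 0.224)
MISTRAL_SMALL_3_2 = (0.10, 0.30)

ratio = break_even_output_ratio(DEEPSEEK_V4_FLASH, MISTRAL_SMALL_3_2)
print(f"break-even output/input ratio: {ratio:.3f}")

# Short answers (100 output tokens per 1,000 input tokens) vs. longer answers (300).
for out_tokens in (100, 300):
    d = cost_per_request(1_000, out_tokens, *DEEPSEEK_V4_FLASH)
    m = cost_per_request(1_000, out_tokens, *MISTRAL_SMALL_3_2)
    print(out_tokens, "cheaper:", "Mistral Small 3.2" if m < d else "DeepSeek V4 Flash")
```

With these list prices the break-even point is roughly 0.16 output tokens per input token: below it the model with the lower input price is cheaper, above it the model with the lower output price wins.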

Should teams always choose the cheapest LLM API?

No. Low-cost models suit repeatable, low-risk tasks. For final answers, complex reasoning, and coding agents, they should be compared against stronger models.
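A common way to combine a low-cost model with a stronger one is to try the cheap model first and escalate only when its draft fails a check. The sketch below shows that pattern with plain callables so it runs without any API client; `answer_with_fallback`, the stub models, and the acceptance check are all hypothetical, and a real acceptance check would come from your own benchmark prompts.

```python
from typing import Callable, Tuple


def answer_with_fallback(prompt: str,
                         cheap: Callable[[str], str],
                         strong: Callable[[str], str],
                         is_acceptable: Callable[[str], bool]) -> Tuple[str, str]:
    """Try the cheap model first; escalate if the draft is rejected or the call fails.

    Returns (tier, answer), where tier is "cheap" or "strong".
    """
    try:
        draft = cheap(prompt)
        if is_acceptable(draft):
            return "cheap", draft
    except Exception:
        pass  # treat a failed cheap call like a rejected draft
    return "strong", strong(prompt)


# Stub models standing in for real API calls (assumption for illustration).
def cheap_stub(prompt: str) -> str:
    return "billing" if "invoice" in prompt else ""


def strong_stub(prompt: str) -> str:
    return "general"


def non_empty(answer: str) -> bool:
    return bool(answer.strip())


print(answer_with_fallback("Where is my invoice?", cheap_stub, strong_stub, non_empty))
print(answer_with_fallback("My app crashes on login", cheap_stub, strong_stub, non_empty))
```

Logging which tier answered each request gives a direct measure of how often the stronger model is needed, which feeds back into the monthly cost comparison.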
