Why this benchmark exists

Most cache benchmarks optimize hit rate. CacheSafety Bench asks a stricter question: can an old answer safely answer a new request without creating a bad hit that users would notice?

Safe Hit RateReusable answers the user would not notice were cached
Bad Hit RateUnsafe reused answers
Cost Saved / 1K RequestsEstimated savings under a safety constraint
Semantic Trap Failure RateHow often similar-looking prompts still fail reuse

Hosted and local positioning

The local benchmark is open source and endpoint-neutral. NextModel hosted runs are optional for larger replay jobs, judge models, and shareable reports.

OpenAI-compatible endpoint
export OPENAI_API_KEY=...
export OPENAI_BASE_URL=https://api.nextmodel.app/v1

Where to start

Start with the public benchmark page, then move to API keys or billing only when you are ready to run larger hosted evaluations.

Landing page/benchmarks/cache-safety
API keys/dashboard/api-keys
Billing/dashboard/billing