Why this benchmark

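Most cache benchmarks chase hit rate. CacheSafety Bench asks a stricter question: can an old answer safely serve a new request without the user noticing that a cached answer was wrongly reused?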

Safe Hit Rate: reusable answers the user would not notice were cached
Bad Hit Rate: unsafe reused answers
Cost Saved / 1K Requests: estimated savings per 1,000 requests under a safety constraint
Semantic Trap Failure Rate: how often similar-looking prompts still fail reuse
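The four metrics can be read as simple ratios over a set of judged requests. The sketch below is one possible way to compute them and rests entirely on assumptions: the `Outcome` record, its fields, the choice of total requests as the denominator, counting only safe hits toward savings, and defining a "trap" as a near-duplicate prompt that needs a different answer are all hypothetical, not CacheSafety Bench's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical per-request record; not the benchmark's real schema.
    cache_hit: bool    # the cache served a stored answer
    safe: bool         # judge says the reuse was unnoticeable (only meaningful on hits)
    is_trap: bool      # assumed: similar-looking prompt that needs a different answer
    fresh_cost: float  # assumed cost of generating the answer from scratch


def score(outcomes: list[Outcome]) -> dict[str, float]:
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to score")
    hits = [o for o in outcomes if o.cache_hit]
    safe_hits = [o for o in hits if o.safe]
    bad_hits = [o for o in hits if not o.safe]
    traps = [o for o in outcomes if o.is_trap]
    # A trap "fails" when the cache reused an answer that was not safe.
    trap_failures = [o for o in traps if o.cache_hit and not o.safe]
    return {
        "safe_hit_rate": len(safe_hits) / n,
        "bad_hit_rate": len(bad_hits) / n,
        # Safety constraint: only safe hits count as savings.
        "cost_saved_per_1k": sum(o.fresh_cost for o in safe_hits) / n * 1000,
        "semantic_trap_failure_rate": len(trap_failures) / len(traps) if traps else 0.0,
    }


results = score([
    Outcome(cache_hit=True, safe=True, is_trap=False, fresh_cost=0.002),
    Outcome(cache_hit=True, safe=False, is_trap=True, fresh_cost=0.002),
    Outcome(cache_hit=False, safe=False, is_trap=True, fresh_cost=0.002),
    Outcome(cache_hit=False, safe=False, is_trap=False, fresh_cost=0.002),
])
print(results)
```

Under these assumptions, a bad hit inflates a plain hit rate but adds nothing to cost saved, which is the gap the safety-constrained metrics are meant to expose.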