The KV Cache Eviction Your Provider Called Cache Pressure and Your Bill Called a Doubled Prefix Charge
Your application opens a long conversation with a forty-thousand-token system prompt and a full tool inventory. Turn 1 pays the prefix at write rates and the provider's KV cache warms up. Turn 2 arrives ninety seconds later. You assume it's a cache hit. Sometimes it is. Sometimes the same forty thousand tokens land on your invoice again at uncached prices, and nothing in your code changed between turn 1 and turn 2.
The thing that changed was somebody else's traffic. The KV cache is shared infrastructure. Your tenant was colocated on a serving node that, in the ninety seconds between your two turns, took on enough other tenants to evict your prefix from memory. The provider's dashboard will describe this as "cache pressure." Your finance team will describe it as a line item that doubled. Both descriptions are accurate. Neither is in your code.
