A customer's bug report against weights you rotated last month is the moment your model versioning policy stops being internal MLOps and starts being a customer-visible contract.
Personalized AI features inherit a different physics from the cached web. The latency SLO your team borrowed from CDN-backed surfaces is structurally unmeetable for per-user generated responses — and what to do about it.
Stripping reasoning tokens to cut inference cost looks like a clean optimization until an auditor asks for a rationale you no longer produce. Reasoning traces are dual-use artifacts — engineering cost lines and regulated evidence — and the team that owns the prompt rarely owns the audit.
A coding agent does not push when work is ready — it pushes to find out if work is ready. CI cost stops scaling with commits and starts scaling with plan steps, and the forecast model finance built last year no longer holds.
An endpoint alias is not an artifact. When the auditor asks which checkpoint produced a decision, only per-decision checkpoint pinning gives a defensible answer.
Recency-and-length pruning evicts the constraint a later turn silently depends on, and the user reads a confident wrong answer as a competence regression. Pruning is the dual of retrieval, and the team that tuned it for token count is silently regressing answer quality.
Compaction preserves what your agent said and forgets what your user chose. Treat conversation memory as two streams — semantic and structured — or ship a privacy violation in the second.
A negotiated unit price is not a constant — it is the output of a state machine the vendor runs against your account. When seasonality trips the volume floor, the discount lapses and your forecast quietly goes wrong.
When LLM-generated prompts replace hand-written ones, the per-task labeling rate you signed in 2023 becomes a silent margin transfer until the renewal cycle forces a price-discovery fight.
Agent identity has no quarterly review, no team transfer, and no termination event. The day-one IAM grant becomes the day-ninety inheritance, and the org chart is the real obstacle to fixing it.
The seed parameter on hosted LLM APIs is a best-effort hint, not a contract. Why byte-exact CI assertions flake — and what to assert instead.
Agents inherit your wiring but not your sense of place. When the prompt is the same in staging and prod, the model fills in 'where it is' from training data — and 'production database' is the default. Here is how to ground an agent in its environment.