A fine-tuned model is not a file in a registry; it is the closure of a pipeline over a training set. The teams that ship only the weights discover their bus factor the day a base-model migration arrives and the original engineer is gone.
A JSON schema validates shape, not meaning. When an LLM upgrade shifts the distribution of values inside a still-valid schema, downstream consumers detonate while the producer's dashboard stays green.
Prompt caching looks like a configured discount, but KV cache eviction on shared LLM infrastructure turns it into a probabilistic one — the same conversation can cost several times more on a busy day with no code change.
A blue/green deployment orphaned a cron pinned to the old color, the prompt cache went cold, and the bill quietly tripled — anatomy of a silent regression and the four practices that close the seam.
A safety disclaimer added to your system prompt does not stop at the user-facing reply. It rides along into every tool call argument the model produces — and into the downstream systems those calls fire against.
An LLM-as-judge ensemble drawn from one provider family measures family-internal consistency, not judgment quality, and the high agreement number is an artifact of provider selection nobody named.
A confidence router that stopped escalating low-confidence answers, the silent provider tier change that caused it, and how response-shape contracts, population-level alerts, and a fallback written for the wrong failure mode hide together.
An LLM provider quietly raised the default max_tokens value and your p99 output length doubled overnight. The parameter you did not send is the configuration that changed under you — here is how to stop inheriting defaults you do not control.
An MCP server running on a developer laptop with a CI-grade OAuth token is a production attack surface. Here is how DNS rebinding, bad bindings, and shared tokens turn one compromised tab into a deploy-key leak.
A benchmark number is a measurement under a protocol, and the protocol is what your vendor controls. Pin the methodology, or contract on your own eval suite.
Treating an LLM model identifier as a name for the weights instead of a label on a routing decision lets the provider silently swap your tenant from a fine-tune to base while the eval suite stays green and customers notice first.
A model registry's promotion gates only work when reviewers have time, independent evidence, and aligned incentives. Most teams build the first half of that contract and skip the rest, and the registry decays into a paperwork pipeline that approves whatever the producer ships.