The Eval Harness, Not the Prompt, Is Your Real Provider Lock-In
Every "we'll just swap providers if needed" plan in the deck has a budget line for prompt rewrites. None of them has a line for the eval suite. That is the bug. The prompts are the visible coupling — the part you wrote, the part you can grep for, the part a junior engineer can rewrite in an afternoon. The eval harness is the invisible coupling, and it is the one that will eat a quarter of your roadmap when you actually try to migrate.
The pattern shows up the moment leverage matters. Your contract is up. A competitor releases a model that benchmarks better on your domain. Pricing on output tokens shifts under you. You go to run the candidate model through your eval suite to make the call, and within a day you discover that you cannot trust any score the harness produces, because the harness itself was written against the incumbent. You are not comparing models. You are comparing one model against a measurement instrument that was calibrated to the other one.
