4 posts tagged with "procurement"

The Model Card Benchmark Whose Methodology Shifted While Your Contract Cited the Number

June 3, 2026 · 11 min read

Tian Pan

Software Engineer

Your procurement team renewed the inference contract last quarter and noted, with quiet satisfaction, that the quality clause referencing "HumanEval pass@1 of 84%" had been comfortably exceeded by the provider's latest model card, which now reports 87%. Three points to the good. The clause is satisfied. The relationship is healthy. Meanwhile, your inference team's own regression suite — the one that actually exercises the tasks your product depends on — shows a 2% decline on held-out evaluation cases since the model update shipped. Both numbers are real. Only one of them is in the contract.

This is what it looks like when a marketing artifact is load-bearing in a legal document. The benchmark number on the model card is the headline of a measurement; the methodology that produced it is a footnote in an appendix nobody on the contract review chain reads. When the provider changes the methodology — switches from greedy decode to best-of-three sampling, adds a structured-output system message, swaps the prompt template to match the model's new chat tuning — the number moves in a way that has nothing to do with your traffic and everything to do with how the number is computed. Your contract clause cites the number. The counterparty controls the protocol that produces it. You've signed a clause whose meaning the other side can revise without violating it.
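To make the protocol effect concrete, here is a minimal Python sketch. It rests on assumptions that are not in any provider's documentation: the old protocol is approximated as one attempt per problem, "best-of-three" is treated as counting a problem solved when any of three independent samples passes, and the per-problem pass probabilities are invented for illustration. The model is identical in both calculations; only the scoring rule changes.

```python
def single_attempt_score(pass_probs):
    """Expected score when every problem gets exactly one attempt."""
    return sum(pass_probs) / len(pass_probs)


def any_of_k_score(pass_probs, k):
    """Expected score when a problem counts as solved if any of k
    independent samples passes (an assumed reading of best-of-k)."""
    return sum(1 - (1 - p) ** k for p in pass_probs) / len(pass_probs)


# Hypothetical per-problem pass probabilities for one unchanged model.
probs = [1.0, 1.0, 0.9, 0.5, 0.0]

one_attempt = single_attempt_score(probs)
any_of_three = any_of_k_score(probs, 3)

print(f"one attempt:  {one_attempt:.3f}")
print(f"any of three: {any_of_three:.3f}")
```

Under these made-up inputs the headline rises by several points with no change to the model, which is why a clause that cites the number without pinning the protocol is hard to enforce.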

The Cost Forecast Tied to a Pricing Tier You No Longer Qualify For

June 2, 2026 · 11 min read

Tian Pan

Software Engineer

The usage curve barely moved. The bill went up 38%.

That is the email the finance lead at a mid-sized fintech opened on the first Monday of the quarter. Three months earlier, the engineering org had renegotiated its LLM inference contract and shaved a sizeable percentage off the unit price by committing to a volume floor. The finance model rolled the new unit price into the FY forecast. Nobody bookmarked the footnote in the pricing schedule that said the discount would lapse if monthly usage fell below the floor for three consecutive months. The seasonal traffic dip in April–May did exactly that. The provider re-tiered the account back to list price. No notification reached engineering, because the notification went to a procurement inbox that nobody had read since the contract was signed.
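The lapse condition in that footnote is simple enough to monitor mechanically. The sketch below is one hypothetical way to do it in Python: the function names, the usage figures, and the floor value are all illustrative, and the three-month window is taken from the clause as described above. The last helper back-computes the discount implied by a bill increase, assuming usage was exactly flat and the whole bill is priced at one unit rate.

```python
def lapse_month_index(monthly_usage, floor, window=3):
    """Return the 0-based index of the month in which the discount
    lapses (the window-th consecutive month below the floor), or None."""
    streak = 0
    for i, usage in enumerate(monthly_usage):
        streak = streak + 1 if usage < floor else 0
        if streak >= window:
            return i
    return None


def trailing_months_below(monthly_usage, floor):
    """Count how many of the most recent months in a row fell below the floor."""
    streak = 0
    for usage in reversed(monthly_usage):
        if usage >= floor:
            break
        streak += 1
    return streak


def implied_discount(bill_increase):
    """Discount off list price implied by a fractional bill increase at
    flat usage, e.g. 0.38 for a 38% increase."""
    return 1 - 1 / (1 + bill_increase)


# Hypothetical monthly usage (in millions of tokens) against a floor of 100.
usage = [120, 110, 95, 90, 85, 105]
print(lapse_month_index(usage, floor=100))
print(trailing_months_below(usage[:4], floor=100))
print(f"{implied_discount(0.38):.1%}")
```

An alert wired to `trailing_months_below` would fire one month before the lapse, while there is still time to shift traffic or reopen the terms. Routing that alert to engineering rather than to a procurement inbox is the design choice that matters.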

The Data Labeler Whose Pricing Model Assumed Humans Wrote the Prompts

June 2, 2026 · 10 min read

Tian Pan

Software Engineer

Your labels-per-dollar dashboard is the most flattering line on the team review, and it is lying to you. The denominator is the per-task rate you negotiated with a labeling vendor in 2023, when a human research lead wrote each labeling prompt by hand, edited it twice, ran it past a teammate, and submitted maybe forty prompts a week. The numerator is the number of completed tasks coming back through the API. Sometime in the last three months, your team quietly stopped writing prompts by hand and started generating them with an LLM that emits a prompt every two seconds at a marginal cost rounding to zero. Your labels-per-dollar metric is going up, and the only person who knows the metric is meaningless is the account manager at the vendor who is watching their margin compress and is about to send a contract amendment your procurement team will read as a price hike.

The mismatch is not a vendor problem. It is a contract that encodes assumptions about your workflow that are no longer true, and the gap between those assumptions and your current behavior is the surplus value one side is silently absorbing until the renewal cycle forces a price-discovery conversation. The side that notices the mismatch first sets the new price.

The 80-Question Wall: What Enterprise AI Security Questionnaires Actually Demand

April 27, 2026 · 11 min read

Tian Pan

Software Engineer

The AI feature your team shipped in March is unsellable to half your pipeline, and the engineering org doesn't know it yet. Somewhere in account-executive Slack, a deal at 80% probability just got kicked from forecast because the prospect's CISO sent over a 92-question security review with an AI addendum. Question 31 asks for your training data provenance documentation. Question 47 asks whether prompts are logged, where, for how long, and who can read them. Question 63 asks whether your inference can be region-pinned to the EU. Question 78 asks for your prompt-injection resistance rate against the OWASP LLM Top 10 corpus, with measured numbers, by model version. The deal team has 72 hours to respond. Nobody on the AI team has written down the answer to any of these.

This is the new wall. Fortune 500 procurement teams now run AI-feature-specific security reviews that didn't exist in 2023, and the answers your engineering org needs aren't hard to produce — they're just nobody's job. The questions are concrete, the frameworks are public, and yet most AI products are quietly unsellable to regulated enterprises because the answers were never written down.

The frustrating part is that none of this is mysterious. The questionnaires are templated. The expected answers are documented. The real failure mode is that AI features were shipped on the assumption that the existing SOC 2 report would carry the same enterprise-deal weight it carried for the last decade — and it doesn't.
