2 posts tagged with "platform"

The Eval Harness, Not the Prompt, Is Your Real Provider Lock-In

· 10 min read
Tian Pan
Software Engineer

Every "we'll just swap providers if needed" plan in the deck has a budget line for prompt rewrites. None of them has a line for the eval suite. That is the bug. The prompts are the visible coupling — the part you wrote, the part you can grep for, the part a junior engineer can rewrite in an afternoon. The eval harness is the invisible coupling, and it is the one that will eat a quarter of your roadmap when you actually try to migrate.

The pattern shows up the moment leverage matters. Your contract is up. A competitor releases a model that benchmarks better on your domain. Pricing on output tokens shifts under you. You go to run the candidate model through your eval suite to make the call, and within a day you discover that you cannot trust any score the harness produces, because the harness itself was written against the incumbent. You are not comparing models. You are comparing one model against a measurement instrument that was calibrated to the other one.
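A minimal sketch of how that calibration sneaks in. The grader below is hypothetical, not taken from the article: it scores answers by looking for the exact `Answer: <value>` line format the incumbent model happened to emit, so a candidate model that returns the same correct value in a different shape is scored as wrong.

```python
import re

def grade(answer: str, expected: str) -> bool:
    # Harness written against the incumbent: it assumes the model emits
    # a bare "Answer: <value>" line, because the incumbent always did.
    match = re.search(r"^Answer:\s*(.+)$", answer, re.MULTILINE)
    return match is not None and match.group(1).strip() == expected

# Incumbent-style output passes.
incumbent_ok = grade("Answer: 42", "42")           # True

# Candidate gives the same correct value, formatted differently,
# and the harness counts it as a failure.
candidate_ok = grade("The answer is 42.", "42")    # False
```

Every score the candidate gets is then a blend of model quality and format mismatch, and the harness offers no way to tell the two apart.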

Building a Generative AI Platform: Architecture, Trade-offs, and the Components That Actually Matter

· 12 min read
Tian Pan
Software Engineer

Most teams treating their GenAI stack as a model integration project eventually discover they've actually built—or need to build—a platform. The model is the easy part. The hard part is everything around it: routing queries to the right model, retrieving context reliably, filtering unsafe outputs, caching redundant calls, tracing what went wrong in a chain of five LLM calls, and keeping costs from tripling month-over-month as usage scales.
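To make one of those components concrete, here is a sketch of the caching piece, assuming an exact-match in-memory cache; `call_model` and `fake_provider` are stand-ins for whatever provider client the platform actually wraps. The key hashes both model and prompt, so routing the same prompt to a different model never serves a stale answer.

```python
import hashlib
from typing import Callable, Dict

def cached_llm_call(call_model: Callable[[str, str], str]):
    """Wrap a provider call with an exact-match response cache."""
    cache: Dict[str, str] = {}

    def wrapped(model: str, prompt: str) -> str:
        # Key on model AND prompt so a router switching models
        # cannot be answered from another model's cache entry.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key not in cache:
            cache[key] = call_model(model, prompt)
        return cache[key]

    return wrapped

# Demo with a stub provider (a placeholder, not a real API client).
hits = []
def fake_provider(model: str, prompt: str) -> str:
    hits.append(prompt)
    return prompt.upper()

llm = cached_llm_call(fake_provider)
first = llm("model-a", "hello")
second = llm("model-a", "hello")  # served from cache; provider hit once
```

Exact-match caching is only the baseline; real platforms often layer semantic caching on top, where the same trade-off applies at larger scale.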

This article is about that platform layer. Not the model weights, not the prompts—the surrounding infrastructure that separates a working proof of concept from something you'd trust to serve a million users.