Blog

Page 8

12 articles

The Agent's I-Don't-Know Rate That Fell After You Added More Tools
Expanding the tool surface looks like a capability win on the dashboard. Often it is the planner quietly converting honest abstentions into confident wrong answers.
agentsevaluation
Jun 19 min
The AI Feature Your CTO Funded That Your Security Team Will Not Let You Ship
AI features that ship on time treat the security threat model as a shape constraint at the spec stage — not a checklist at the readiness gate. A guide for engineering leaders on moving security upstream.
insiderai-security
Jun 111 min
The Annotation Queue Your Humans Quietly Stopped Reading
Annotator throughput is the silent ceiling on every LLM eval program, and the queue ordering is the sampler nobody designed. How to treat sampling-for-grading as a first-class engineering surface.
insiderevals
Jun 110 min
The Are-You-Sure Confirmation Step Your Users Learned to Click Through
Uniform confirmation prompts in AI agents create habituation: users click through high-stakes actions with the same reflex as low-stakes ones. A stakes-aware friction budget, artifact previews, and instrumented time-to-click rebuild the safety layer.
insiderai-agents
Jun 111 min
The Async Tool Call Your Agent Fired and Forgot
Function calling treats sync and async tools as the same shape. The agent fires a job, receives an ID, marks the step done — and the work never lands.
insiderai-agents
Jun 110 min
The autoscaler that scaled to zero mid-decode: when inference is treated like stateless web traffic
A cost-driven scale-down policy treated a 30-minute long-context job like a stateless HTTP request. The pod got reaped mid-decode, the only signal was a 499 in the gateway logs an hour later, and the postmortem reframed autoscaling as a workload-shape problem.
insiderinference
Jun 112 min
The Budget Cap That Fires After the Action Already Shipped
When the kill-switch fires correctly but the agent has already booked the flight, sent the email, and closed the ticket — why budget caps measured in tokens miss the damage measured in actions, and how to separate spend from irreversibility.
insiderai-agents
Jun 19 min
The Bug Report Against a Model Version You No Longer Serve
A customer's bug report against weights you rotated last month is the moment your model versioning policy stops being internal MLOps and starts being a customer-visible contract.
insiderai-engineering
Jun 111 min
The Cache Stampede That Hit Your Model Provider Instead of Your Database
When a popular prompt prefix expires across a fleet, every worker becomes a cache writer at the same instant — and the bill that used to land on your database lands on your model provider instead.
llmprompt-caching
Jun 110 min
The CDN Edge Cache Your AI Feature Could Not Use Because the Response Varies Per User
Personalized AI features inherit a different physics from the cached web. The latency SLO your team borrowed from CDN-backed surfaces is structurally unmeetable for per-user generated responses — and what to do about it.
ai-engineeringlatency
Jun 19 min
The Chain-of-Thought You Stripped to Save Tokens That Hid an Evidence Requirement
Stripping reasoning tokens to cut inference cost looks like a clean optimization until an auditor asks for a rationale you no longer produce. Reasoning traces are dual-use artifacts — engineering cost lines and regulated evidence — and the team that owns the prompt rarely owns the audit.
llm-opscompliance
Jun 110 min
The Chargeback Model That Made Every Team Rewrite Their Prompts Overnight
Routing token cost back to product teams looks like a finance change, but it ships a prompt rewrite across the company within a sprint and quietly drops output quality in places the cost dashboard cannot see.
finopsllm-cost
Jun 110 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.

Page 8

The Agent's I-Don't-Know Rate That Fell After You Added More Tools

The AI Feature Your CTO Funded That Your Security Team Will Not Let You Ship

The Annotation Queue Your Humans Quietly Stopped Reading

The Are-You-Sure Confirmation Step Your Users Learned to Click Through

The Async Tool Call Your Agent Fired and Forgot

The autoscaler that scaled to zero mid-decode: when inference is treated like stateless web traffic

The Budget Cap That Fires After the Action Already Shipped

The Bug Report Against a Model Version You No Longer Serve

The Cache Stampede That Hit Your Model Provider Instead of Your Database

The CDN Edge Cache Your AI Feature Could Not Use Because the Response Varies Per User

The Chain-of-Thought You Stripped to Save Tokens That Hid an Evidence Requirement

The Chargeback Model That Made Every Team Rewrite Their Prompts Overnight

About Tian Pan