Blog

Page 9

12 articles

The CI Host Whose CPU Governor Decided Your Agent Benchmark's Outcome
A 22% agent latency regression that turned out to be a cpufreq governor change in a runner image — and why CI benchmark numbers measure the host, not your code.
ai-agentsbenchmarking
Jun 19 min
The Coding Agent CI Bill That Doubled Without a Postmortem
A coding agent does not push when work is ready — it pushes to find out if work is ready. CI cost stops scaling with commits and starts scaling with plan steps, and the forecast model finance built last year no longer holds.
ai-engineeringcoding-agents
Jun 110 min
The Compaction Strategy That Summarized Away the User's Original Question
Auto-summarization preserves the conversational arc but quietly mutates the literal phrasing downstream tools depend on. Here is how that bug surfaces and how to architect around it.
insideragents
Jun 110 min
The Compliance Audit That Asked Which Model Produced Which Output
An endpoint alias is not an artifact. When the auditor asks which checkpoint produced a decision, only per-decision checkpoint pinning gives a defensible answer.
ai-compliancemodel-provenance
Jun 110 min
The Conversation Memory Pruning Heuristic That Erased the Context the Next Question Needed
Recency-and-length pruning evicts the constraint a later turn silently depends on, and the user reads a confident wrong answer as a competence regression. Pruning is the dual of retrieval, and the team that tuned it for token count is silently regressing answer quality.
ai-engineeringagents
Jun 19 min
The Conversation Summarization That Erased the Consent Flag the User Gave You
Compaction preserves what your agent said and forgets what your user chose. Treat conversation memory as two streams — semantic and structured — or ship a privacy violation in the second.
insiderai-engineering
Jun 111 min
The Cost Forecast Tied to a Pricing Tier You No Longer Qualify For
A negotiated unit price is not a constant — it is the output of a state machine the vendor runs against your account. When seasonality trips the volume floor, the discount lapses and your forecast quietly goes wrong.
finopsai-cost
Jun 111 min
The Data Labeler Whose Pricing Model Assumed Humans Wrote the Prompts
When LLM-generated prompts replace hand-written ones, the per-task labeling rate you signed in 2023 becomes a silent margin transfer until the renewal cycle forces a price-discovery fight.
insiderdata-labeling
Jun 110 min
The Dataset License That Retroactively Poisoned Your Fine-Tune
A revoked dataset license can turn a deployed fine-tuned model into a regulated artifact overnight. Provenance, unlearning, and architecture choices made before training decide how recoverable that becomes.
ai-governancefine-tuning
Jun 110 min
The Day-One Permissions Nobody Revoked on Day Ninety
Agent identity has no quarterly review, no team transfer, and no termination event. The day-one IAM grant becomes the day-ninety inheritance, and the org chart is the real obstacle to fixing it.
ai-agentsiam
Jun 110 min
The Deterministic Seed Your Eval Suite Set That Your Provider Quietly Ignored
Your eval pipeline sets seed=42 and reports reproducible numbers. The provider may have dropped it at the gateway, the batch size shifted under load, or the system fingerprint rotated overnight — and your benchmark was always one batch away from a different answer.
llm-evaluationreproducibility
Jun 111 min
The Deterministic Seed Your Provider Treated as a Hint, Not a Contract
The seed parameter on hosted LLM APIs is a best-effort hint, not a contract. Why byte-exact CI assertions flake — and what to assert instead.
llminference
Jun 110 min

About Tian Pan

I'm Tian Pan, an engineer-founder focused on agentic engineering — building autonomous AI systems and scaling engineering teams. I write practical guides on system design, technical leadership, and shipping with AI agents. Previously an early engineer at Uber, Brex, and IoTeX.

Page 9

The CI Host Whose CPU Governor Decided Your Agent Benchmark's Outcome

The Coding Agent CI Bill That Doubled Without a Postmortem

The Compaction Strategy That Summarized Away the User's Original Question

The Compliance Audit That Asked Which Model Produced Which Output

The Conversation Memory Pruning Heuristic That Erased the Context the Next Question Needed

The Conversation Summarization That Erased the Consent Flag the User Gave You

The Cost Forecast Tied to a Pricing Tier You No Longer Qualify For

The Data Labeler Whose Pricing Model Assumed Humans Wrote the Prompts

The Dataset License That Retroactively Poisoned Your Fine-Tune

The Day-One Permissions Nobody Revoked on Day Ninety

The Deterministic Seed Your Eval Suite Set That Your Provider Quietly Ignored

The Deterministic Seed Your Provider Treated as a Hint, Not a Contract

About Tian Pan