Why treating your context window layout as a formal API contract — with named slots, versioning, and diff-friendly structure — makes LLM systems dramatically easier to debug and maintain.
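A minimal sketch of the named-slot idea in Python; the slot names and the PROMPT_LAYOUT_VERSION constant are hypothetical, not taken from the article:

```python
from dataclasses import dataclass, fields

PROMPT_LAYOUT_VERSION = "2.3.0"  # hypothetical: bump on any slot change, like an API version

@dataclass
class ContextLayout:
    """Named slots rendered in a fixed order, so prompt changes show up as small diffs."""
    system_policy: str
    tool_definitions: str
    retrieved_documents: str
    conversation_history: str
    user_message: str

    def render(self) -> str:
        # One delimited block per slot; stable headers keep diffs reviewable.
        parts = [f"<!-- layout-version: {PROMPT_LAYOUT_VERSION} -->"]
        for f in fields(self):
            parts.append(f"### {f.name}\n{getattr(self, f.name)}")
        return "\n\n".join(parts)
```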
Per-request API throttling treats each conversation turn as an independent call, but a 10-turn debugging session is architecturally one task. Session budgets, semantic deduplication, and graceful degradation are the right primitives — here's why.
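A minimal sketch of what those primitives might look like together; the class name, the hash-based stand-in for semantic deduplication, and the tier thresholds are all assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SessionBudget:
    """Tracks spend across the whole multi-turn session rather than per request."""
    max_tokens: int        # budget for the entire task, not one call
    used_tokens: int = 0
    seen_digests: set = field(default_factory=set)

    def should_include(self, chunk: str) -> bool:
        # Crude dedup stand-in; a real system would compare embeddings, not hashes.
        digest = hash(chunk)
        if digest in self.seen_digests:
            return False
        self.seen_digests.add(digest)
        return True

    def degradation_tier(self) -> str:
        # Graceful degradation: shrink context before refusing work outright.
        remaining = 1 - self.used_tokens / self.max_tokens
        if remaining > 0.5:
            return "full_context"
        if remaining > 0.2:
            return "summaries_only"
        return "final_answer_only"
```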
Most teams believe more interaction data automatically makes their AI better. It doesn't. Here's what separates a real compounding flywheel from an expensive log file.
Most AI routing decisions optimize for cost and latency. But the privacy classification of your data should drive routing too — and getting this wrong creates silent compliance violations that only surface in audits.
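One way to make that concrete is a routing table keyed by data classification rather than by cost; the classification labels and endpoint names below are illustrative:

```python
from enum import Enum

class DataClass(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    REGULATED = "regulated"   # e.g. personal or health data under GDPR/HIPAA

# Hypothetical routing table: allowed targets per classification, cheapest first.
ROUTES = {
    DataClass.PUBLIC:    ["third_party_api", "self_hosted"],
    DataClass.INTERNAL:  ["vpc_endpoint", "self_hosted"],
    DataClass.REGULATED: ["self_hosted"],   # never leaves your own infrastructure
}

def route(data_class: DataClass, preferred: str) -> str:
    allowed = ROUTES[data_class]
    # Cost and latency preferences only win when they don't violate the data policy.
    return preferred if preferred in allowed else allowed[0]
```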
Message queues solved the stuck-message problem with dead-letter queues. Agent systems have the same problem but richer failure modes — here's how to adapt the pattern.
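A minimal sketch of the adaptation; the failure-mode taxonomy and replay policy here are assumptions, not the article's taxonomy:

```python
from dataclasses import dataclass
from enum import Enum
import queue

class FailureMode(Enum):
    TIMEOUT = "timeout"                    # often retryable as-is
    TOOL_ERROR = "tool_error"              # retryable with a different tool or input
    HALLUCINATED_OUTPUT = "hallucination"  # needs human review, not a retry
    BUDGET_EXHAUSTED = "budget_exhausted"  # needs a bigger budget or a narrower task

@dataclass
class DeadLetter:
    task_id: str
    failure_mode: FailureMode
    transcript: str     # full agent trace, so a human can diagnose it later
    attempts: int

dead_letters: "queue.Queue[DeadLetter]" = queue.Queue()

def handle_failure(letter: DeadLetter, max_retries: int = 2) -> str:
    # Unlike a plain message queue, the replay decision depends on *why* the agent failed.
    if letter.failure_mode == FailureMode.TIMEOUT and letter.attempts < max_retries:
        return "retry"
    dead_letters.put(letter)
    return "parked_for_review"
```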
Running diffusion models at scale surfaces problems the demos skip: GPU VRAM ceilings, the architecture needed for LoRA hot-swapping, a compliance stack for watermarking and NSFW moderation, and the cost-volume inflection point past which self-hosting beats every API tier.
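The inflection point is ordinary break-even arithmetic; the function below is a sketch with placeholder parameters, not real pricing:

```python
def self_hosting_break_even(monthly_gpu_cost: float,
                            api_price_per_image: float,
                            marginal_cost_per_image: float) -> float:
    """Images per month above which self-hosting is cheaper than calling an API.

    All inputs are placeholders; plug in your own GPU lease and API pricing.
    Only meaningful when the API price exceeds your marginal per-image cost.
    """
    if api_price_per_image <= marginal_cost_per_image:
        raise ValueError("self-hosting never breaks even at these prices")
    return monthly_gpu_cost / (api_price_per_image - marginal_cost_per_image)
```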
Why the P99 latency of your LLM API call tells you almost nothing about what users actually experience in multi-step agent workflows — and the hidden multipliers that account for the gap.
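One of those multipliers is simple probability: per-call tail latency compounds across sequential steps. A worked example, assuming independent calls:

```python
# If a workflow chains n model calls and each stays under its per-call P99
# with probability 0.99, then (assuming independence) the whole workflow does
# so with probability 0.99 ** n.
for n in (1, 5, 10, 20):
    print(f"{n:2d} sequential calls: P(all under per-call P99) = {0.99 ** n:.3f}")
# 10 calls -> ~0.904: the per-call P99 behaves more like the workflow's P90,
# before counting retries, tool latency, and queueing.
```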
Off-the-shelf embeddings optimize for semantic similarity, not domain relevance. Learn how contrastive fine-tuning with hard negatives, synthetic training data, and proper A/B evaluation closes the gap between benchmark scores and real retrieval quality.
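For reference, the training objective is typically an InfoNCE-style loss over a positive and mined hard negatives; this PyTorch sketch assumes precomputed embedding tensors and an illustrative temperature:

```python
import torch
import torch.nn.functional as F

def info_nce_with_hard_negatives(query_emb: torch.Tensor,     # (B, D)
                                 positive_emb: torch.Tensor,  # (B, D)
                                 hard_neg_emb: torch.Tensor,  # (B, K, D), mined hard negatives
                                 temperature: float = 0.05) -> torch.Tensor:
    q = F.normalize(query_emb, dim=-1)
    pos = F.normalize(positive_emb, dim=-1)
    neg = F.normalize(hard_neg_emb, dim=-1)

    pos_sim = (q * pos).sum(dim=-1, keepdim=True)    # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", q, neg)     # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive is index 0
    return F.cross_entropy(logits, labels)
```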
When an orchestrator delegates to a subagent and accepts its answer, it inherits that agent's errors. How epistemic trust differs from authorization trust, why confidence compounds dangerously across agent handoffs, and what patterns actually address it.
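The compounding part is easy to see with a toy calculation; the accuracy figure and the independence assumption below are illustrative, not measured:

```python
# If each agent in a delegation chain is correct with probability p and the
# orchestrator accepts every answer at face value, then (assuming independent
# errors) the chain is correct with probability p ** depth.
def chain_reliability(per_agent_accuracy: float, depth: int) -> float:
    return per_agent_accuracy ** depth

for depth in (1, 2, 3, 5):
    print(f"depth {depth}: {chain_reliability(0.9, depth):.2f}")
# depth 3 -> 0.73, depth 5 -> 0.59: individually decent agents combine into an
# unreliable pipeline when no one verifies the handoffs.
```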
Skipping evaluations when shipping AI features creates compounding debt that locks teams into untestable behavior. Here's how the ratchet effect works and how to pay it down without halting feature work.
Most teams launch with comprehensive AI eval suites and abandon them within six weeks. Here's why the collapse is structurally inevitable — and how to prevent it.
AI capabilities that pass every individual test can fail silently in combination. Here's how to audit the seams before users find them.
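A seam audit can start as nothing more than enumerating pairs and running one end-to-end scenario per pair; the capability names and the run_combined callback are hypothetical:

```python
from itertools import combinations
from typing import Callable, List, Tuple

# Illustrative capability names; substitute the features your product actually ships.
CAPABILITIES = ["summarize", "translate", "extract_entities", "answer_with_citations"]

def audit_seams(run_combined: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Return capability pairs that fail when exercised together.

    run_combined(a, b) should execute one end-to-end scenario that needs both
    capabilities and return True on success; each capability is assumed to
    already pass its individual tests.
    """
    return [(a, b) for a, b in combinations(CAPABILITIES, 2) if not run_combined(a, b)]
```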