
4 posts tagged with "streaming"


The Streaming Infrastructure Behind Real-Time Agent UIs

· 12 min read
Tian Pan
Software Engineer

Most agent streaming implementations break in one of four ways: the proxy eats the stream silently, the user closes the tab and the agent runs forever burning tokens, the page refreshes and the task is simply gone, or a tool call fails mid-stream and the agent goes quietly idle. None of these are model problems. They are infrastructure problems that teams discover in production after their demo went fine on localhost.
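The second failure mode — the agent running forever after the user closes the tab — is the cheapest to guard against server-side. A minimal sketch, with hypothetical names (`stream_until_disconnect`, `make_disconnect_after` are illustrative; in a FastAPI/Starlette app the `is_disconnected` callable would be the request's own `request.is_disconnected` method):

```python
import asyncio

async def stream_until_disconnect(token_source, is_disconnected):
    """Forward agent tokens to the client, but stop the moment the
    client is gone instead of letting the agent burn tokens forever."""
    delivered = []
    async for token in token_source:
        if await is_disconnected():
            break  # client closed the tab: cancel the run server-side
        delivered.append(token)
    return delivered

async def demo_tokens():
    # Stand-in for the model's token stream.
    for tok in ["Hel", "lo", ",", " world"]:
        yield tok

def make_disconnect_after(n_checks):
    # Simulates a client that disappears after n_checks polls.
    state = {"checks": 0}
    async def is_disconnected():
        state["checks"] += 1
        return state["checks"] > n_checks
    return is_disconnected

if __name__ == "__main__":
    # Client vanishes after two tokens; only two are delivered.
    got = asyncio.run(
        stream_until_disconnect(demo_tokens(), make_disconnect_after(2))
    )
    print(got)  # ['Hel', 'lo']
```

In a real deployment the `break` would also cancel the upstream model call, not just stop forwarding; the point is that disconnect detection has to be polled inside the streaming loop, because nothing else will tell you the client is gone.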

This post is about that gap — the server-side architecture decisions that determine whether a real-time agent UI is actually reliable, not just impressive in a demo environment.

Voice AI in Production: Engineering the 300ms Latency Budget

· 10 min read
Tian Pan
Software Engineer

Most teams building voice AI discover the latency problem the same way: in production, with real users. The demo feels fine. The prototype sounds impressive. Then someone uses it on an actual phone call and says it feels robotic — not because the voice sounds bad, but because there's a slight pause before every response that makes the whole interaction feel like talking to someone with a bad satellite connection.

That pause is almost always between 600ms and 1.5 seconds. The target is under 300ms. The gap between those two numbers explains everything about how voice AI systems are actually built.

Streaming AI Applications in Production: What Nobody Warns You About

· 10 min read
Tian Pan
Software Engineer

The first sign something is wrong: your staging environment streams perfectly, but in production every user sees a blank screen, then the entire response appears at once. You check the LLM provider — fine. You check the backend — fine. The server is streaming tokens. They just never make it to the browser.

The culprit, 90% of the time: NGINX is buffering your response.

This is the most common streaming failure mode, and it's entirely invisible unless you know to look for it. It also captures something broader about production streaming: the problems aren't usually in the LLM integration. They're in all the infrastructure between the model and the user.
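The fix itself is a few directives on the streaming location. A sketch, assuming a reverse-proxy setup (the upstream name and path are placeholders):

```nginx
location /api/stream {
    proxy_pass http://app_upstream;   # placeholder: your backend upstream
    proxy_buffering off;              # stream chunks as they arrive
    proxy_cache off;                  # never cache a live stream
    proxy_http_version 1.1;
    proxy_set_header Connection "";   # keep the upstream connection open
    proxy_read_timeout 300s;          # long-lived streams outlast the default 60s
}
```

Alternatively, the backend can send an `X-Accel-Buffering: no` header on streaming responses, which disables NGINX's proxy buffering per-response without touching the proxy config at all.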

Why Your Agent UI Feels Broken (And How to Fix It)

· 11 min read
Tian Pan
Software Engineer

You've shipped a capable agent. The underlying model is strong — it retrieves the right context, calls the right tools, produces coherent outputs. Then you watch a user try it for the first time and the session falls apart. They don't know when the agent is working. They can't tell if it understood them. They interrupt it mid-task because the silence feels like a hang. They give up and call your support line.

The model wasn't the problem. The interface was.

This is the pattern engineers keep rediscovering after building their first agent product: the human-agent interaction layer is its own engineering discipline, and most teams treat it as an afterthought. They spend months on retrieval quality and tool accuracy, then wire up a chat box as the interface, and wonder why the product feels unreliable even when the backend logs show success.