Prompt Contract Testing: How Teams Building Different Agents Coordinate Without Breaking Each Other
When two microservices diverge in their API assumptions, your integration tests catch it before production does. When two agents diverge in their prompt assumptions, you find out when a customer gets contradictory answers—or when a cascading failure takes down the entire pipeline. Multi-agent AI systems fail at rates of 41–87% in production. More than a third of those failures aren't model quality problems; they're coordination breakdowns: one agent changed how it formats output, another still expects the old schema, and nobody has a test for that.
The underlying problem is that agents communicate through implicit contracts. A research agent agrees—informally, in someone's mental model—to return results as a JSON object with a sources array. The orchestrating agent depends on that shape. Nobody writes this down. Nobody tests it. Six weeks later the research agent's prompt is refined to return a ranked list instead, and the orchestrator silently drops half its inputs.
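A contract test makes that implicit agreement explicit and executable. A minimal sketch, stdlib-only, with a hypothetical `validate_research_contract` checker standing in for what a JSON Schema library would do in practice; the sample payloads are illustrative, not from any real agent:

```python
import json

# The contract, written down at last: the research agent returns a JSON
# object whose "sources" field is a list of objects, each carrying a "url".
def validate_research_contract(payload: str) -> list[str]:
    """Return a list of contract violations (empty means the contract holds)."""
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(obj, dict):
        return [f"top level must be a JSON object, got {type(obj).__name__}"]
    sources = obj.get("sources")
    if not isinstance(sources, list):
        return ['"sources" must be a list']
    return [
        f'sources[{i}] must be an object with a "url"'
        for i, src in enumerate(sources)
        if not isinstance(src, dict) or "url" not in src
    ]

# The shape the orchestrator expects: passes.
ok = '{"sources": [{"url": "https://example.com", "title": "Example"}]}'
# The refined prompt's new shape, a bare ranked list: fails loudly in CI
# instead of silently dropping inputs in production.
bad = '["https://example.com", "https://example.org"]'

assert validate_research_contract(ok) == []
assert validate_research_contract(bad) != []
```

Run in the producer's CI against sample agent outputs and in the consumer's CI against fixtures, the same check catches the divergence on the day the prompt changes rather than six weeks later.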
