Behavioral Cloning for System Prompts: Preserving Expert Judgment Before It Walks Out the Door
Your best system prompt was written by someone who no longer works here.
That sentence lands differently depending on where you sit in the organization. If you're an engineer who inherited an undocumented 3,000-token prompt that governs a production AI feature, you've already lived this. You've stared at a clause like "Do not include supplementary data unless context warrants it" and had no idea what "context" means, what incident triggered the rule, or whether removing it would yield a 5% quality improvement or a catastrophic regression. If you're a team lead, you've watched institutional knowledge walk out the door every time a senior engineer or prompt specialist changed jobs — and that knowledge never made it into the documentation, because nobody knew there was anything to document.
This is the system prompt knowledge problem, and it's worse than most teams realize. The fix borrows an idea from robotics research and applies it to a deeply human engineering challenge: behavioral cloning — capturing what an expert does, and why, while they're still there to ask.
