7 posts tagged with "engineering"

Agentic Engineering: Build Your Own Software Pokémon Army

· 13 min read
Tian Pan
Software Engineer

How one person replaced a 15-person engineering team with autonomous AI agents — and the spectacular failures along the way.

This material was prepared for CIVE 7397 Guest Lecture at the University of Houston.

I didn't study CS in college. I was a management major in Beijing. Somehow I ended up at Yale for a CS master's, then at Uber building systems for 90 million users, then at Brex and Airbnb, and eventually started my own company.

I'm telling you this because the rules of who can build software are being rewritten right now — and your background might be more of an advantage than you think.

Act I: The Solo Grind

150 Lines Per Day Is the Ceiling

Every engineer starts the same way. Blank editor. Blinking cursor. A ticket that says "Build a subscription billing system."

A senior engineer — someone with ten years of experience — produces about 100 to 150 lines of production code per day. The rest is meetings, code reviews, debugging, context-switching. That's the ceiling.

The "10x engineer" was the myth we all chased. But even a 10x engineer was still one person. Productivity scaled linearly with headcount. Want to ship faster? Hire more people — each one takes three to six months to onboard.

And the worst part? Knowledge lived in people's heads. Why was that system designed that way? Ask Chen. Oh, Chen left. Good luck.

The Real Bottleneck: Brain Bandwidth

At Uber, the hardest part of any task was never writing the code. It was the research phase — figuring out where and what to change.

When the codebase is massive, the docs are gone, and the previous owner quit, you spend 80% of your time building a mental model of someone else's system. The bottleneck was always people — their availability, their context window, their bus factor. Not compute. Not ideas.

And then something showed up at the workshop door.

Copilot, Cursor, and the Rare Candy Effect

You discover Copilot. Then Cursor. Then Windsurf. Press Tab and entire functions materialize. It's like someone handed you a Rare Candy after years of manual grinding.

The gains are real — we have field studies now:

  • Microsoft & Accenture ran a randomized trial across 4,000 developers: 26% more merged PRs.
  • Cognition's Devin completes file migrations 10x faster than humans.
  • Junior developers saw +35% productivity gains; seniors got +8 to 16%.

But even with these gains, the ceiling is still you. You're faster at cutting wood, but you haven't built a factory. You're still the one reading specs, making decisions, debugging at 2am.

Rare Candy buffs you. It doesn't give you a Pokémon. And the only way to break through the ceiling is to remove yourself from the production line entirely.

Act II: Catching Your First Pokémon

From Typing Code to Writing Specs

This is the moment everything changes — and it's deceptively simple.

You write a spec. Not code — a spec. Acceptance criteria, constraints, edge cases. You hand it to an autonomous agent like Claude Code. You walk away.

The agent reads your codebase, plans its approach, writes code, runs tests, reads the errors, fixes them, loops. You come back to a pull request. You just caught your first Pokémon.

This is fundamentally different from Cursor or Copilot. Those are power tools — they boost your output. An autonomous agent is a separate worker. The critical skill shifts from prompt engineering to context engineering: designing the world your Pokémon operates in.

My Non-Negotiable Workflow

I always start in Plan Mode. The agent analyzes the codebase and proposes an approach. I review the plan, adjust it, then say "execute."

One rule I never break: "You debug it yourself. I only want results." The agent has to curl the API, read the logs, and write tests to prove its own work. If it can't verify itself, the spec isn't good enough.

Why Context Engineering Beats Prompt Engineering

You've caught your first Pokémon. How do you make it good?

Anthropic's own guidance says the quality of an agent depends less on the model itself and more on how its context is structured and managed. The model is the engine. The context — specs, codebase structure, feedback signals — is the skill book. What you teach it determines how well it fights.

Three inputs matter:

  • Specs. Write clear specifications with acceptance criteria before the agent writes a single line of code. A vague spec gets vague code. A precise spec gets working software.
  • Codebase. Structure your repo so the agent can navigate it — clear file naming, clean module boundaries, up-to-date docs. The agent reads your code the same way a new hire would on day one. If a new hire would be lost, your agent will be lost.
  • Feedback signals. Tests, type checkers, linters. Without feedback, your Pokémon will confidently produce garbage and tell you everything's fine. We've all had coworkers like that.
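To make those three inputs concrete, here is a minimal sketch of how they might be assembled into a single context the agent reads before it writes anything. All names are hypothetical, not from any real agent framework; the point is that the spec, the codebase map, and the feedback signals arrive together:

```python
from pathlib import Path

def build_context(spec: str, repo_root: str, feedback: list[str]) -> str:
    """Assemble the agent's world: the spec, a map of the codebase,
    and the latest feedback signals (test, type-check, lint output)."""
    tree = "\n".join(
        str(path.relative_to(repo_root))
        for path in sorted(Path(repo_root).rglob("*.py"))
    )
    return (
        "## Spec (acceptance criteria first)\n" + spec + "\n\n"
        "## Repository layout\n" + tree + "\n\n"
        "## Feedback signals\n" + "\n".join(feedback)
    )
```

If the spec is vague or the feedback list is empty, the assembled context is exactly as weak as the inputs; nothing downstream can repair that.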

Defects at Scale: Building the Inspection Line

Your Pokémon wrote code. It compiles. You feel great.

Then you run the tests. Half fail. The agent hallucinated an API endpoint that doesn't exist, used a deprecated library, and introduced a subtle race condition.

This is the central challenge: a Pokémon without quality control manufactures defects at scale. The most important thing you build is not the production system — it's the inspection line.

The agent operates in a tight loop: write → test → fail → read error → fix → repeat, until every check passes green. The magic isn't perfect output on the first try; agents rarely deliver that. The magic is that the feedback loop runs in seconds, not hours.

My inspection line in practice:

  • Backend: the agent curls the actual API and verifies responses.
  • Frontend: Playwright MCP — the agent opens a real browser, navigates the UI, clicks buttons, and verifies rendered output.
  • Every task: the agent writes its own tests as a deliverable.
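That loop, with the inspection line as its gate, can be sketched in a few lines of Python. Everything here is hypothetical scaffolding (`generate_fix` stands in for whatever edits the agent makes, and the check commands could be a test suite, a curl against the API, or a Playwright run); the point is that the checks, not the model, decide when the loop ends:

```python
import subprocess

MAX_ATTEMPTS = 5  # never let the loop run unbounded

def run_checks(commands: list[list[str]]) -> list[str]:
    """Run every inspection command; collect output from the ones that fail."""
    errors = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            errors.append(result.stdout + result.stderr)
    return errors

def agent_loop(generate_fix, commands: list[list[str]]) -> int:
    """write -> test -> read errors -> fix -> repeat, until every check is green."""
    errors: list[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        generate_fix(errors)          # the agent edits code, given the failures
        errors = run_checks(commands)
        if not errors:
            return attempt            # attempts it took to go green
    raise RuntimeError("still red after MAX_ATTEMPTS; escalate to a human")
```

The retry ceiling matters as much as the checks: an agent that can loop forever will, and a loop that ends in escalation is a feature, not a failure.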

The teams getting real value from agents aren't the ones with the best models. They're the ones with the tightest inspection lines.

From One Pokémon to a Full Party

One Pokémon handles one bounded task. Real software projects have many moving parts. You need a party — and for a party to work, you need shared tooling and a shared playbook.

MCP (Model Context Protocol) is the item bag. Any Pokémon can reach in and grab any tool, any API, any data source. It gives your agents hands.

CLAUDE.md and custom skills are the trainer's manual. Custom slash commands — /today, /blog, /ci — encode repeatable combo moves. CLAUDE.md is the rulebook every agent reads on startup: same context, same standards, no babysitting required.

As Anthropic advises: find the simplest solution possible, and only increase complexity when needed.

Your party is assembled. Everything is running. It looks beautiful on the whiteboard. Then it breaks.

The Abyss: When Everything Breaks

The Silent Failure That Shipped

The most dangerous failure isn't the loud one — it's the silent one.

I had a coding agent make changes that passed all existing tests, looked correct in review, and shipped. Days later, I discovered it had broken a subtle invariant that no test covered. No error logs. No crash. Just wrong behavior that took days to trace back to the agent's commit.

That's the nightmare scenario: a Pokémon that produces defective work that passes inspection. Your inspection line has blind spots, and the agent will find every single one.

The Research Confirms It

This isn't just my experience. A NeurIPS 2025 study analyzed 1,600 execution traces across seven multi-agent frameworks and found:

  • Failure rates of 41% to 87% across frameworks.
  • 14 distinct failure modes identified.
  • Coordination breakdowns were the #1 category at 36.9% of all failures — agents losing context during handoffs, contradicting each other, going in circles.

Why Adding More Agents Makes It Worse

Your instinct after a wipeout: "I need more agents." That instinct is wrong.

Google DeepMind and MIT tested this rigorously — 180 configurations, 5 architectures, 3 model families:

  • A centralized orchestrator improved performance by 80.9% on parallelizable tasks.
  • But all multi-agent setups degraded performance by 39–70% on sequential work.
  • Gains plateau at 4 agents. Beyond that, you're paying coordination tax with no return.
  • Uncoordinated agents amplify errors 17.2x. Even with a coordinator: 4.4x.

The lesson: don't add Pokémon. Add the right Pokémon.

Act III: Rebuilding Smarter

Four Principles That Survived Every Explosion

The naive optimism is gone. In its place: hard-won knowledge.

The SWE-Bench leaderboard evaluated 80 unique approaches to agentic coding and found no single architecture consistently wins. But four principles held up:

  1. Inspection over production. Your team wiped because unchecked errors cascaded. The fix isn't stronger Pokémon — it's better inspection gates.
  2. Context beats model. Agents didn't fail because models were weak. They failed because they lacked context. Better skill books beat better engines every time.
  3. Start with one. Gains plateau at four agents (per DeepMind/MIT). Start simple. Add agents only when forced to.
  4. Co-learn with AI. Don't just assign tasks — ask agents to audit your codebase, research best practices, and update CLAUDE.md. Every conversation makes the next one better.

A practical note on costs: you don't need a fortune to start. Claude.ai free tier, GitHub Copilot student plan, and Cursor free tier get you surprisingly far. I run my entire operation on multiple $200/mo subscriptions with a CLI-to-API proxy — roughly 1/7 to 1/10 the cost of raw API calls.

What One Person's Gym Actually Looks Like

This is not a metaphor. This is my literal setup today:

  • 10 Claude Code agents running in parallel across 4 Macs and 6 screens.
  • 5 agent writers producing SEO content 24/7 through an automated yarn blog loop.
  • 1 person running a startup that would have needed 10–15 people two years ago.

Here's how a typical day works:

  • Morning: I run /today. An agent reviews my TODO.md, checks what's in progress, and proposes priorities.
  • Workday: I dispatch tasks to 10 coding agents, each with a bounded spec. While they work, I review PRs and make architecture decisions.
  • Background: Five agent writers run continuously — writing, editing, publishing. I review during breaks.
  • Bug fixes: GitHub Copilot handles small, bounded tasks — quick fixes, adding test coverage.
  • Every six months: Roadmap and OKR planning — irreducibly human, but even that I do with Claude, Gemini, and ChatGPT to reach a quorum.

Six Rules for Training the Army

Two years of running this system gave me six rules. All from painful experience:

  1. "You debug it yourself." The agent curls the API, searches logs, writes tests. If it can't self-verify, the spec needs work.
  2. Tokens consumed = efficiency. The only metric: how many agents can I keep busy simultaneously? Idle agents are wasted capacity.
  3. Work without supervision. The best agents don't wait for assignments. Cron jobs. Infinite task loops. See something that needs doing? Do it.
  4. Architecture = freedom to fail. Good architecture contains the blast radius. Agents can experiment but can't break what matters.
  5. Measurable, improvable, composable. If you can't measure a capability, you can't improve it. Everything should be testable and combinable.
  6. Use agents for everything. Not just code — content, video, social media, customer support, calendar. Then: build tools for agents, not just for humans.
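Rules 3 and 4 can be sketched together: a loop that pulls work without waiting for an assignment, and that contains each failure to the task that caused it. This is illustrative scaffolding under stated assumptions (a simple in-process queue; `dispatch` stands in for handing a spec to an agent), not code from any real agent runtime:

```python
import queue

def unsupervised_loop(tasks: "queue.Queue[str]", dispatch) -> tuple[list, list]:
    """Pull work until the queue is empty; a failed task never kills the loop."""
    done, failed = [], []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break                     # in production: sleep and poll, or cron
        try:
            dispatch(task)            # blast radius: one task, not the system
            done.append(task)
        except Exception:
            failed.append(task)       # log it, open a ticket, move on
    return done, failed
```

The design choice is the `except` around each dispatch: good architecture means an agent is free to fail on its own task without taking the rest of the party down with it.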

What Makes a Gym Leader

The DORA Gap: Individual Gains, Zero Organizational Improvement

Here's the uncomfortable truth. The DORA 2025 Report — Google's annual study of software delivery — found that while 80% of individual developers report AI productivity gains, organizational delivery metrics show no improvement. AI amplifies existing quality. The Pokémon doesn't fix the strategy.

The Pokémon handles commodity work: boilerplate, tests, spec-to-code translation, docs, well-defined bugs. That stuff is getting cheap fast.

The trainer handles the hard stuff: defining what to build and why. Designing testable systems. Writing specs worth translating. Making architecture decisions under uncertainty.

The Four Skills That Won't Get Automated

  • Context engineering — designing the skill books your Pokémon learn from.
  • Evaluation design — building the inspection line. If you can't evaluate output, you can't run a gym.
  • Systems thinking — understanding where defects cascade. Pokémon do local optimization; trainers do global coherence.
  • Product taste — when anyone can build anything, the question becomes what's worth building.

Why Non-CS Backgrounds Have an Edge

People with CS backgrounds tend to be conservative at the edges of what agents can do. They know too much about what should be hard, so they self-censor. "There's no way the agent can handle distributed transactions." They never ask.

People without CS backgrounds use their imagination. They say "what if I just told it to do this?" and discover it works far more often than experts expected. They push boundaries because they don't know where the boundaries are.

That was me. I didn't know what was "supposed" to be hard, so I tried everything. That's how I built a system that people with ten years more experience hadn't attempted.

The Paradigm Shift: Three Pillars

Everything in this post points to something bigger — a fundamental shift in how software gets built.

Using AI as "fancy autocomplete" is like bolting an electric motor onto a steam engine. You get a little more power, but you're stuck with the old architecture. The real revolution is tearing the steam engine out entirely.

Pillar 1: AI-first design. Stop asking "how can AI help my workflow?" Start asking "what obstacles can I remove so AI can do the work?" This mindset separates trainers who get 2x gains from those who get 100x.

Pillar 2: Closed-loop iteration. Remove humans from the execution loop. Let AI iterate autonomously with full environment access. Extending reliable autonomy from minutes to hours is the trillion-dollar question — every improvement unlocks exponential gains in what one person can build.

Pillar 3: Harness engineering. Humans define boundaries. Decouple architecture into minimal components. Use multi-agent cross-validation. You're not writing code — you're designing the harness that keeps the system honest.

Your First Quest

You started as a solo grinder — just you and a blinking cursor. You got Rare Candy and things got faster, but the ceiling was still you. You caught your first Pokémon, learned context engineering, built an inspection line, assembled a party — and watched it wipe spectacularly.

Then you rebuilt. Smarter. With constraints. With hard-won principles.

The Pokémon will keep getting stronger — new models, new protocols, new frameworks every quarter. But the trainer who designs the system, who decides what to build, how to inspect it, and when to ship it — that person doesn't get automated away.

That person can be you.

Tonight: pick one project. Write a one-page spec. Hand it to Claude Code. Review what comes back.

You just caught your first Pokémon.

Agentic Engineering Patterns That Actually Work in Production

· 8 min read
Tian Pan
Software Engineer

The most dangerous misconception about AI coding agents is that they let you relax your engineering discipline. In practice, the opposite is true. Agentic systems amplify whatever you already have: strong foundations produce velocity, weak ones produce chaos at machine speed.

The shift worth paying attention to isn't that agents write code for you. It's that the constraint has changed. Writing code is no longer the expensive part. That changes almost everything about how you structure your process.

The 80% Problem: Why AI Coding Agents Stall and How to Break Through

· 10 min read
Tian Pan
Software Engineer

A team ships 98% more pull requests after adopting AI coding agents. Sounds like a success story — until you notice that review times grew 91% and PR sizes ballooned 154%. The code was arriving faster than anyone could verify it.

This is the 80% problem. AI coding agents are remarkably good at generating plausible-looking code. They stall, or quietly fail, when the remaining 20% requires architectural judgment, edge case awareness, or any feedback loop more sophisticated than "did it compile?" The teams winning with coding agents aren't the ones who prompted most aggressively. They're the ones who built better feedback loops, shorter context windows, and more deliberate workflows.

Agentic Engineering Patterns: The While Loop Is the Easy Part

· 9 min read
Tian Pan
Software Engineer

Ask any team that's shipped a real agentic system what the hard part was. Almost none of them will say "the LLM call." The core loop that every production agent runs is nearly identical, whether it's Claude Code, Cursor, or a homegrown financial automation tool. The interesting engineering — the part that separates a working agent from a runaway cost center — lives entirely outside that loop.

One team started running an agent loop at $127 per week. Four weeks later, the bill hit $47,000. An uncontrolled loop with no token ceiling had compounded every iteration into a financial catastrophe. The model kept running. Nobody told it to stop.
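A hedged sketch of the missing guard: a hard token ceiling that halts the loop no matter what the model wants to do next. The names and numbers are illustrative, not drawn from the team in the story:

```python
class TokenBudget:
    """Hard ceiling on tokens an agent loop may spend before it must halt."""
    def __init__(self, ceiling: int):
        self.ceiling = ceiling
        self.spent = 0

    def charge(self, tokens: int) -> None:
        self.spent += tokens
        if self.spent > self.ceiling:
            raise RuntimeError(
                f"token ceiling exceeded: {self.spent}/{self.ceiling}"
            )

def run_agent(step, budget: TokenBudget, max_iters: int = 100) -> int:
    """The budget, not the model, decides when the loop stops."""
    for i in range(max_iters):
        tokens_used, finished = step()    # one LLM call plus tool use
        budget.charge(tokens_used)
        if finished:
            return i + 1                  # iterations to completion
    raise RuntimeError("iteration ceiling hit before the task finished")
```

Two ceilings, tokens and iterations, because each catches a failure mode the other misses: a loop of many cheap calls, and a loop of few ruinously expensive ones.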

Building Effective AI Agents: Patterns That Actually Work in Production

· 9 min read
Tian Pan
Software Engineer

Most AI agent projects fail not because the models aren't capable enough — but because the engineers building them reach for complexity before they've earned it. After studying dozens of production deployments, a clear pattern emerges: the teams shipping reliable agents start with the simplest possible system and add complexity only when metrics demand it.

This is a guide to the mental models, patterns, and practical techniques that separate robust agentic systems from ones that hallucinate, loop, and fall apart under real workloads.

Showstopper!: A Journey Through a Software Epic

· 20 min read

G. Pascal Zachary's Showstopper! is more than just a book; it is a monument to one of the most ambitious and arduous undertakings in software history: the creation of Windows NT. With a literary, non-fiction style, the book brings to life the intellect, sweat, conflicts, and glory of a group of genius engineers. It pulls us into the heart of a "war" that reshaped the world of computing.

The Code Warrior

The story's curtain rises on a legendary figure, the very soul of the Windows NT project: David Cutler. His upbringing and trials laid a solid foundation for the entire epic. Hailing from a working-class family in Michigan, Cutler was forged by adversity into a man of independent and resolute character. In his youth, he showed flashes of brilliance on the athletic field, displaying extraordinary leadership and a relentless competitive spirit. His teammates said of him that "his only true rival was himself." However, a severe leg injury in college ended his football career, forcing him to channel all his energy into academics, where his talents in mathematics and engineering began to shine.

After graduating, Cutler threw himself into the burgeoning field of computer programming, quickly making a name for himself at Digital Equipment Corporation (DEC). The real-time operating system he developed for the classic PDP-11 minicomputer already hinted at his exceptional skill in system architecture. Soon, he was entrusted with leading the development of DEC's next-generation 32-bit system, VAX/VMS. The immense success of VMS earned him the reputation of being "the world's best operating system programmer." Yet, beneath the fame, Cutler grew frustrated with DEC's increasingly rigid bureaucracy. When the next-generation computer project he poured his heart into, Prism/Mica, was unceremoniously canceled by corporate leadership, the fiercely independent genius resigned in anger.

Cutler's talent had long before caught the eye of another industry titan: Bill Gates. As early as 1983, DEC executive Gordon Bell had introduced Cutler to Gates, planting the seeds for a future collaboration. In 1988, upon hearing that the Prism project had been axed, Gates personally stepped in to recruit Cutler to Microsoft. He gave Cutler a mission: to start a brand-new operating system project codenamed "NT" (for New Technology). Cutler's experience, fighting spirit, and unparalleled expertise in operating systems were the critical assets Microsoft was betting on for its next generation, setting the stage for the dramatic development saga of NT.

The King of Code

Meanwhile, in the heart of the Microsoft empire, another "King of Code"—Bill Gates—was brewing a storm that would change the industry. From his perspective, we get a glimpse of Microsoft's strategic ambitions in the late 1980s and the macro context of the NT project's birth. Unlike Cutler's working-class background, Gates came from a wealthy family and showed exceptional intelligence and a rebellious streak from a young age. As a teenager, he and Paul Allen became obsessed with computer programming, keenly sensing the immense business opportunities in software. Their BASIC interpreter for the Altair 8800 microcomputer was not only Microsoft's founding creation but also the dawn of the personal computer software era.

By the mid-1980s, Microsoft had established its dominance in the PC market with MS-DOS and the initial versions of Windows. But Gates was keenly aware that these 16-bit systems would soon be unable to meet future computing demands. He shrewdly foresaw the necessity of a brand-new operating system "for the 21st century," one that had to possess high reliability, powerful multitasking capabilities, and cross-platform portability to redefine the standards for both enterprise and personal computing.

At the time, Microsoft was collaborating with IBM on the OS/2 system, but the project was progressing slowly and its market reception was lukewarm. OS/2's lack of good compatibility with the vast library of DOS and Windows applications, coupled with a subpar graphical interface, left Gates increasingly disillusioned. Unwilling to publicly break with IBM, he secretly began planning his "Plan B"—the true genesis of NT. Around 1988, Gates decided to forge a new path. Alongside his then-VP of Strategy, Nathan Myhrvold, he established a vision for the new system and ultimately set his sights on Cutler, who was fresh off his frustration with the Prism project at DEC. Under the guise of developing an improved version of OS/2, Gates successfully recruited Cutler, tasking him in reality with creating a completely new, portable operating system.

Gates is portrayed as a strategist with both top-tier technical intuition and extraordinary business foresight. His commitment to investing up to five years and $1.5 billion in the NT project demonstrated his bold bet on the future of technology. His eye for talent and his advocacy for Microsoft's unique engineering culture—a "rule of the smartest" that sought out the world's most brilliant minds to solve the toughest problems—provided the decisive support for NT's launch. It was Gates's vision and Microsoft's formidable resources that provided the stage for Cutler and his team to unleash their talents.

The Tribe

Cutler's arrival sent shockwaves through Microsoft. He did not come alone; he brought with him a loyal "programming tribe," and their arrival triggered intense cultural clashes and severe challenges of team integration. When news of Cutler's move broke, many of his former colleagues from DEC's Seattle lab answered his call. Within a week, seven top-tier DEC programmers had followed him to Microsoft, forming the core of the NT project. This "DEC tribe" was almost exclusively composed of seasoned male engineers, with an average age far higher than the typical Microsoft employee. They were a tight-knit, self-contained unit.

On their very first day, the famous "onboarding turmoil" erupted. Microsoft required new employees to sign a contract with a strict non-compete clause. Cutler's men deemed it deeply unfair—if DEC had such a clause, they never could have made the jump to Microsoft. They collectively refused to sign and staged a walkout for lunch. Upon hearing the news, Cutler personally intervened, using his forceful personality to compel Microsoft's legal department to back down and remove the unreasonable terms. The incident quickly spread across the Microsoft campus, giving everyone a taste of the tribe's uncompromising style.

The "tribe" moniker was fitting. They occupied an entire hallway in Building 2, operating in lockstep and clashing with Microsoft's existing culture. The chasm in age and background led to constant friction between the DEC "renegades" and the younger Microsoft employees. They held themselves in high regard, derisively calling their younger colleagues "Microsoft Weenies," believing they were the bearers of true engineering artistry. In turn, many within Microsoft were wary of this cliquey and arrogant group of newcomers. Although Cutler himself laughed off the tension, he too felt the difficulty of fitting in, once lamenting, "I have no credibility over here."

However, Microsoft's leadership quickly implemented a brilliant "tribe integration strategy." Steve Ballmer, then head of the systems software division, acted as Cutler's "mentor." Bill Gates personally transferred a veteran Microsoft programmer, Steve Wood, into the NT team to serve as a bridge between the old and new cultures. Meanwhile, Ballmer cleverly appointed Paul Maritz to oversee OS/2-related matters, avoiding a direct conflict with Cutler while allowing him to provide support from the periphery.

Despite the initial hardships, Cutler and his tribe soon began to lay out the grand blueprint for Windows NT. They established three core objectives: portability, reliability, and flexibility. To achieve portability, the team decided to write the kernel in the C language and design a Hardware Abstraction Layer (HAL) to mask differences between underlying CPUs. To achieve "bulletproof" reliability, they adopted a microkernel architecture, isolating functional modules to prevent a single application crash from bringing down the entire system. For flexibility, NT was designed as a modular system supporting multiple "personalities," using different subsystems to be compatible with OS/2, POSIX, and, in the future, Windows applications. These technical decisions, highly advanced for their time, signaled that the great vessel of Windows NT, after weathering its initial cultural storms, had officially set sail.

Dead End

As the project entered its middle phase, a series of major challenges arose, and the NT team seemed to have driven into a "dead end," facing internal conflicts, technical bottlenecks, and a critical strategic turning point. First, a tense "two-front war" emerged within Microsoft: on one side, Cutler's team was building the entirely new NT kernel from scratch; on the other, the traditional Windows team continued to iterate on Windows 3.x over the existing DOS kernel. The two teams competed fiercely for resources, talent, and the attention of upper management, with political undercurrents running deep.

A central point of contention was backward compatibility. Executives like Ballmer repeatedly stressed that NT had to run existing OS/2, DOS, and Windows programs, or it would never win the market. But Cutler was initially vehemently opposed, stubbornly believing that a new system should shed the baggage of the past. His famous quote, "Compatible with DOS? Compatible with Windows? Nobody's gonna want that," sent a chill through management. This devotion to an ideal architecture briefly put the project in danger of becoming disconnected from market realities.

The technical challenges were equally daunting. NT's innovative microkernel architecture, while offering modularity and high reliability, raised huge performance concerns. The client-server style of subsystem calls inevitably added system overhead. When Bill Gates was first briefed on the design, his sharp technical instincts led him to declare, "This is going to have a huge amount of overhead... I don't think we can do it that way." He knew that if NT was too slow, it would be "crucified" by the market and the media. To convince their boss, Cutler's team argued fiercely, submitting a twelve-page report with data to prove that performance was manageable. Gates reluctantly agreed, but his doubts lingered.

Meanwhile, the scale of the NT project far exceeded expectations, and Cutler's preferred small-team model was no longer sustainable. At Microsoft's insistence, the team eventually expanded to nearly 200 people, forcing Cutler to adapt his management style and accept the reality of large-team collaboration.

What ultimately pulled the NT project out of this "dead end" was a decisive external event: in 1990, the collaboration between Microsoft and IBM on OS/2 completely fell apart. This break marked a major strategic pivot for Microsoft, which decided to place all its bets on its own Windows NT. The NT team's mission was fundamentally altered: its development focus shifted from OS/2 API compatibility to full compatibility with and superiority over Windows. This was because, in that same year, Windows 3.0 had achieved unprecedented commercial success. Microsoft realized that NT's future had to be intertwined with Windows. As Nathan Myhrvold put it, "The customer needs a bridge." And so, the team began the arduous task of "switching tracks," extending the Windows API to 32 bits and rewriting the entire graphics subsystem. Though immensely difficult, "they finally got it to run," successfully achieving compatibility with legacy Windows applications. This critical redirection allowed Windows NT to escape its dead end and find the right path to the future.

The Howling Bear

As the project entered the fast lane, the pressure escalated dramatically. The team's work environment grew tense and fierce, full of emotional collisions and raised voices, true to the chapter's metaphor of the howling bear. At Microsoft, Gates and Ballmer championed the philosophy that "only excellent programmers can be managers," requiring leaders to stay hands-on and not detach from frontline coding. This meant NT's managers had to both orchestrate the big picture and dive deep into code, shouldering a double burden.

In this high-pressure environment, Cutler's explosive temper and exacting standards pushed the team to its limits. He mercilessly berated any work that fell short, and his famous threat—"Your ass is grass, and I'm the lawnmower"—kept every subordinate on edge. Yet, it was this unforgiving rigor that forged the team's powerful discipline and execution. As the project progressed, Cutler himself began to change. He started to offer affirmation and encouragement alongside the pressure, gradually evolving from an autocratic expert into a true technical leader.

Simultaneously, the integration between the NT and Windows camps deepened. Chuck Whitmer and others from the original Windows graphics department joined the rewrite of NT's graphics system. Moshe Dunie was appointed chief test officer, establishing a rigorous quality assurance system. The addition of Robert Muglia as a program manager strengthened the link between the technical team and market needs. Muglia repeatedly stressed that software features had to be pragmatic, focusing resources on the security, networking, and compatibility functions that enterprise customers cared about most.

The team's culture also became richer through this fusion. In the intense, male-dominated development environment, female programmer Therese Stowell initiated a witty "feminist movement" in jest, bringing a touch of levity and reflection to the tense atmosphere. Through a process of friction and adaptation, the NT team coalesced into a mature, combat-ready unit, fully prepared for the final sprint.

Patrick McKenzie: Why is Stripe's Engineering Quality So High?

· 2 min read

You need enough chips to play the game: hire a sufficient number of high-caliber engineers who care about quality and are smart enough to deliver it. You must repeatedly emphasize the company's culture of valuing quality, with formal routines to review large pieces of work and fix what needs fixing.

Tactically, there is one best practice above all: reduce the difficulty of doing the right thing. The Stripe tech team makes various trade-offs to ensure that any engineer can improve any part of the system, which encourages a sense of ownership.

There are dedicated internal tools to check the level of internationalization, which may seem tedious but is worth the time. It goes back to the company's culture; when an individual contributor says, "I spent some time on i18n last week," they should assume that leadership values this enough to respond, "Of course, you took the time to do this, great job."

"Open a ticket for the relevant team, and someone will handle it" is a good practice, and the faster and more reliably that system resolves tickets, the more motivated people are to open them.

The company provides dedicated channels, such as mailing list aliases, to report product quality bugs. There are dedicated teams to triage these tasks or assign them to the appropriate groups for fixing, along with established routines to inform the entire company about the bug fix rate.

Before making significant API changes, both internal and external testing should be conducted. Regularly ask, "Who has a real Stripe account on hand? Can we update to the beta version and try it out?" People need to set aside dedicated time for this and document it thoroughly — imagine having a group of picky customers; while you may not be able to use your product as deeply and broadly as users do, this approach is much better than guessing.

Discovering that "a piece of payment code hasn't been touched in five years, no one knows how it works, and there are no tests" is rare, but such discoveries are valuable to the engineering team.

None of the above is high-tech, and none of it alone guarantees quality. Stripe never settles for its current level of quality; rather than passively declaring "our standards are high," it keeps working proactively to improve.