Inside the Loop

Evan Phoenix

A while back, I wrote about Agile in the age of AI. That was the philosophy. People have asked me what that looks like day-to-day, so this is the longer answer.

For most of my career, “how we work” lived in muscle memory. You sit next to people for long enough and you absorb how they think about issues, structure a design doc, construct a good PR, and review others’ work. New folks on the team pick this up by osmosis, and that really worked for a long time. It stopped working once coding agents started doing a meaningful fraction of the typing, because a lot of the habits I’d absorbed suddenly had to be made explicit, both to the agent and to myself.

Crafting a Plan

We write everything down before any code gets touched. For most things, that’s a sentence or a paragraph in Linear, just enough to anchor what the change is and why. For bigger questions, where the team needs to align on what we’re building or how, we write a longer design doc (we call these RFDs), where we can hash out the problem (is it real, and should we solve it now?) and the design itself. Even one sentence is enough to slow us down and name the real change.

The point of all that writing is to create fuel for the planning process.

The thing I see people miss most often is that you have to have an opinion about what the agent is doing. Not necessarily a strong opinion, but one of your own. Without this, you’ll take whatever the agent gives you, and what the agent gives you tends to be whatever it averaged out of its training data: code that looks like every other agent-written codebase, rather than code that fits the system you’re working in. The opinion is the key part of the work that’s actually yours.

When I sit down with an issue and realize I don’t have an opinion yet — which happens fairly often (especially in code I haven’t touched recently) — I lean on the plan phase to help me develop one. I’ll ask the agent to walk me through what it found in the code, why it chose the abstraction it chose, what alternatives it ruled out and why, and which existing patterns it’s modeling its approach on. The agent is well-suited to this: it’s already read the relevant files, and pulling its reasoning out costs almost nothing. By the end of that back-and-forth, I usually do have an opinion — sometimes the same one the agent landed on, sometimes not. Either way, I now have something to steer with when the code starts arriving.

I Love It When A Plan Comes Together

Reading the plan before starting becomes the most important job, because it’s the cheapest place to catch a misunderstanding. If the agent has framed the problem wrong, picked the wrong abstraction, or missed a constraint, it’ll show up in the plan. At the plan phase, I’ll push back, narrow the scope, point at a file the agent should have read and didn’t, and ask for a revised plan. A redirect at the plan stage is a sentence or two; the same redirect once the agent has written a thousand lines of code is a rewrite. The plan is where you do the steering, and skipping the planning step is how you get the kind of agent output people complain about: sprawling, off-target, hard to review.

Beyond having an opinion, the framing I find most useful is to keep asking which parts of the work the agent is good at and which parts it isn’t, and to lean into the first set while doing the second set myself. The agent is good at: scaffolding, filling in mechanical structure, reading a lot of code without getting bored, and running formatters and tests in a tight loop. It’s much worse at: deciding whether the underlying thesis is right, knowing what the team has argued about offline, holding a long-running goal in mind across a multi-day piece of work, or noticing when a small inconsistency points at a deeper design problem. So I let it draft, and I do the deciding. The reverse split — where I’m drafting and the agent is deciding — doesn’t feel right to me.

Ownership and Responsibility

Most of the code I ship now is drafted by an agent, and so is most of the prose in our design docs. The agent is a good drafter. It’s fast, it can hold a lot of context in mind at once, and it produces something plausible without much prompting.

What it doesn’t do is take responsibility. If a PR lands with my name on it, I’m the one who has to live with what it does six months from now, and I’m the one who’ll have to explain it to a teammate. So I treat the agent roughly the way I’d treat a sharp intern who can type very fast: hand it the brief, read what comes back, push back when the framing is wrong, and never sign off on anything I haven’t actually read.

Remember, the agent will write a plausible-sounding draft for any thesis you hand it, including a wrong one. If you don’t read carefully, you’ll have no way of knowing which one you got. “The agent wrote it” isn’t meaningful at Miren. Authorship, regardless of who did the typing, belongs to the human in charge of the agent.

Three at a time

As I brought up in the Agile post, I can keep at most three agents going simultaneously. When I hear about people running ten or twenty in parallel, I have to assume either they’re built differently than I am or they’ve got the authorship dial turned way more towards pure agent. Past three, I start losing the thread of which agent is doing what, and the quality of my supervision degrades faster than the agent’s output does. The bottleneck stopped being typing a long time ago. It’s now attention.

Parallel agents are useful for path-finding, but not for solving the maze. If I’m trying to figure out how to shape a tricky piece of work and I have two or three plausible approaches in my head, firing off an agent on each one and seeing which path looks most promising is a great use of them. Each agent is on its own track, and I’m picking between those. But for actually building out one coherent feature, I want a single agent on it, because the pieces have to fit together. Trying to parallelize a coherent build tends to give me three drafts that disagree about the same shared abstraction, and then I’m spending my supervision budget reconciling drafts instead of moving the work forward.

Review, and why both kinds matter

Every PR gets read twice: once by a human, and once by an agent. They catch different things, and over time, I’ve stopped thinking of one as a fallback for the other.

The agent is patient. It will read every line of a thousand-line diff with the same care with which it read the first line — something humans are pretty bad at after PR number four of the day. It catches mechanical things, deviations from our own conventions, security and performance smells, and missing tests. It’s good for breadth.

The human is doing something different. The human is asking whether the change matches the intent. Whether the thing that got built is the thing we agreed to build, whether the trade-offs feel right, whether the change makes sense in the context of where the system is heading. Those are judgement calls, and they depend on context the agent simply doesn’t have. The human is the one who decides whether the PR lands.

One of my colleagues does most of his reviews in a guided session with Claude. He has Claude read the change and surface what’s there (which files were touched, which call sites are affected, which behaviors changed, which spots look risky), and then he asks Claude follow-up questions as he forms his own view of the change. Claude does the legwork; he stays the reviewer. I like that middle mode a lot. It sits somewhere between hands-off automated review and pretending the agent isn’t there.

Keeping PRs small enough to read

The thing I most often have to defend, especially when the agent is producing code quickly, is keeping PRs small. The agent will happily write a thousand lines across twelve files that touch four different concerns if you let it. Writing a sprawling PR is nearly free now. Reviewing one costs the same as it always did.

The property I want from a PR is that the author and the reviewer can hold the same model of the change in their heads at the same time. Once a PR is too big for that, the review stops being a review and becomes a skim. Reviewers find the few things they understand, comment on those, and wave the rest through. We’ve all done it. It’s how human attention works under load.

So we keep PRs scoped to one concern. A bug fix is its own PR, and a refactor in the same area is a separate one. This constraint is key for shaping the agent’s behavior when writing, but also critical for good reviews. What’s changed is that the agent makes the discipline much easier to violate.
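The “one concern, one PR” split can be sketched with plain git: the bug fix and the nearby refactor each get their own branch and their own commit, so each PR stays reviewable on its own. The file paths, branch names, and commit messages here are made up for the demo, and the snippet builds a throwaway repo so it runs standalone.

```shell
# Demo setup: a throwaway repo standing in for the real project.
dir=$(mktemp -d) && cd "$dir"
git init -q -b main
git config user.email demo@example.com && git config user.name demo
mkdir src && echo 'base' > src/parser.c && echo 'base' > src/parser.h
git add -A && git commit -qm "initial"

# The bug fix is its own branch, destined for its own PR.
echo 'guard against empty input' >> src/parser.c
git switch -qc fix/empty-input main
git add src/parser.c
git commit -qm "fix: handle empty input in parser"

# The refactor in the same area goes on a separate branch,
# as a separate PR, even though it touches neighboring code.
git switch -q main
echo 'extract token helpers' >> src/parser.h
git switch -qc refactor/parser-cleanup main
git add src/parser.h
git commit -qm "refactor: extract token helpers"

git log --oneline --all
```

Nothing here is exotic; the discipline is simply refusing to let both changes ride in one branch because the agent happened to produce them in one sitting.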

That’s not to say we’re perfect; we absolutely land PRs that are hard to review on occasion. But we work to make those the exception, discussed case by case along with how to prevent the next one. There’s always a next time, but the hope is that we can stretch the interval between them.

A common model we use at Miren on PRs is “Accept with follow-up.” We’re a really small team and we don’t want to bog each other down in re-reviewing PRs multiple times unless it’s really necessary. Accept with follow-up lets the reviewer AND the author move forward.

That’s critical for velocity. One of the most important work philosophies I hold is comfortable incrementalism. Attempting to solve huge problems in one swing is how software projects become intractable. Delivering consistent improvement over time, instead, allows the team to comfortably steer the project towards the goal. Sound familiar? It’s agile again.

What I keep coming back to

None of the disciplines above are inventive. If you’d asked me fifteen years ago how my team builds software, I’d have given you roughly the same list: write the issue first, sketch the design before the code, draft carefully, review carefully, ship focused changes, keep the tests honest.

What’s changed is how easy it is to skip any of them. The agent will start coding before there’s an issue, write code before there’s a design, and turn out a giant PR before anyone’s thought about scope. Each one of those shortcuts trades away a sync point, and the sync points are what keep a team running off one shared model of the system, rather than several private ones that drift apart.

The agent is faster than I am, and it’s not going to get slower. When I’m tempted to take one of these shortcuts, what I try to keep in mind is that speed isn’t the goal in itself. The goal is whether what we build together still makes sense to all of us a year from now, and most of the discipline above is in service of that.

The way I’ve come to think about my role in the loop is that the agent brings the enthusiasm to start, and I have to bring the wisdom to stop. The agent will always want to do more: more files, a bit more refactoring, a small cleanup, a quick fix along the way. Some of that is worth doing. But a lot of it is scope creep. When the agent doesn’t know what the issue was originally for, it can’t know the success criteria, and because it doesn’t feel the cost of a vast PR, it will keep going without them. That part of the work falls to me. Being willing to call something done when the agent would keep going is a piece of the job I didn’t have to be quite so deliberate about before.