
Code Review When the Machine Writes the Code

For most of my career, code review was the thing teams skipped first when they got busy. It was the quality gate everyone agreed was important and nobody quite had time for, the careful read that became a quick comment that became a rubber stamp.

That era is over, and the machine ended it.

When AI writes most of the code, review stops being the optional gate and becomes the load-bearing wall. Inside the labs building these models, that share is now north of 80% (I wrote about that in AI Now Writes Its Own Code). When you are not the author, reading is the job. The reviewer is now the last human who actually understands what is about to ship, which makes that seat the most important one in the room rather than the least. Here is how that seat changed.


The Plausibility Trap

The single most important thing to understand about AI-generated code is that it is plausible.

It compiles. It is tidy. The variables are well named, the functions are a reasonable length, it arrives with tests, and it reads like a competent engineer wrote it on a good day. That plausibility is exactly the danger, because it disarms your scrutiny at the precise moment you need it most.

Think about how review actually works on human code. Bad human code usually looks bad. The names are off, the structure is tangled, something smells, and that smell is what makes you slow down and look harder. Bad AI code does not smell. It looks fine. So you nod and move on, and the bug rides the plausibility straight into production.

There is a deeper version of this problem. The model had no intent. It did not reason about your problem and arrive at a considered solution; it pattern-matched its way to the most likely-looking code. So the reviewer's oldest and most useful question, "why did you do it this way," comes back empty, because there was no why. You cannot review reasoning that never happened. You have to review behavior instead.


What to Scrutinize Harder Now

The attention you used to spread evenly across a diff now needs to concentrate. Here is where to aim it, and where to stop.

Scrutinize harder                               | Stop sweating
Does it do what was actually asked              | Style and formatting (the linter has it)
The edge cases it couldn't know                 | How you would have written it
Security (it writes confident, insecure code)   | Micro-optimizations it already made
The tests (real behavior, or echo of the code?) |
Whether it fits the system                      |

Does it do what was actually asked? Not "is this valid code" but "is this the right code." The model solved the problem it inferred from your prompt and the surrounding context, and that is not always the problem you actually have. Correct code that solves the wrong problem is still wrong.

The edge cases it could not know. This is the embedded-knowledge problem I wrote about in Refactor, Rewrite, or Leave It Alone. The model does not know about the two hundred undocumented exceptions your system has accumulated, the customer whose data breaks the happy path, the integration that lies about its own status codes. Review is where that knowledge gets applied, or where its absence gets shipped.

Security. AI writes confident, insecure code. Injection, broken authorization, secrets in the wrong place, the unsafe default chosen because it was the most common pattern in the training data. The model does not feel the danger, so you have to feel it for both of you.
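
To make the injection case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The users table, the function names, and the payload are all hypothetical, invented for illustration rather than taken from any real codebase. Both functions look equally tidy in a diff; only one of them treats the input as data.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "member")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Reads cleanly, but splices the input straight into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes the input as a value, not as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "nobody' OR '1'='1"  # hypothetical hostile input

leaked = find_user_unsafe(conn, payload)   # the WHERE clause becomes always true
blocked = find_user_safe(conn, payload)    # no user has that literal name
```

With the hostile input, the unsafe version returns every row in the table and the safe version returns nothing. Neither function smells in a diff, which is why I look at how a query is built rather than at how it reads.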

The tests, especially. AI loves tests that pass, and that is not the same as tests that matter. Check whether each test asserts real behavior or just echoes the implementation back to itself. A green suite that proves nothing is worse than no suite at all, because it sells you false confidence at the moment you most want to believe.
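
A small sketch of the difference, assuming a hypothetical apply_discount function with a deliberately planted bug. The first check recomputes the expected value with the same formula as the implementation, so it can only ever agree with it. The second states what the business rule requires, independent of how the code gets there.

```python
def apply_discount(price, percent):
    # Hypothetical function under review. Planted bug: it subtracts the
    # percentage as a flat amount instead of a share of the price.
    return price - percent


def echo_test_passes():
    # Echo test: the expected value is derived from the implementation's
    # own formula, so the bug is copied into the assertion.
    price, percent = 200, 10
    expected = price - percent
    return apply_discount(price, percent) == expected


def behavior_test_passes():
    # Behavior test: 10% off 200 should be 180, whatever the code does.
    return apply_discount(200, 10) == 180


echo_result = echo_test_passes()
behavior_result = behavior_test_passes()
```

The echo test comes back green and the behavior test comes back red, on the same code. A price of 100 would have hidden the bug from both, because 100 minus 10 and 10% off 100 happen to agree, so the inputs a test chooses deserve a look too.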

Does it fit the system? The model writes locally correct islands. It cannot see your architecture, your conventions, or the helper function that already does this three files over. Leave that unreviewed and you get duplication, drift, and five slightly different ways to do the same thing.

Hallucinated or subtly wrong. An API call that does not exist. An off-by-one that looks deliberate. A comment that confidently describes behavior the code does not actually have. These are the ones that survive precisely because they look intentional.
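
Here is a sketch of the off-by-one that looks deliberate, paired with a comment that describes behavior the code does not have. Both functions and the sample data are hypothetical. The only reliable way to catch this kind of thing is to run the code against a case whose answer you already know.

```python
def last_n_as_generated(items, n):
    # Return the last n items.   <- the comment is confident and wrong
    return items[-(n + 1):]


def last_n_checked(items, n):
    # Return the last n items (an empty list when n is 0).
    start = max(len(items) - n, 0)
    return items[start:]


events = [1, 2, 3, 4, 5]  # hypothetical sample data

generated = last_n_as_generated(events, 2)  # one item too many
checked = last_n_checked(events, 2)
```

The generated version returns three items where the comment promises two. In a diff, the extra `+ 1` reads like a considered adjustment, which is exactly why it survives.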


What to Stop Sweating

Concentrating attention means pulling it off the things that no longer deserve it.

Style and formatting. The machine is more consistent than any human will ever be, and your linter handles whatever is left. Bikeshedding AI-generated code over brace placement or import order is not diligence, it is a waste of the one scarce resource in the room. Most of it never mattered anyway, which I argued in Things Developers Swear By That Don't Actually Matter.

How you would have written it. If the code is correct, clear, and fits the system, your preferred phrasing is a matter of taste, not quality. Rewriting it to match your style is ego, not review.

Micro-optimizations the model already made. Don't re-litigate what is already fine.

None of this is an argument for less review. It is an argument for pointing it at what can actually hurt you instead of at what merely offends your preferences.


The Reviewer Is the Keeper of Intent

Here is the shift underneath all of it. When the AI supplies the typing, the human supplies the things the AI structurally cannot: intent, context, judgment, and accountability.

You are the keeper of what this code is supposed to do and why. You are the only one in the loop who can tell the difference between locally plausible and actually correct, because you are the only one holding the whole picture in your head. That does not make review a box for someone junior to check at the end. It makes the reviewer the most senior role in the process.

A few habits follow from taking that seriously.

Read it like you will be the one debugging it at two in the morning, because you will be. Run it; do not review from the diff alone, because plausibility lives in the diff and the truth lives in the running system. And never approve what you do not understand. "The AI wrote it" is not a defense when it breaks at scale, because your name is on the merge, not the model's.

This is the practical face of a warning I keep coming back to: do not outsource the wisdom along with the typing. The teams that get burned in all of this are not the ones using AI to write code. They are the ones who let "looks good to me" quietly replace "I understand this, and it is right," which is the exact failure mode I traced in The Real Cost of Moving Fast with AI.


Whose Name Is on It

Approving code is a small act of vouching. When you click that button, you are telling everyone downstream that you looked, you understood, and you stand behind what is about to ship. That does not change because a machine held the pen. If anything it matters more now, because the machine cannot be held accountable and you can.

Review was never really about catching typos. It was about someone, a specific person with a name and judgment and something at stake, taking responsibility for what goes out the door. In an age where the code writes itself, that someone is not less necessary. They are the whole point.

So be that someone. Read it like it is yours, because the moment you approve it, it is.
