How to Do Code Reviews in the Agentic Era

In 2025 we’ve seen the rise of agentic coding (apparently the term “vibe coding” is obsolete). Between AI assistants and agentic workflows, features are flying off the shelf at a pace we’ve never seen before. It’s not uncommon for companies to brag about how many percentage points of their codebase are now written entirely by AI.

Whether this is a good or a bad thing remains to be seen (I, for one, think it’s good), but this gain in writing speed isn’t without consequences: reviewing the sheer amount of code produced is draining, and code reviews are quickly becoming the bottleneck. Some teams and open source projects have even adopted the nuclear option of not accepting AI-generated pull requests at all.

While banning AI can give people a breather, I don’t think it’s a good option in the long run. “Resistance is futile”, as my favourite fictional species would say. To survive this new level of productivity we have to stop doing the work that machines can do better. Fight AI with AI. But not only with AI: a good set of old-school deterministic tooling can also do wonders. If a linter can catch an issue, I shouldn’t be looking at it. If a formatter can fix it automatically, I really don’t care about it.

My “unpopular” take: I don’t really care if a human or an agent wrote the code. In open source, contributions are zero-trust. Whether the code was written by a seasoned FAANG engineer or a high schooler in Sri Lanka, it shouldn’t really matter. So why should we care if it was written by AI?

In theory, human-generated PRs would be smaller, but after 20-ish years working in this industry I have had my fair share of mega-PRs, so I can confidently say that dealing with mega and/or sloppy PRs isn’t a new problem.

I evaluate code at face value. Does it work? Is it safe? Does it fix a known issue? Does it align with our roadmap? Does it comply with our standards?

This is why in today’s article I would like to talk a little bit about how I approach code reviews, not only for external contributions but also for my own AI-generated code. After all, coding with an AI means code reviewing the AI the whole time.

What I actually care about

When I review code nowadays, I work at an increasingly high level. In a sense, the less code I write manually, the less I care about its individual details. I have always said, in every team where I took a leadership role in one capacity or another: code is disposable. This has never been truer than today. I repeat: code is disposable. What’s not disposable is the system knowledge you acquired while developing that code. This knowledge is what usually translates well from one implementation to another or, for example, what sticks when migrating from v1 to v2 of your API.

Writing something the second time is easier because you already went through the growing pains of discovering a bunch of stuff and reducing a lot of the ambiguity. You learned what worked well and what didn’t. What was overengineered and what was underengineered. This is the important bit in software engineering: gathering knowledge, iterating, evolving. And this is the kind of knowledge that will survive the AI age. Code is just an implementation detail.

Based on this philosophy, this is a non-exhaustive list of the things I care about when code reviewing:

Architecture and system design

AI models struggle with the big picture and tend to take many shortcuts. My review process looks for signals such as hardcoded values and configurations, over-simplification of the problem space (AI often treats coding requests as prototypes or demos) and, paradoxically, over-engineering. AI models also have the annoying trait of assuming that production-readiness equals complexity. In other words, they struggle with balance and pragmatism, things we learn with experience and that are often hard to put into words.
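To make the hardcoded-values signal concrete, here is a minimal Python sketch of the alternative a reviewer might ask for. Everything in it is hypothetical and only for illustration: the `RetryConfig` class, the `load_retry_config` function and the `APP_MAX_RETRIES` / `APP_TIMEOUT_SECONDS` variable names are assumptions, not part of any real project.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryConfig:
    """Settings that should be configurable rather than inlined in the code."""

    max_retries: int = 3
    timeout_seconds: float = 5.0


def load_retry_config(env: Optional[Mapping[str, str]] = None) -> RetryConfig:
    """Build the config from environment variables, falling back to defaults.

    The variable names are hypothetical; passing `env` explicitly keeps the
    function easy to test without touching the real environment.
    """
    source = os.environ if env is None else env
    defaults = RetryConfig()
    return RetryConfig(
        max_retries=int(source.get("APP_MAX_RETRIES", defaults.max_retries)),
        timeout_seconds=float(
            source.get("APP_TIMEOUT_SECONDS", defaults.timeout_seconds)
        ),
    )
```

The defaults live in one place, and the values can change per environment without a code change. Whether this much structure is justified is itself a balance call: for a throwaway script, a constant at the top of the file may be the pragmatic answer.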

Public API and modules

The ergonomics of what we’re building matter. The public API needs to “feel right” to the average developer who has to use it. A well-designed interface is intuitive, hard to misuse, and hides the messy internals from the rest of the codebase. I check whether the interfaces are robust, correctly scoped, easy to use and well documented, aiming for the smallest possible surface area. If the API is clunky, it doesn’t matter how elegant the underlying code is. Test quality is a good hint here: a bad API design is inherently hard to test.
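As a rough illustration of “small surface area” and “hard to misuse”, the following Python sketch exposes a single function, keeps its helper private, and uses keyword-only arguments. The `resize` function and its parameters are invented for this example.

```python
__all__ = ["resize"]  # the only name callers are meant to import


def _clamp(value: int, low: int, high: int) -> int:
    """Internal helper; the leading underscore marks it as non-public."""
    return max(low, min(value, high))


def resize(
    size: tuple[int, int], *, scale: float, max_side: int = 4096
) -> tuple[int, int]:
    """Scale a (width, height) pair, keeping each side within 1..max_side.

    The keyword-only arguments make call sites self-describing and prevent
    accidentally swapping `scale` and `max_side`.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    width, height = size
    return (
        _clamp(round(width * scale), 1, max_side),
        _clamp(round(height * scale), 1, max_side),
    )
```

A call such as `resize((100, 50), scale=2)` reads clearly at the call site, while `resize((100, 50), 2)` fails immediately with a `TypeError` instead of silently doing the wrong thing.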

Algorithms and patterns

LLMs often default to the most naive, brute-force way to solve a problem. It’s common for an agent to try running a massive data migration using nested loops and committing every few rows, when a bulk insert strategy is the correct approach. Or at a core level: using a list when a map or dictionary is the right data structure. Verifying that the data structures and algorithms actually fit the problem space prevents massive performance drops. The goal is code that scales, not just code that passes the tests. Premature optimisation is still a risk, though. If a simpler approach is slightly slower but much easier to read and we are dealing with a small, bounded data set, readability usually wins.
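The migration example can be sketched with Python’s standard `sqlite3` module. The `users` table and the function names are hypothetical; the point is only the contrast between one commit per row and a single bulk statement inside one transaction.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_row_by_row(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Naive version: one statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", row)
        conn.commit()


def insert_bulk(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Bulk version: one executemany call inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
```

Both functions leave the table in the same state, so tests alone will not tell them apart. On a real, disk-backed database the per-row commits are typically where the time goes, which is exactly the kind of thing a reviewer has to spot by reading the approach rather than the test results.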

Dependencies

Every new package brings outside risk, potential security flaws, and maintenance overhead. Keeping the app small reduces our attack surface. Core tools like our GenAI SDKs or major web frameworks get a faster pass, but everything else gets scrutinised. A little copying (or reimplementation) is better than a little dependency. The easier it gets to generate and maintain code, the less I worry about re-use, especially if this means adding a new attack vector to my codebase.

Anti-patterns and quality issues

Just to name a few: ignoring or silencing errors, side effects, global state, mutable state, resource leaks, unused functions or variables, and so on. Language idioms also matter. While I do care about these a lot, these are also some of the easiest to automate with the use of static analysis (linters) like golangci-lint (Go) and ruff (Python).
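To illustrate the first item on that list, here is a small Python sketch of a silenced error next to a version that fails loudly. The function names and the JSON shape are made up for the example.

```python
import json


def parse_port_silenced(raw: str) -> int:
    """Anti-pattern: every failure is swallowed and replaced by a magic default."""
    try:
        return int(json.loads(raw)["port"])
    except Exception:
        return 8080


def parse_port(raw: str) -> int:
    """Catch only the expected failures and re-raise them with context."""
    try:
        return int(json.loads(raw)["port"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid port configuration: {raw!r}") from exc
```

With the first version a typo in the configuration goes unnoticed and the service quietly starts on port 8080. The second version surfaces the problem at the boundary, with the offending input in the message and the original exception chained as the cause.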

Testability

Code that’s hard to test is usually designed poorly and will resist change in the future. Clear separation of concerns, clean inputs, and pure functions are ideal. Good tests prove the code works and provide a safety net for our future modifications. For UI components and complex systems, I accept practical testing strategies over strict unit coverage, but the core logic must be covered.
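A minimal Python sketch of what “clean inputs and pure functions” can look like in practice. The invoice example and all names in it are hypothetical.

```python
from datetime import date


def is_overdue(due: date, today: date) -> bool:
    """Pure core logic: the current date is an input, not a hidden dependency."""
    return today > due


def overdue_invoices(invoices: dict[str, date], today: date) -> list[str]:
    """Return the ids of overdue invoices, sorted for deterministic output."""
    return sorted(
        invoice_id for invoice_id, due in invoices.items() if is_overdue(due, today)
    )


def test_overdue_invoices() -> None:
    invoices = {"a": date(2025, 1, 10), "b": date(2025, 3, 1)}
    # Happy path: only the invoice due in January is overdue in February.
    assert overdue_invoices(invoices, today=date(2025, 2, 1)) == ["a"]
    # Edge case: an invoice is not overdue on its due date.
    assert overdue_invoices(invoices, today=date(2025, 1, 10)) == []
```

Because `today` is passed in rather than read from the system clock inside the function, the test needs no mocking and gives the same result on any day it runs.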

I stopped trying to add a coverage target for every single project because every case is different, but I need to know that whatever should be tested is tested. Ideally that means 100% of the happy path and a good percentage of the sad path, but I won’t try to achieve 100% coverage of all code or anywhere close to that. As long as you have a good observability strategy and good error messages, you are setting yourself up for success, as new error modes can be added to the test suite later.

Benchmarking

For critical paths, we need actual numbers rather than guesses about performance. Clear benchmarks for any changes affecting high-traffic components are required to stop slow code from reaching production.
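As one way of getting “actual numbers”, here is a small sketch using Python’s standard `timeit` module. It compares a membership check on a list with the same check on a set; the function names and the default sizes are arbitrary choices for illustration.

```python
import timeit


def contains_in_list(items: list[int], target: int) -> bool:
    return target in items  # linear scan


def contains_in_set(items: set[int], target: int) -> bool:
    return target in items  # hash lookup


def benchmark(n: int = 100_000, repeats: int = 5, number: int = 100) -> dict[str, float]:
    """Return the best observed per-call time in seconds for each candidate."""
    as_list = list(range(n))
    as_set = set(as_list)
    target = n - 1  # last element: the worst case for the list scan
    results: dict[str, float] = {}
    for name, func, data in (
        ("list", contains_in_list, as_list),
        ("set", contains_in_set, as_set),
    ):
        timings = timeit.repeat(
            lambda: func(data, target), repeat=repeats, number=number
        )
        # The minimum is the least noisy estimate of the achievable time.
        results[name] = min(timings) / number
    return results
```

Calling `benchmark()` returns a dictionary of per-call timings that can be pasted into a PR description. The absolute values depend on the machine, so the comparison between candidates matters more than any single number.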

Lean logging

Logs must be actionable. Unnecessary logs increase our cloud bills and can leak private info. Verbose logging is fine during development, but it must be cleaned up before merging.
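A short sketch of an actionable log line that avoids leaking private data, using Python’s standard `logging` module. The `payments` logger name, the field names and the masking rule are all assumptions made for the example.

```python
import logging

logger = logging.getLogger("payments")


def mask_email(email: str) -> str:
    """Keep the domain for debugging, hide most of the local part."""
    local, sep, domain = email.partition("@")
    if not sep:
        return "***"
    return f"{local[:1]}***@{domain}"


def record_failed_charge(order_id: str, email: str, reason: str) -> None:
    # Actionable: says what failed, for which order, and why.
    # Lean: a single WARNING line, with the customer email masked.
    logger.warning(
        "charge failed order_id=%s customer=%s reason=%s",
        order_id,
        mask_email(email),
        reason,
    )
```

Someone reading this line during an incident knows which order to look up and why the charge failed, and the raw email address never reaches the log storage.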

What I don’t care about (mostly)

I let automated tooling handle the details so I can focus on the hard problems. If a machine can do it, a human shouldn’t be doing it.

Every individual line of code

Reviewing every line generated by an LLM is the job of a compiler or static analyser. I focus on the logic and connection points instead.

Formatting

Ever since I started doing Go I have never been in a discussion about formatting styles, but I know they still exist in certain spaces. The best thing you can do is set up a standard and let the linter and formatter handle it. Once the standard is known, the coding agent can also comply with it more easily. If the CI pipeline passes, it is fine.

Minor syntax and code details

There are many ways to solve a problem, and forcing specific syntax choices limits developer freedom. I don’t care if it’s a for loop or a list comprehension, as long as the logic is sound.
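For instance, the two hypothetical Python functions below are interchangeable from a reviewer’s point of view, since they return the same result for the same input:

```python
def squares_of_evens_loop(numbers: list[int]) -> list[int]:
    """Explicit for loop."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def squares_of_evens_comprehension(numbers: list[int]) -> list[int]:
    """Same logic as a list comprehension."""
    return [n * n for n in numbers if n % 2 == 0]
```

Asking a contributor to convert one into the other costs a review round trip and changes nothing about the behaviour.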

Debugging

I almost never do debug sessions (in the strict sense of actually using a debugger). Debugging for me is a last resort, and it’s a synonym for adding a bunch of “I AM HERE” print statements, which really should have been log lines.

If something isn’t working, I create a new test to simulate the problem. If after reproducing the problem I can’t figure out what’s happening, it means that my observability and logs are insufficient, so I focus on improving those.
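A sketch of that workflow with Python’s standard `unittest` module. The bug report is invented: assume an empty batch used to crash an `average` function with a bare `ZeroDivisionError`. The test reproduces the report first, and the clearer error message is the observability improvement that came out of it.

```python
import unittest


def average(values: list[float]) -> float:
    """Mean of values; an empty list is reported as a clear error."""
    if not values:
        raise ValueError("average() requires at least one value")
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_input_is_rejected_with_clear_message(self) -> None:
        # Reproduces the hypothetical bug report described above.
        with self.assertRaisesRegex(ValueError, "at least one value"):
            average([])

    def test_happy_path(self) -> None:
        self.assertEqual(average([1.0, 2.0, 3.0]), 2.0)
```

Once the test is in the suite, the failure mode stays covered, so the same problem cannot quietly come back in a later, possibly AI-generated, change.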

Un-exported names

When a name is local to a function or has limited scope, I care much less about it than if it is used across many functions or files. I’ll skim read the code and if I catch a glimpse of anything absurd I might stop for a tidy up, but otherwise I’m fine with whatever the model decided to use.

Minor dependencies

The ones that aren’t your major framework or client library dependencies. These are less of a concern if they meet our security baseline. Checking for security exploits and problematic licenses is still mandatory, though. If I’m importing something just for one “helper” function I will 100% of the time re-implement that function in my code and get rid of the dependency.
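As an example of the kind of “helper” that rarely justifies a dependency, here is a hypothetical `chunked` function written with only the Python standard library. It stands in for any small utility that would otherwise be the sole reason to import a package.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items; an empty list ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Around a dozen lines, no new attack surface, and `list(chunked(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`. On Python 3.12 and later the standard library also offers `itertools.batched`, which covers similar ground but yields tuples.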

Conclusions

This isn’t a one-size-fits-all protocol. There is also a lot that can be said about how you are instrumenting your code base. Code reviews alone won’t catch all potential problems, and this is why I strongly advocate for automation, now with agentic coding more than ever.

Modern coding agents have many extension patterns that allow you to constrain the model and get more deterministic outputs: Agent Skills, hooks, MCP tools, policies, rules… Use those tools to put well-defined limits on the scope of your agents and your life will become much easier.

A car can only go as fast as its brakes allow. Invest in learning the guardrails for your favourite coding agent, and use your precious time to review what can’t be automated.

Happy coding!

Dani =^.^=
