Actually you can. If you shift the reviews far to the left and call them code design sessions instead, and you raise problems at dailies, and you pair programme through the gnarly bits, then 90% of what people think a review should find goes away. The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build. The remaining 10% of things like var naming, whitespace, and patterns can be checked with a linter instead of a person. If you can get the team to that level you can stop doing code reviews.
You also need to build a team that you can trust to write the code you agreed you'd write, but if your reviews are there to check someone has done their job well enough then you have bigger problems.
This falls for the famous "hours of planning can save minutes of coding". Architecture can't (all) be planned out on a whiteboard; it's the response to difficulties you only realize as you try to implement.
If you can agree what to build and how to build it, and then it turns out that the plan actually works - then you are better than me. That hasn't happened in 20 years of software development. Most of what's planned falls down within the first few hours of implementation.
Iterative architecture meetings will be necessary. But that falls into the pit of weekly meetings.
That's actually one thing that always prevented me from following the standard pathway of "write a design document first, get it approved, then execute" during my years in Google.
I cannot write a realistic, non-hand-wavy design document without having a proof of concept working, because even if I try, I will need to convince myself that this part and this part and that part will work, and the only way to do that is to write actual code, and then you pretty much have the code ready, so why bother writing a design doc?
Some of my best (in terms of perf consequences) design documents were either completely trivial from the code complexity point of view, so that I did not actually need to write the code to see the system working, or were written after I already had a quick and dirty implementation working.
That’s why I either started with the ports and adapters pattern or quickly refactored into it on spikes.
You don’t have to choose what flavor of DDD/Clean/… you want to drink, just use some method that keeps domains and use cases separate from implementation.
Just with shapes and domain level tests, the first pass on a spec is easier (at least for me) and I also found feedback was better.
I am sure there are other patterns that do the same, but the trick is to let the problem domain drive, not to choose any particular set of rules.
Keeping the core domain as a fixed point does that for me.
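For what it's worth, here's a minimal sketch of what keeping the domain as the fixed point looks like (all names here are invented for illustration, not from any real codebase): the use case depends only on a port, and the adapter is swappable, so domain-level tests never touch the real implementation.

```python
from typing import Protocol


# Port: the domain's view of persistence. The domain never imports an adapter.
class OrderRepository(Protocol):
    def save(self, order_id: str, total: float) -> None: ...
    def total_for(self, order_id: str) -> float: ...


# Use case: pure domain logic, written and tested against the port alone.
def apply_discount(repo: OrderRepository, order_id: str, percent: float) -> float:
    discounted = repo.total_for(order_id) * (1 - percent / 100)
    repo.save(order_id, discounted)
    return discounted


# Adapter: a trivial in-memory implementation, good enough for domain tests.
class InMemoryOrders:
    def __init__(self) -> None:
        self._totals: dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self._totals[order_id] = total

    def total_for(self, order_id: str) -> float:
        return self._totals[order_id]
```

A production adapter (database, HTTP service, whatever) slots in later without the use case changing, which is what makes the first pass on a spec and the feedback loop cheaper.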
It’s a muscle you can exercise, and doing so helps you learn what to focus on so it’ll be successful. IME a very successful approach is to focus on interfaces, especially at critical boundaries (critical for your use case first, then critical for your existing design/architecture).
Doing this often settles the design direction in a stable way early on. More than that, it often reveals a lot of the harder questions you’ll need to answer: domain constraints and usage expectations.
Putting this kind of work upfront can save an enormous amount of time and energy by precluding implementation work on the wrong things, and by ruling out approaches that are problematic both for the problem at hand and for a project's longer-term goals.
The problem is that you can only meaningfully pair program with programmers. The people involved in architecture/design meetings might not be programmers. The questions that arise when two programmers work might not be resolvable without involving those others.
Nonsense. I pair all the time with stakeholders. If you strip out all of the cucumber nonsense this is essentially what BDD is - fleshing out and refining specs by guiding people through concrete, written example scenarios.
I also often pair with infrastructure people on solving a problem - e.g. "I'm trying to do X as per the docs, but if you look at my screen I get a 1003 error code; any idea what went wrong?"
Or, people on a different team whose microservice talks to mine when debugging an issue or fleshing out an API spec.
It's true that this isn't possible in plenty of organizations due to the culture, but lots of organizations are broken in all sorts of ways that set piles of cash on fire. This one isn't unique.
I also think we're going to see a resurgence of either pair programming, or the buddy system where both engineers take responsibility for the prompting and review and each commit has 2 authors. I actually wrote a post on this subject on my blog yesterday, so I'm happy to see other people saying it too. I've worked on 2-engineer projects recently and it's been way smoother than larger projects. It's just so obvious that asynchronous review cycles are way too slow nowadays, and we're DDoSing our project leaders who have to take responsibility for engineering outcomes.
Maybe it's time to do pair agentic engineering? Have two engineers at the screen, writing the prompts together, and deciding how to verify the results.
You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.
> You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.
I haven't tried pair programming except in very ad-hoc situations, but doing it all the time sounds utterly exhausting. You're taking programming, then layering a level of constant social interaction on top of it, and removing the autonomy to just zone out a bit when you need to (to manage stress).
Basically, it sounds like turning programming into an all-day meeting.
So I think it's probably unpopular because most software engineers don't have the personality to enjoy or even tolerate that environment.
Yeah, I’d have a mental breakdown within weeks if I had to pair more than an hour a day, max (even that much, consistently, would probably harm my quality of life quite a bit—a little every now and then is no big deal, though). No exaggeration, it’d break me in ways that’d take a while to fix.
[edit] I’m not even anti-social, but the feeling of being watched while working is extremely draining. An hour of that is like four hours without it.
Well, as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all. Also, it doesn't mean you get NO solo time. Pairs can decide to break up for a bit, and of course sometimes people aren't in, leaving your team with an odd number of people, so someone _has_ to solo (though sometimes we'd triple!)
But it's something you have to work at, which is definitely part of the barrier. Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight, so I feel that running is exhausting with no benefit."
> Well as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all.
I don't need to try pair programming because I know how that level of constant social interaction makes me feel.
> Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight so I feel that running is exhausting with no benefit."
No, what you're doing is sort of like if you're raving about the beach, and I say I don't like bright sun, and you insist I need to try the beach to have an opinion on if I like it or not.
I wouldn't call "work" social interaction but I get ya. It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone. It is what it is, though.
> I wouldn't call "work" social interaction but I get ya.
IMHO, social interaction is anything where you interact with other people.
> It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone.
That's very black and white thinking. I like talking to other people, but too much of it is draining. Every day spending all-day or even a half-day working directly with someone else? No thanks.
It's not black and white because that is my whole point: you have to push through the terribleness at the beginning to start feeling the benefits, and most people aren't willing to. I'm a _massive_ introvert myself, btw. But like, I'm not trying to convince you of anything.
I agree. The main reason people give for not liking it is that they say _they_ find it exhausting. _Everyone_ finds it exhausting, at least at first. That mostly stops being the case after a while, though. It can still be tiring, but I found it to be a good kind of tiring because we were getting so much done. The team I used to pair on worked so incredibly quickly that we started doing 7-hour days and no one noticed (although eventually we came clean).
I find it depressing and dystopian that people are now excited about having a robot pair.
This might be true for tech companies, but the tech department I am in at a large government could absolutely architect away >95% of the 'problems' we are fixing at the end of the SDLC.
I've worked waterfall (defense) and while I hated it at the time I'd rather go back to it. Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks. Also, with so much up front planning the code practically writes itself. I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.
> Most of what's planned falls down within the first few hours of implementation.
Not my experience at all. We know what computers are capable of.
> I've worked waterfall and while I hated it at the time I'd rather go back to it. Today we move much faster but build the wrong thing or rewrite and refactor things multiple times.
My experience as well. Waterfall is like - let's think about where we want this product to go, and the steps to get there. Agile is like ADHD addled zig zag journey to a destination cutting corners because we are rewriting a component for the third time, to get to a much worse product slightly faster. Now we can do that part 10x faster, cool.
The thing is, at every other level of the company, people are actually planning in terms of quarters/years, so the underlying product being given only enough thought for the next 2 weeks at a time is a mismatch.
It’s possible to manage the quarterly expectations by saying “we can improve metric X by 10% in a quarter”. It’s often possible to find an improvement that you’re very confident of making very quickly. Depending on how backwards the company is you may need to hide the fact that the 10% improvement required a one line change after a month of experimentation, or they’ll fight you on the experimentation time and expect that one line to take 5 minutes, after which you should write lots more code that adds no value.
Agile isn’t a good match for a business that can only think in terms of effort and not learning+value. That doesn’t make agile the problem.
My experience in an agile firm was that they hired a lot of experienced people and then treated them like juniors. Actively allergic to thinking ahead.
To get around the problem that deliverables took more than a few days, actual tasks would be salami sliced down into 3 point tickets that simply delivered the starting state the next ticket needed. None of these tickets being completed was an actual user observable deliverable or something you could put on a management facing status report.
Each task was so time boxed, seniors would actively be upbraided in agile ceremonies for doing obvious next steps. 8 tickets sequentially like - Download the data. Analyze the data. Load a sample of the data. Load all the data. Ok now put in data quality tests on the data. OK now schedule the daily load of the data. OK now talk to users about the type of views/aggregations/API they want on the data. OK now do a v0 of that API.
It's sort of interesting because we have fully transitioned from the agile infantilization of seniors to expecting them to replace a team of juniors with LLMs.
I think the bigger issue is that Waterfall is often not "Waterfall".
Sure, there's a 3000-row Excel file of requirements, but during development the client still sees the product, or slides outlining how the product works, and you still have QA that has to test stuff as you make it. Then you make changes based on that feedback.
Agile, meanwhile, often feels like it's lost the plot: we're just going to make something and iterate it into a product people like, versus figuring out a product people will like and designing towards it.
Agile largely came about because we thought about where we wanted the product to go, and the steps to get there, and started building, and then it turned out that the way we thought we wanted to go was wrong, and all of that planning we did was completely wasted.
If you work in an environment where you definitely do know where you want the product to go, and the customer doesn't change their mind once they've seen the first working bits, then great. But I've never worked in that kind of environment.
It helps to at least write down requirements. And not requirements like "it must use Redis", but customer, user, performance, cost, etc. requirements.
A one page requirements document is like pulling teeth apparently.
There's an abstraction level above which waterfall makes more sense, and below which [some replacement for agile but without the rituals] makes more sense.
I think the questions to ask are: whether user-facing deliverable tasks take longer than a sprint, whether tasks have linear dependencies, whether there are coordination concerns, etc.
> > Most of what's planned falls down within the first few hours of implementation.
> Not my experience at all. We know what computers are capable of.
You must not work in a field where uncertainty is baked in, like Data Science. We call them “hypotheses”. As an example, my team recently had a week-long workshop where we committed to bodies of work on timelines and 3 out of our 4 workstreams blew up just a few days after the workshop because our initial hypotheses were false (i.e. “best case scenario X is true and we can simply implement Y; whoops, X is false, onto the next idea”)
Wait, are you perhaps saying that... "it depends"? ;-)
Every single reply in this thread is someone sharing their subjective, anecdotal experience.
There are so many factors involved in how work pans out beyond planning. Even a single one of us could probably tell 10 different stories about 10 different projects that all went differently.
> Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks.
That's an interesting observation. That's one of the biggest criticisms of waterfall: by the time you finish building something the requirements have changed already, so you have to rewrite it.
There is a difference between the requirements changing and a poor-quality, quickly made implementation proving to be inadequate.
Agile approaches are based on quick implementations, redone as needed.
my favorite life cycle:
1> Start with requirements identification for the entire system.
2> Pick a subset of requirements to implement and demonstrate (or deliver) to the customer.
3> Refine the requirements as needed.
4> go to 2
The key is that you have an idea of the overall system requirements and what is needed, in the end, for the software you are writing. Thus the refactoring and redesign due to things not included in the sprint do not occur (or occur less).
Comparing the same work done between agile and waterfall I can accept your experience of what sounds like an org with unusually effective long term planning.
However the value of agile is in the learning you do along the way that helps you see that the value is only in 10% of the work. So you’re not comparing 100% across two methodologies, you’re comparing 100% effort vs 10% effort (or maybe 20% because nobody is perfect).
Most of the time when I see unhappiness at the agile result it’s because the assessment is done on how well the plan was delivered, as opposed to how much value was created.
> I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.
That’s not the point. The point is to end up with something actually useful in the end. If the artifact I deliver does not meet requirements, it does not really matter how fast I deliver it.
The reason waterfall methodology falls flat so often is not long delivery times, but ending up with completely the wrong thing.
I think it also depends on how people think. I can't sit in a meeting room / at a whiteboard / in a documentation editor and come up with what the big problems are (where pain points in implementation will occur, where a sudden quadratic algorithm pops up, where cache invalidation becomes impossible, ...), even if I stare at that whiteboard or discuss with my peers for days.
But when I hammer out the first 30 minutes of code, I have that info. And if we just spent four 2-hour meetings discussing this design, it's very common that after 30 minutes of coding I have either found 5 things that make this design completely infeasible, or maybe 2 things that would have been so good to know before the meeting that the 8 hours of meetings just should not have happened.
They should have been a single 2 hour meeting, followed by 30 minutes of coding, then a second 2 hour meeting to discuss the discoveries.
Others might be much better than me of discovering these things at the design stage, but to me coding is the design stage. It's when I step back and say "wait a minute, this won't work!".
I've seen engineers I respect abandon this way of working as a team for the productivity promise of conjuring PRs with a coding agent. It blows away years of trust so quickly when you realize they stopped reviewing their own output.
Perhaps due to a FOMO outbreak[1], upper management everywhere has demanded AI-powered productivity gains, and based on LoC/PR metrics, it looks like they are getting them.
1. The longer I work in this industry, the more it becomes clear that CxO's aren't great at projecting/planning, and default to copy-cat, herd behaviors when uncertain.
Software engineers are pushed to their limits (and beyond). Unrealistic expectations are established by Twitter posts like "I shipped an Uber clone in 2 hours with Claude", forcing every developer to crank out PRs, while managers are on the lookout for any kind of perceived inefficiency in tools like GetDX and Span.
If devs are expected to ship 10x faster (or else!), then they will find a way to ship 10x faster.
I always found it weird how most management would do almost anything other than ask their dev team "hey, is there any way to make you guys more productive?"
I've had metrics rammed down my throat, I've had AI rammed down my throat, Scrum rammed down my throat, and I've had various other diktats rammed down my throat.
95% of which slowed us down.
The only time I've been asked is when there is a deadline and it's pretty clear we aren't going to hit it, and even then they're interested in quick wins like "can we bring lunch to you for a few weeks?", not systemic changes.
The fastest and most productive times have been when management just set high level goals and stopped prodding.
I'm convinced that the companies which seek developer autonomy will leave the ones which seek to maximize token usage in the dust in the next tech race.
Would love to be a fly on the wall for a couple of months to see what corporate CxO's actually do.
Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin. Probably wouldn't be a massively successful one, but good enough to survive 2 years and have millions in the bank for that, or get fired and get a golden parachute.
(half) joking - most likely I'm massively trivializing the role.
Funny enough, the author of this blog post wrote another one on exactly that topic, entitled "What do executives do, anyway?"[1]. If you read it, you'll find it's written from quite an interesting perspective, not quite "fly on the wall," but perhaps as close as you're going to get in a realistic scenario.
"Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin"
Having worked for a pretty decent CIO of a global business, I'd say his main job was to travel about, speak to other senior leaders, work out what business problems they had, and figure out, at a very high level, how technology would fit into addressing those problems.
Just parroting latest technology trends would, I suspect, get you sacked within a few weeks.
A charitable explanation for what CxOs do is that they figure out their strategic goals and then focus really hard on ways to herd cats en masse to achieve the goals in an efficient manner. Some people end up doing a great job, some do so accidentally, other just end up doing a job. Sometimes parroting some linkadink drivel is enough to keep the ship on course - usually because the winds are blowing in the right direction or the people at the oars are working well enough on their own.
Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.
>> Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
I have been doing this too, and I've forgotten half of them. For me the point is that this usage scenario is really good, but it also has no added value to it, really. The moment Claude Code raises its prices 2x this won't be viable anymore, and at the same time, to scale this to enterprise software production levels you need to spend on an agent probably as much as hiring two SWEs, given that you need at least one to coordinate the agents.
Deepseek v3.2 tokens are $0.26/$0.38 on OpenRouter. That model - released 4 months ago - isn't really good enough by today's standards, but it's significantly stronger than Opus 4.1, which was only released last August! In 12 months I think it's reasonable to expect there will be a model that costs less than that and is significantly stronger than anything available now.
And no, it isn't ONLY because VC capital is being burned to subsidize cost. That is impossible for the dozen smaller providers offering service at that cost on OpenRouter who have to compete with each other for every request and also have to pay compute bills.
Qwen3.5-9B is stronger than GPT-4o and it runs on my laptop. That isn't just benchmarks either. Models are getting smaller, cheaper and better at the same time and this is going to continue.
I think Claude could raise its prices 100x and people would still use it. It'd just shift to being an enterprise-only option, and companies would actually start to measure the value instead of going "Whee, AI is awesome! We're definitely going really fast now!"
100x? You think people would pay $20k per month for Claude Code?
Codex is as good (or very nearly) as Claude code. Open source models continue to improve. The open source harnesses will also continue to improve. Anthropic is good, but it has no moat. No way could they 100x their prices.
I’m so disappointed to see the slip in quality by colleagues I think are better than that. People who used to post great PRs are now posting stuff with random unrelated changes, little structs and helpers all over the place that we already have in common modules etc :’(
At first glance this looks like it might be the halting problem in disguise (instead of comparing the general function of the logic, just ask whether both programs have logic that halts or doesn't halt). I think we would need to allow for false negatives for this to even be theoretically possible: identical-text comparison is easy enough, but anything past that quickly becomes complicated, and you can probably expand the complexity indefinitely by handling more and more edge cases (but never every edge case, due to the underlying halting problem/undecidability of code).
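To illustrate the "one notch past identical text" tier with something concrete (my own toy example, not a real tool): comparing normalized ASTs makes formatting and comments irrelevant, while anything deeper is conservatively treated as "not equivalent" - exactly the false-negative tradeoff undecidability forces on us.

```python
import ast


def structurally_equivalent(src_a: str, src_b: str) -> bool:
    """Conservative check: True only when the parsed ASTs dump identically.

    Formatting, comments, and whitespace differences vanish after parsing,
    but renamed variables or reordered-yet-equivalent logic do not, so
    genuinely equivalent programs can still compare False (a false negative).
    Anything stronger eventually runs into the halting problem.
    """
    try:
        return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))
    except SyntaxError:
        return False
```

So `structurally_equivalent("x=1  # hi", "x = 1")` is True, while `x = 1` vs `y = 1` is False even though a human might call them "the same logic" - each extra equivalence you want to recognize (renaming, reordering, ...) is another hand-written edge case, and the list never ends.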
This is the part that doesn't get talked about enough. Code review was never just about catching bugs; it was how teams built shared understanding of the codebase. When someone skips reviewing their own AI-generated PR, they're not just shipping unreviewed code, they're opting out of knowing what's in their own system. The trust problem isn't really about the AI output quality; it's about whether the person submitting it can answer questions about it six months from now.
That's partly the point of the article, except the article acknowledges that this is organizationally hard:
> You get things like the famous Toyota Production System where they eliminated the QA phase entirely.
> [This] approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.
> The basis of [this system] is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.
> I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.
>The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build.
This is like saying there won't be any surprises on the road you take if you've already set the destination point. Though most of the time you are just given a vague description of the kind of place you want to reach, not a precise target point. And you are not necessarily starting with a map, not even an outdated one. Also, geological forces reshape the landscape at least as fast as you are able to move.
>shift the reviews far to the left, and call them code design sessions instead, and you raise problems on dailys, and you pair programme through the gnarly bits
Master planning has never worked for my side projects unless I am building the exact replica of what I've done in the past. The most important decisions are made while I'm deep in the code base and I have a better understanding of the tradeoffs.
I think that's why startups have such an edge over big companies. They can just build and iterate while the big company gets caught up in month-long review processes.
If we hired two programmers, the goal was to produce twice the LOC per week. Now we are doing far less than our weekly target. Does not meet expectation.
> You also need to build a team that you can trust to write the code you agreed you'd write
I tell every hire new and old “Hey do your thing, we trust you. Btw we have your phone number. Thanks”
Works like a charm. People even go out of their way to write tests for things that are hard to verify manually. And they verify manually what’s hard to write tests for.
The other side of this is building safety nets. Takes ~10min to revert a bad deploy.
If you do that, it expands your test matrix quadratically.
So, it makes sense if you have infinite testing budgets.
Personally, I prefer exhaustively testing the upgrade path, and investing in reducing the time it takes to push out a hot fix. Chicken bits are also good.
I haven’t heard of any real world situations where supporting downgrades of persistent formats led to best of class product stability.
That's the polite version of "we know where you live". Telling someone you have their phone number is a way of saying "we'll call you and expect immediacy if you break something."
Wanna be treated like an adult? Cool. You'll also be held accountable like an adult.
Never received a phone call at 5am on a Sunday because a bug is causing a valued customer to lose $10k/minute, and by the way, the SVP is also on the line? Lucky bastard
I've seen this mentioned a couple times lately, so I want to say I don't believe pair programming can serve in place of code review.
Code review benefits from someone coming in fresh, making assumptions and challenging those by looking at the code and documentation. With pair programming, you both take the same logical paths to the end result and I've seen this lead to missing things.
My sense is that there is a narrow slice of software developers who genuinely do flourish in a pair programming environment. These are people who actually work through their thoughts better with another person in the loop. They get super excited about it and make the common mistake of "if it works for me, it will work for everybody" and shout it from the hilltops.
Then there are the people who program best in a fugue state and the idea of having to constantly break that to transform their thoughts into words and human interaction is anathema.
I say this as someone who just woke up in the wee hours of the morning when nobody else is around so I can get some work done (:
I worked for five years at a shop where, a few years in, we started pair programming aggressively. One of our most experienced engineers was really into XP and agile work (in the "purer" meaning of the term). He often suggested pairing when thorny problems came up, and eventually it spread. It often took half or more of the available programming time each day. That was by far the best working environment I've been in. The team was excellent, and it seems like we all improved in our skills when we started doing it more. We cut down on how long it took to get changes in while managing to produce better code. It made planning features and adjusting to unforeseen snags in plans so much quicker. I can't emphasize enough how much of an impact it made on me as a developer or how much I miss it.
The biggest downside to me was that it forces a level of engagement exceeding the most heads down solo work I’ve done. I’d come home and feel mentally exhausted in a way I didn’t usually.
I like pair programming for certain problems: things that are genuinely hard / pushing the boundaries of both participants knowledge and abilities. In those scenarios sometimes two minds can fill in each other's gaps much more efficiently than either can work alone.
I like pair programming. Not everytime or even everyday, but to shadow a junior a few hours a week, or to work with another senior on a complex/new subject? It's fine.
Unless you're covering 100% of edge/corner cases during planning (including roughly how they're handled) then there is still value in code reviews.
You conveniently brushed this under the rug of pair programming, but of the handful of companies I've worked at, only one tried it, and only as an experiment, which in the end failed because no one really wanted to work that way.
I think this "don't review" attitude is dangerous and only acceptable for hobby projects.
Reviews are vital for 80% of the programmers I work with but I happily trust the other 20% to manage risk, know when merging is safe without review, and know how to identify and fix problems quickly. With or without pairing. The flip side is that if the programmer and the reviewer are both in the 80% then the review doesn’t decrease the risk (it may even increase it).
"If you can get the team to that level you can stop doing code reviews."
IMHO / IME (over 20y in dev) reviewing PRs still has value as a sanity check and a guard against (slippery slope) hasty changes that might not have received all of the prior checks you mentioned. A bit of well-justified friction w/ ROI, along the lines of "slow is smooth, smooth is fast".
This seems to be a core of the problem with trying to leave things to autonomous agents .. The response to Amazons agents deleting prod was to implement review stages
actually you don't need reviews if you have a realistic enough simulation test environment that is fully instrumentable by the AI agent. If you can simulate it almost exactly as in production and it works, there's no need to code review.
to move to the hyperspeed timescale you need reliable models of verification in the digital realm, fully accessible by AI.
I'm in a company that does no reviews and I'm medior. The tools we make are not interesting at all, so it's probably the best position I could ask for. I occasionally have time to explore some improvements, tools and side projects (don't tell my boss about that last one)
and it also works for me when working with ai. that produces much better results, too, when I first do a design session really discussing what to build. then a planning session, in which we lay out the steps to build it ("reviewability" world wonder). and then the instruction to stop when things get gnarly and work with the hooman.
does anyone here have a good system prompt for that self observance "I might be stuck, I'm kinda sorta looping. let's talk with hooman!"?
Does anybody have an idea on how to avoid childish resistance? Any time something like this pops up, people discuss it into oblivion and teams stay in their old habits
> your reviews are there to check someone has done their job well enough then you have bigger problems
Welcome to working with real people. They go off the rails and ignore everything you’ve agreed to during design because they get lazy or feel schedule pressure and cut corners all the time.
Sideline: I feel like AI obeys the spec better than engineers sometimes sigh.
Well we can't not review things, because the workflow demands we review things. So we hacked the process and for big changes we begin by asking people who will be impacted (no-code review), then we do a pre-review of a rough implementation and finally do a formal review in a fraction of the time.
I never review PRs, I always rubber-stamp them, unless they come from a certified idiot:
1. I don't care because the company at large fails to value quality engineering.
2. 90% of PR comments are arguments about variable names.
3. The other 10% are mistakes that have very limited blast radius.
It's just that, unless my coworker is a complete moron, then most likely whatever they came up with is at least in acceptable state, in which case there's no point delaying the project.
Regarding knowledge share, it's complete fiction. Unless you actually make changes to some code, there's zero chance you'll understand how it works.
Do people really argue about variable names? Most reviews comments I see are fairly trivial, but almost always not very subjective. (Leftover debug log, please add comment here, etc) Maybe it helps that many of our seniors are from a team where we had no auto-formatter or style guide at all for quite a while. I think everyone should experience that a random mix of `){` and `) {` does not really impact you in any way beyond the mild irking of a crooked painting or something. There's a difference between aesthetically bothersome and actually harmful. Not to say that you shouldn't run a formatter, but just for some perspective.
The greatest barrier to understanding is not lack of knowledge but incorrect knowledge. That's why good names matter. And naming things is hard, which is why it makes sense to comment on variable names in a review.
Unless the naming convention was written in the 90s and all variables must follow a precise algorithm, made only of abbreviations with a maximum length of 15.
Or, for some, if it contains the value of a column in the db, it must have the same name as the column.
So yeah, instead of "UsualQuantityOrder", you get "UslQtyOrd" or "I_U_Q_O"... and you must maintain the comments to explain what the field is supposed to contain.
I have seen this mostly on teams which refuse to formalize preferences into a style guide.
I have fixed this by forcing the issue and we get together as a team, set a standard and document it. If we can use tools to enforce it automatically we do that. If not you get a comment with a link to the style guide and told to fix it.
Style is subjective but consistency is not. Having a formal style guide which is automatically enforced helps with onboarding and code review as well.
I regularly review code that is way more complicated than it should be.
The last few days I was going back and forth on reviews of a function that originally had a cyclomatic complexity of 23. Eventually I got it down to 8, but I had to call the author into a pair programming session and show him how the complexity could be reduced.
Someone giving work like that should be either junior enough that there is potential for training them, so your time investment is worth it, or managed out.
Or it didn't really matter that the function was complex if the structure of what's surrounding it was robust and testable; just let it be a refactor or bug ticket later.
I know the aggravation of getting a hairball of code to review, but I often hold my nose. At least find a better reason to send it back, like a specific bug.
If you're sure cyclomatic complexity should be minimized, I think you should put such rules in a pre-commit hook or something that runs before a reviewer ever sees the code. You should only have to help with that if someone can't figure out how to make it pass.
If you're not willing or politically able to implement that, you might be wasting time on your personal taste that the team doesn't agree with. Personally I'm pretty skeptical of cyclomatic complexity's usefulness as a metric.
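A gate like that is easy to prototype with nothing but the standard library. The sketch below is a rough approximation of McCabe's metric (real tools such as radon or lizard count decision points more carefully, e.g. boolean operators); the `demo` function and the limit of 2 are just for illustration:

```python
import ast

# Node types that introduce a decision point. This is an approximation of
# McCabe's cyclomatic complexity, not the full definition.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(func: ast.AST) -> int:
    """Complexity = 1 + number of decision points inside the function."""
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(func))

def over_limit(source: str, limit: int = 10) -> list[str]:
    """Names of functions in `source` whose complexity exceeds `limit`."""
    tree = ast.parse(source)
    return [n.name for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
            and cyclomatic_complexity(n) > limit]

# A function with two decision points (one `if`, one `for`) has complexity 3:
demo = """
def f(x):
    if x > 0:
        return 1
    for i in range(10):
        x += i
    return x
"""
print(over_limit(demo, limit=2))  # -> ['f']
```

Wired into a pre-commit hook that fails on a non-empty result, it moves the argument out of review entirely, which is the point being made above.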
I just used it here to approximately convey the scale.
the original function was full of mutable state (not required), full of special cases (not required), full of extra return statements (not required). Also had some private helper methods that were mocked in the tests (!!!).
All of this just for a "pure" function. Just immutable object in - immutable object out.
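As a hypothetical illustration of that kind of cleanup (the `Order` type and pricing logic here are invented, not the actual code under review): a frozen dataclass makes "immutable object in, immutable object out" explicit, and the accumulator loops, special cases, and extra returns collapse into single expressions.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)  # immutable in, immutable out
class Order:
    quantity: int
    unit_price: float
    discount: float = 0.0

def price(order: Order) -> float:
    """Total price. A discount of 0.0 is just the general case, not a branch."""
    return order.quantity * order.unit_price * (1.0 - order.discount)

def apply_discount(order: Order, discount: float) -> Order:
    """Returns a new Order instead of mutating the argument."""
    return replace(order, discount=discount)

o = Order(quantity=3, unit_price=10.0)
print(price(o))                        # -> 30.0
print(price(apply_discount(o, 0.5)))   # -> 15.0
```

No private helpers to mock: the tests just assert on outputs for given inputs.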
I always approve a change with comments for nits that are optional to address. I only hold back approval if there is a legitimate flaw of some sort. Generally this leads to small changes almost always getting approved on the first shot, but larger changes needing at least one back and forth. AI code review tools make it much easier to spot legitimate problems these days.
> 2. 90% of PR comments are arguments about variable names.
This sort of comment is meaningless noise that people add to PRs to pad their management-facing code review stats. If this is going on in your shop, your senior engineers have failed to set a suitable engineering culture.
If you are one of the seniors, schedule a one-on-one with your manager, and tell them in no uncertain terms that code review stats are off-limits for performance reviews, because it's causing perverse incentives that fuck up the workflow.
The most senior guy has the worst reviews because they take multiple rounds; each round finds new problems. The manager thinks this contributes to code quality. I was denied promotion because I failed to convince half of the company to drop everything and do my manager's pet project, which had literally zero business value.
Yeah, I'm afraid that's an engineering culture that is thoroughly cooked. Not much choice except keep your head down until you are ready to cut your losses
That seems a lot about the company and the culture rather than about how code review is supposed to work.
I have been involved in enough code reviews both in a corporate environment and in open source projects to know this is an outlier. When code review is done well, both the author and reviewer learn from the experience.
People always make mistakes. Like forgetting to include a change. The point of PRs for me is to try to weed out costly mistakes. Automated tests should hopefully catch most of them, though.
The point of PRs is not to avoid mistakes (though sometimes this can happen). Automated tests are the tool to weed out those kinds of mistakes. The point of PRs is to spread knowledge. I try to read every PR, even if it's already approved, so I'm aware of what changes there are in code I'm going to own. They are the RSS feed of the codebase.
I used to do this! I can’t anymore, not with the advent of AI coding agents.
My trust in my colleagues is gone, I have no reason to believe they wrote the code they asked me to put my approval on, and so I certainly don’t want to be on a postmortem being asked why I approved the change.
Perhaps if I worked in a different industry I would feel like you do, but payments is a scary place to cause downtime.
As far as I'm concerned if I approved the PR I'm equally responsible for it as the author is. I never make nitpick comments and I still have to point out meaningful mistakes in around 30% of reviews. The percentage has only risen with AI slop.
These systems make it more efficient to remove the actively toxic members of your team. Belligerence can be passive-aggressively “handled” by additional layers, but at considerable time and emotional labor cost to people who could be getting more work done without having to coddle untalented assholes.
There's no such thing as a hiring process that avoids that problem 100% of the time.
After all, most people will be on their best behavior during an interview, and even a lengthy interview process is a very short period of time compared to working with someone for weeks or months.
Won't a lot of alignment and pair programming be time-expensive?
The question is really "Will up-front design and pair programming cost more than not doing up-front design and pair programming?".
In my experience, somewhat counter-intuitively, alignment and pairing is cheaper because you get to the right answer a bit 'slower' but without needing the time spent reworking things. If rework is doubling the time it takes to deliver something (which is not an extreme example, and in some orgs would be incredibly conservative) then spending 1.5 times the estimate putting in good design and pair programming time is still waaaay cheaper.
Yes. This is the way. Declarative design contracts are the answer to A.I. coders. A team declares what they want, agents code it together with human supervision. Then code review is just answering the question "is the code conformant with the design contract?"
But. The design contract needs review, which takes time.
I wonder what delayed continuous release would be like. Trust folks to merge semi-responsibly, but have a two week delay before actually shipping to give yourself some time to find and fix issues.
Perhaps kind of a pain to inject fixes in; you'd have to rebase the outstanding work. But I kind of like this idea of the org having responsibility to do what review it wants, without making every person have to corral all the cats to get all the check marks. Make it the org's challenge instead.
Code reviews are a volunteer’s dilemma. Nobody is showered with accolades for putting “reviewed a bunch of PRs” on their performance review by comparison with “shipped a bunch of features.” The two go hand-in-hand, but rewards follow marks of authorship despite how much reviewers influence what actually landed in production.
Consequently, people tend to become invested in reviewing work only once it’s blocking their work. Usually, that’s work that they need to do in the future that depends on your changes. However, that can also be work they’re doing concurrently that now has a bunch of merge conflicts because your change landed first. The latter reviewers, unfortunately, won’t have an opinion until it’s too late.
Fortunately, code is fairly malleable. These “reviewers” can submit their own changes. If your process has a bias towards merging sooner, you may merge suboptimal changes. However, it will converge on a better solution more quickly than if your changes live in a vacuum for months on a feature branch passing through the gauntlet of a Byzantine review and CI process.
Or the reviewer feels responsible for the output of the code from the person they are reviewing or the code they are modifying. For instance a lead on the team gets credit for the output of the team
Also, wanting to catch bugs on review before they make your on call painful can be a large motivation.
I've always encouraged everyone more junior to review everything regardless of who signs off, and even if you don't understand what's going on/why something was done in a particular way, to not be shy to leave comments asking for clarification. Reviewing others' work is a fantastic way to learn. At a lower level, do it selfishly.
If you're aiming for a higher level, you also need to review work. If you're leading a team or above (or want to be), I assume you'll be doing a lot of reviewing of code, design docs, etc. If you're judged on the effectiveness of the team, reviews are maybe not an explicit part of some ladder doc, but they're going to be part of boosting that effectiveness.
Valve is one of the only companies that appears to understand this, as well as that individual productivity is almost always limited by communication bandwidth, and communication burden is exponential as nodes in the tree/mesh grow linearly. [or some derated exponent since it doesn't need to be fully connected]
The ‘design everything as a publicly accessible API’ directive seems to play to this as well. If all your data / services are available and must be documented then a lot of communication overhead can be eliminated.
I worked in a company where reviews took days. The CTO complained a lot about the speed, but we had decent code quality.
Now I work at a company where reviews take minutes. We have 5 lines of technical debt per 3 lines of code written. We spend months to work on complicated bugs that have made it to production.
My last FAANG team had a soft 4-hour review SLA, but if it was a complicated change then that might just mean someone acknowledging it and committing to reviewing it by a certain date/time. IIRC, if someone requested a review and you hadn't gotten to it by around the 3-hour mark you'd get an automated chat message "so-and-so has been waiting a while for your review".
Everyone was very highly paid, managers measured everything (including code review turnaround), and they frequently fired bottom performers. So, tradeoffs.
Why does it sound horrible to have your code reviewed quickly? There is no reason for reviews to wait a long time. 4 hours is already a long time, it means you can wait to do it right before you go home or after lunch.
Why would I care if my code is reviewed quickly? If the answer is some variant of "I get punished if I don't have enough changes merged in fast enough," that's not helping. From the other side, it's having someone constantly breathe down your neck. Hope you don't get in a flow at the wrong time and need to break it so Mr. Lumbergh doesn't hit you up on Teams. It just reeks of a culture of "unlimited pto," rigid schedules, KPI hacking, and burnout.
You do a lot of small changes (<100 loc) that get reviewed often. If it doesn't get reviewed often then the whole idea of continuous development breaks down.
Arguably you have 8 hours of work a day. How many of them do you need to write 100 loc? After that 100 loc, or maybe 200, take a break and review other people's code.
Plus you also have random meetings and stuff so your day already fragments itself so adding a code review in the time before a meeting or after is "free" from a fragmentation standpoint.
IMO code reviews are not pair programming. By the time I've raised an MR, it's already perfect. I've had multiple client calls, talked to my team about design, unit tested it, tested it on a container environment, thought about it.
So it really doesn't matter when the review gets done. I mean, even a week and it's fine.
Sounds kind of amazing to me. 4 hours is a bit ridiculous, but I wish we had some kind of automated system to poke people about reviews so I don't have to. It's doubly bad because a) I have to do it, and b) it makes me look annoying.
My ideal system (for work) would be something like: after 2 days, ask for a review if the reviewer hasn't given it. After a week, warn them the PR will be auto-approved. After 2 weeks, auto-approve it.
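That escalation policy is simple enough to sketch. The thresholds below are the ones from the comment (nudge at 2 days, warn at 1 week, auto-approve at 2 weeks); the `Action` names and the function are made up for illustration, not an existing bot's API:

```python
from datetime import datetime, timedelta
from enum import Enum

class Action(Enum):
    WAIT = "wait"
    NUDGE = "nudge reviewer"
    WARN = "warn: PR will be auto-approved"
    AUTO_APPROVE = "auto-approve"

# Thresholds from the comment above.
NUDGE_AFTER = timedelta(days=2)
WARN_AFTER = timedelta(days=7)
APPROVE_AFTER = timedelta(days=14)

def escalation(opened_at: datetime, now: datetime) -> Action:
    """Decide what the bot should do given how long a PR has been waiting."""
    age = now - opened_at
    if age >= APPROVE_AFTER:
        return Action.AUTO_APPROVE
    if age >= WARN_AFTER:
        return Action.WARN
    if age >= NUDGE_AFTER:
        return Action.NUDGE
    return Action.WAIT

opened = datetime(2024, 1, 1)
print(escalation(opened, datetime(2024, 1, 2)))   # -> Action.WAIT
print(escalation(opened, datetime(2024, 1, 4)))   # -> Action.NUDGE
print(escalation(opened, datetime(2024, 1, 16)))  # -> Action.AUTO_APPROVE
```

The nice property is that the pressure lands on the org (review it or it merges), not on the author to chase people.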
I worked somewhere that actually had a great way to deal with this. It only works in small teams though.
We had a "support rota", i.e. one day a week you'd be essentially excused from doing product delivery.
Instead, you were the dev to deal with big triage, any code reviews, questions about the product, etc.
Any spare time was spent looking for bugs in the backlog to further investigate / squash.
Then when you were done with your support day you were back to sprint work.
This meant there was no ambiguity of who to ask for code review, and limited / eliminated siloing of skills since everyone had to be able to review anyone else's work.
That obviously doesn't scale to large teams, but it worked wonders for a small team.
I have, and in each sprint we always had tickets for reviewing the implementation, which could take anywhere from an hour to 2 days.
The code quality was much better than in my current workplace where the reviews are done in minutes, although the software was also orders of magnitude more complex.
Bonus points: reviews are not taken seriously in the legitimate sense, but a facade of seriousness consisting of picky complaints is put forth to reinforce hierarchy and gatekeeping
I’ve worked on teams like you describe and it’s been terrible. My current team’s SDLC is more along the 5-hour line - if someone hasn’t reviewed your code by the end of today, you bring it up in standup and have someone commit to doing it.
One thing that often gets dismissed is the value/effort ratio of reviews.
A review must be useful and the time spent on reviewing, re-editing, and re-reviewing must improve the quality enough to warrant the time spent on it. Even long and strict reviews are worth it if they actually produce near bugless code.
In reality, that's rarely the case. Too often, reviewing goes down the rabbit hole of various minutiae, and the time spent reaching a mutual compromise between what the programmer wants to ship and what the reviewer can agree to pass is not worth the effort. The time would be better spent on something else if the process doesn't yield substantial quality. Iterating a review over and over to hone it into one interpretation of perfection will only bump the change into the next 10x bracket in the wall-clock timeline mentioned in this article.
In the adage of "first make it work, then make it correct, and then make it fast", a review only needs to require that the change reaches the first step, or, in other words, to prevent breaking something or the development going in an obviously wrong direction straight from the start. If the change works, maybe with caveats but still works, then all is generally fine enough that the change can be improved in follow-up commits. For this, the review doesn't need to go into thorough detail: a few comments to point the change in the right direction are often enough. That kind of review is a very efficient use of time.
Overall, in most cases a review should be a very short part of the development process. Most of the time should be spent programming and not in review churn. A review serves as a quick check-point that things are still going the right way but it shouldn't dictate the exact path that should be used in order to get there.
> The job of a code reviewer isn't to review code. It's to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don't need their reviews at all anymore.
Making entire classes of issues effectively impossible is definitely the ideal outcome. But, this feels much more complicated when you consider that trust doesn't always extend beyond the company's wall and you cannot always ignore that fact because the negative outcomes can be external to the company.
What if I, a trusted engineer, run `npm update` at the wrong time and malware makes its way into production and user data is stolen? A mistake to learn from, for sure, but a post-mortem is too late for those users.
I'm certainly not advocating for relying on human checks everywhere, but reasoning about where you crank the trust knob can get very complicated or costly. Occasionally a trustworthy human reviewer can be part of a very reasonable control.
Nice piece, and rings true. I also think startups and smaller organizations will be able to capture better value out of AI because they simply don't have all those approval layers.
The approval tree grows logarithmically as the size of the company grows. A startup can win initially because they may have zero or one level to get to production. That's part of how they manage to get inside the OODA loop of much bigger companies.
The flip side of that, and why the software world is not a complex network of millions of tiny startups but in fact has quite a few companies where log(organization) >= 2, is that there are a lot of tasks that are just larger than a startup, and the log of the minimum size organization that can do the job becomes 2 or 3 or 4.
There is certainly at least the possibility that AI can really enhance those startups even faster, but it also means that they'll get to the point that they need more layers more quickly, too. Since AI can help much, much more with coding than it can with the other layers (not that it can't help, but at the moment I don't think there's anybody else in the world getting the advantages from AI that programmers are getting), it can also result in the amount of time that startups can stay in the log(organization)=1 range shrink.
(Pardon the sloppy "log(organization)" notation. It should not be taken too literally.)
I think this makes an assumption early on which is that things are serialized, when usually they are not.
If I complete a bugfix every 30 minutes, and submit them all for review, then I really don't care whether the review completes 5 hours later. By that time I have fixed 10 more bugs!
Sure, getting review feedback 5 hours later will force me to context switch back to 10 bugs ago and try to remember what that was about, and that might mean spending a few more minutes than necessary. But that time was going to be spent _anyway_ on that bug, even if the review had happened instantly.
The key to keeping speed up in slow async communication is just working on N things at the same time.
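A back-of-the-envelope model with the comment's own numbers (30-minute fixes, ~5-hour review latency) shows why throughput survives the latency as long as you pipeline; the two functions are a toy model, not a claim about any real team's numbers:

```python
# Toy model: review latency delays each individual fix's completion,
# but with pipelining it doesn't limit how many fixes land per week.
CODING_MIN = 30           # minutes to produce one bugfix
REVIEW_LATENCY_MIN = 300  # review lands roughly 5 hours later

def serialized_fixes(total_min: int) -> int:
    """Wait for each review to finish before starting the next fix."""
    return total_min // (CODING_MIN + REVIEW_LATENCY_MIN)

def pipelined_fixes(total_min: int) -> int:
    """Keep coding while reviews are pending; only the tail still waits."""
    # Count fixes whose coding AND review both complete inside the window.
    return max(0, (total_min - REVIEW_LATENCY_MIN) // CODING_MIN)

week = 5 * 8 * 60  # a 40-hour week in minutes
print(serialized_fixes(week))  # -> 7
print(pipelined_fixes(week))   # -> 70
```

The 10x gap is exactly the ratio of review latency to coding time: latency only costs you once at the end of the window, not once per fix.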
Excellent article. Based on personal experience, if you build cutting edge stuff then you need great engineers and reviewers.
But for anything else, you just need an individual (not a team) who's okay (not great) at multiple things (architecting, coding, communicating, keeping costs down, testing their stuff). Let them build and operate something from start to finish without reviews. Judge it by how well their product works.
Overall, this is pretty accurate. Of course, it’s a range at every level, say 5x-15x. Large companies trend toward 15x and startups toward 5x, which is why startups out-execute large companies. Also, they just skip some levels of review because, for instance, the CEO is sitting in a code review meeting. But yea, the average is close.
about a year ago I shared this on /r/AskProgramming:
"...a Pull Request is a delivery. It's like UPS standing at your door with a package. You think, "Nice, the feature, bugfix, etc has arrived! And because it's a delivery, it's also an inspection. A Code Review. Like a freight delivery with a manifest and signoff. So you have to be able to conduct the inspection: to understand what you're receiving and evaluate if it's acceptable as-is. Like signing for a package, once you approve, the code is yours and your team's to keep."
The metaphor has limits. IRL I sign immediately and resolve issues post-hoc with customer service. The UPS guy is not going to stand on my porch while I check if there's actually a bootable MacBook in the box. The vast majority of the time, there's no issue. If that were the same with code, teams could adopt a similar "trust now and defer verification" approach.
The article has a section on Modularity but never defines it. I wrote a post a few weeks ago on modularity and LLMs which does provide a definition. [1].
Not before coding agents nor after coding agents has any PR taken me 5 hours to review. Is the delay here coordination/communication issues, the "Mythical Man-Month" stuff? I could buy that.
The article is referring to the total time including delays. It isn’t saying that PR review literally takes 5 hours of work. It’s saying you have to wait about half a day for someone else to review it.
Which is a thing that depend very much on team culture. In my team it is perhaps 15 min for smaller fixes to get signoff. There is a virtuous feedback loop here - smaller PRs give faster reviews, but also more frequent PRs, which give more frequent times to actually check if there is something new to review.
If I'm deep in coding flow the last thing I'm going to do is immediately jump on to someone else's PR. Half a day to a day sounds about right from when the PR is submitted to actually getting the green light
Similar in my team and I don't feel like there's much context switching. With around 8 engineers there's usually at least one person not in the middle of something who can spare a few minutes.
The PR won’t take 5 hours of work, but it could easily sit that long waiting for another engineer to willing to context switch from their own heads-down work.
Exactly. Even if I hammer the erstwhile reviewer with Teams/Slack messages to get it moved to the top of the queue and finished before the 5 hours are up, then all the other reviews get pushed down. It averages out, and the review market corrects.
Exactly. Can you get a lawyer on the phone now, or do you wait ~5 hours? How about a doctor appointment. Or a vet appointment. Or a mechanic visit.
Needing full human attention on a complex task from a pro who can only look at your thing has a wait time. It is worse when there are only 2 or 3 such people in the world you can ask!
The article specified wall clock time. One day turnaround is pretty typical if its not urgent enough to demand immediate review, lots of people review incoming PRs as a morning activity.
I've had PRs that take me five hours to review. If your one PR is an entire feature that touches the database, the UI, and an API, and I have to do the QA on every part of it because as soon as I give the thumbs up it goes out the door to clients? Then it's gonna take a while, and I'm probably going to find a few critical issues, and then the loop starts again
I use a PR notifier chrome extension, so I have a badge on the toolbar whenever a PR is waiting on me. I get to them in typically <2 minutes during work hours because I tab over to chrome whenever AI is thinking. Sometimes I even get to browse HN if not enough PRs are coming and not too many parallel work sessions.
One pattern I've seen is that a team with a decently complex codebase will have 2-3 senior people who have all of the necessary context and expertise to review PRs in that codebase. They also assign projects to other team members. All other team members submit PRs to them for review. Their review queue builds up easily and average review turnaround balloons.
Not saying this is a good situation, but it's quite easy to run into it.
Managers are expected to say that we should be productive yet they're responsible for the framework which slows down everyone and it's quite clear that they're perfectly fine with this framework. I'm not saying it's good or bad because it's complicated.
A few years ago there was a thread about "How complex systems fail" here on HN[1], and one aspect of it (rule 9) is about how individuals have to balance between security and productivity, and being judged differently depending on the context (especially being judged after-the-fact for the security aspect, while being judged before the accident for the productivity aspect).
The linked page in the thread is short and quite enlightening, but here is the relevant passage:
> Rule 9: Human operators have dual roles: as producers & as defenders against failure.
> The system practitioners operate the system in order to produce its desired product and also work to forestall accidents. This dynamic quality of system operation, the balancing of demands for production against the possibility of incipient failure is unavoidable. Outsiders rarely acknowledge the duality of this role. In non-accident filled times, the production role is emphasized. After accidents, the defense against failure role is emphasized. At either time, the outsider’s view misapprehends the operator’s constant, simultaneous engagement with both roles.
I find this to be true for expensive approvals as well.
If I can approve something without review, it’s instant. If it requires only immediate manager, it takes a day. Second level takes at least ten days. Third level trivially takes at least a quarter (at least two if approaching the end of the fiscal year). And the largest proposals I’ve pushed through at large companies, going up through the CEO, take over a year.
That’s because most teams are doing engineering wrong.
The handover to a peer for review is a falsehood. PRs were designed for open source projects to gatekeep public contributors.
Teams should be doing trunk-based development, group/mob programming and one piece flow.
Speed is only one measure and AI is pushing this further to an extreme with the volume of change and more code.
The quality aspect is missing here.
Speed without quality is a fallacy and it will haunt us.
Don’t focus on speed alone and the need to always be busy picking up the next item. Focus on quality and throughput, keeping work in progress to a minimum (1). Deliver meaningful, reasoned changes as a team, together.
Communication overhead is the #1 schedule killer, in my experience.
Whenever we have to talk/write about our work, it slows things down. Code reviews, design reviews, status updates, etc. all impact progress.
In many cases, they are vital, and can’t be eliminated, but they can be streamlined. People get really hung up on tools and development dogma, but I've found that there’s no substitute for having experienced, trained, invested, technically-competent people involved. The more they already know, the less we have to communicate.
That’s a big reason that I have for preferring small meetings. I think limiting participants to direct technical members is really important. I also don’t like regularly-scheduled meetings (like standups). Every meeting should be ad hoc, in my opinion.
Of course, I spent a majority of my career, at a Japanese company, where meetings are a currency, so fewer meetings is sort of my Shangri-La.
I’m currently working on a rewrite of an app that I originally worked on, for nearly four years. It’s been out for two years, and has been fairly successful. During that time, we have done a lot of incremental improvements. It’s time for a 2.0 rewrite.
I’ve been working on it for a couple of months, with LLM assistance, and the speed has been astounding. I’m probably halfway through it, already. But I have also been working primarily alone, on the backend and model. The design and requirements are stable and well-established. I know pretty much exactly what needs to be done. Much of my time is spent testing LLM output, and prompting rework. I’m the “review slowdown,” but the results would be disastrous, if I didn’t do it.
It’s a very modular design, with loosely-coupled, well-tested and documented components, allowing me to concentrate on the “sharp end.” I’ve worked this way for decades, and it’s a proven technique.
Once I start working on the GUI, I guarantee that the brakes will start smoking. All because of the need for non-technical stakeholder team involvement. They have to be involved, and their involvement will make a huge difference (like a Graphic UX Designer), but it will still slow things down. I have developed ways to streamline the process, though, like using TestFlight, way earlier than most teams.
Reviewing things is fast and smooth if things are small. If all the involved parties stay in the loop, review happens in real time. Review is only problematic if you split the doing and reviewing steps. The same applies to AI coding: you can choose to pair program with it, and then it's actually helpful, or you can have it generate 10k lines of code you have no way of reviewing. You just need people to understand that switching context kills productivity. If more things are happening at the same time and your memory is limited, the time spent on load/save makes it slower than just doing one thing at a time and staying in the loop.
Honestly if I'm just following what a single LLM is doing I'm arguably slower than doing it myself so I'd say that approach isn't very useful for me.
I prefer to review the plan (this is more to flush out my assumptions about where something fits in the codebase and verify I communicated my intent correctly).
I'll loosely monitor the process if it's a longer one - then I review the artifacts. This way I can be doing 2/3 things in parallel, using other agents or doing meetings/prod investigation/making coffee/etc.
This is very true in marketing and advertising as well. A campaign where the channel manager can just test ads within a general framework will do ten times better than a campaign that has to go through review processes.
The 10x estimate tracks — I've seen it too. The underlying mechanism is queuing theory: each approval step is a single-server queue with high variance inter-arrival times, so average wait explodes non-linearly. AI makes the coding step ~10x faster but doesn't touch the approval queue. The orgs winning right now are the ones treating async review latency as a first-class engineering metric, same way they treat p99 latency for services.
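The variance point can be sanity-checked with a toy single-server review queue. This sketch (all rates hypothetical) compares two reviewers with the same mean review time, one deterministic and one bursty:

```python
import random

def avg_wait(service_time, arrival_rate, n=20000, seed=1):
    """Average wait in a single-server FIFO queue (a toy review queue)."""
    rng = random.Random(seed)
    t = free_at = total_wait = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)   # next PR arrives
        start = max(t, free_at)              # waits if the reviewer is busy
        total_wait += start - t
        free_at = start + service_time(rng)  # review takes this long
    return total_wait / n

# Same mean review time (1.0 unit), same arrival rate (80% utilisation):
steady = avg_wait(lambda r: 1.0, arrival_rate=0.8)                 # zero variance
bursty = avg_wait(lambda r: r.expovariate(1.0), arrival_rate=0.8)  # high variance
print(f"steady: {steady:.2f}  bursty: {bursty:.2f}")
```

At 80% utilisation, queueing theory (the Pollaczek-Khinchine formula) predicts the bursty reviewer's average wait is about double the deterministic one's, which the simulation reproduces.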
Well, this all makes sense for application code, but not necessarily for infrastructure changes. Imagine a failed Terraform merge that deletes the production database or opens inbound access to 0.0.0.0/0, and you can't undo it for 10 minutes. In my opinion, you need to pay attention to the narrow scope specific to a given project.
I broadly agree with this, it really is all about trust. Just, as a company scales it’s hard to make sure that everybody in the team remains trustworthy – it isn’t just about personality and culture, it’s also about people actually having the skill, motivation, and track record of doing good work efficiently. Maybe AI‘s greatest value will be to allow teams to stay small, which reduces the difficulty of maintaining trust.
It’s also the case that someone you trust can make an honest mistake and, for example, get their laptop stolen and their credentials compromised. I do trust my team, and want that to be the foundation of our relationship, but I also recognize that humans are fallible and having guardrails (e.g. code review) is beneficial.
That's exactly why I think vibecoding uniquely benefits solo and small team founders. For anything bigger, work is not the bottleneck, it's someone's lack of imagination.
In my experience a culture where teammates prioritise review times (both by checking on updates in GH a few times a day, and by splitting changes aggressively into smaller patches) is reflected in much faster overall progress. It's definitely a culture thing; there's nothing technically or organisationally difficult about implementing it, it just requires people working together to consider team velocity more important than personal velocity.
Let's say a teammate is writing code to do geometric projection of streets and roads onto live video. Another teammate is writing code to do automated drone pursuit of cars. Let's say I'm over here writing auth code, making sure I'm modeling all the branches which might occur in some order.
To what degree do we expect intellectual peerage from someone just glancing at this problem because of a PR? I would expect that to be a proper intellectual peer of someone studying the problem, you'd basically have to double your effort.
If the team is that small and working on things that are that disparate, then it is also very vulnerable to one of those people leaving, at which point there's a whole part of the project that nobody on the team has a good understanding of.
Having somebody else devote enough time to being up to speed enough to do code review on an area is also an investment in resilience so the team isn't suddenly in huge difficulty if the lone expert in that area leaves. It's still a problem, but at least you have one other person who's been looking at the code and talking about it with the now-departed expert, instead of nobody.
This is an unusually low overlap per topic; it probably needs a different structure from traditional PRs to get the best chance of benefiting from more eyes... higher-scope planning, or something like longer but intermittent pair programming.
Generally, if the reviewer is not familiar with the content, asynchronous line-by-line reviews are of limited value.
People are busy, and small bugfixes are usually not that critical. If you make everyone drop everything to review everything, that is much more dysfunctional.
This is a profound point but is review really the problem or is it the handoff that crosses boundaries (me to others, our team to other team, our org to outside our org)?
This reads like a scattered mind with a few good gems, a few assumptions that are incorrect but baked into the author’s world view, and loose coherence tying it all together. I see a lot of myself in it.
I’ll cover one of them: layers of management or bureaucracy do not reduce risk. They create inaction, which gives the appearance of reducing risk, until some startup comes and gobbles up your lunch. Upper management knows it’s all bullshit, and the game-theoretic play is to say no to things, because you’re not held accountable if you say no; so they say no and milk the money printer until the company stagnates and dies. Then they repeat at another company (usually with a new title and promotion).
For all the people talking about 5 hour PR review delays... This reminds me of some teams that rotate the "fire extinguisher/emergency bug fixer" duty every day/week/sprint to a different developer. One could rotate a dedicated "first review duty" person. That developer would be in charge of focusing on rapidly starting PR reviews as their priority, with option to request other reviewers if necessary. Spreading the duty around would make people be respectful of the reviewer because if they send unreviewed slop to the reviewers, it's likely that people will send them slop too.
Waiting for a few days of design review is a pain that is easy to avoid: all we need is to be ready to spend a few months building a potentially useless system.
I think the problem is the shape of review processes. People higher up in the corporate food chain are needed to give approval on things. These people also have to manage enormous teams with their own complexities. Getting on their schedule is difficult, and giving you a decision isn't their top priority, slowing down time to market for everything.
So we will need to extract the decision making responsibility from people management and let the Decision maker be exclusively focused on reviewing inputs, approving or rejecting. Under an SLA.
My hypothesis is that the future of work in tech will be a series of these input/output queue reviewers. It's going to be really boring I think. Probably like how it's boring being a factory robot monitor.
> Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself.
That's me. I'm the mad reviewer. Each time I ranted against AI on this site, it was after reviewing sloppy code.
Yes, Claude Opus is better on average than my juniors/new hires. But it will make the same mistake twice. I _need_ you to fucking review your own generated code and catch the obvious issues before you submit it to me. Please.
In my experience, good mature organisations have clear review processes to ensure quality, improve collaboration and reduce errors and risk. This is regardless of field. It does slow you down - not 10x - but the benefits outweigh the downsides in the long run.
The worst places I’ve worked have a pattern where someone senior drives a major change without any oversight, review or understanding causing multiple ongoing issues. This problem then gets dumped onto more junior colleagues, at which point it becomes harder and more time consuming to fix (“technical debt”). The senior role then boasts about their successful agile delivery to their superiors who don’t have visibility of the issues, much to the eye-rolls of all the people dealing with the constant problems.
I totally agree with his ideas, but somehow he seems to just be stating the obvious: startups move faster than big orgs, you can solve a problem by dividing it into smaller problems (if possible), and AI experimentation is cheap.
As they say: an hour of planning saves ten hours of doing.
You don't need so much code or maintenance work if you get better requirements upfront. I'd much rather implement things at the last minute knowing what I'm doing than cave in to the usual incompetent middle manager demands of "starting now to show progress". There's your actual problem.
If an hour of planning always saved ten hours of work, software schedules would be a whiteboard exercise.
Instead everyone wants perfect foresight, but systems are full of surprises you only find by building and the cost of pushing uncertainty into docs is that the docs rot because nobody updates them. Most "progress theater" starts as CYA for management but hardens into process once the org is too scared to change anything after the owners move on.
> As they say: an hour of planning saves ten hours of doing.
In software it's the opposite, in my experience.
> You don't need so much code or maintenance work if you get better requirements upfront.
Sure, and if you could wave a magic wand and get rid of all your bugs that would cut down on maintenance work too. But in the real world, with the requirements we get, what do we do?
> In software it's the opposite, in my experience.
That's been my experience as well: ten hours of doing will definitely save you an hour of planning.
If you aren't getting requirements from elsewhere, at least document the set of requirements you think you're working towards, and post them for review. You sometimes get new useful requirements very fast if you post "wrong" ones.
I think what they meant is you “can save 10 hours of planning with one hour of doing”
And I think this has become even more so in the age of AI, because there are even more unknown unknowns, which are harder to discover while planning but easy while “doing”, and that “doing” itself is so much more streamlined.
In my experience no amount of planning will de-risk software engineering effort; what works is making sure that coming back to refactor or switch tech is less expensive, which allows you to rapidly change the approach when you inevitably discover some roadblock.
You can read all the docs during planning phases, but you will stumble with some undocumented behaviour / bug / limitation every single time and then you are back to the drawing board. The faster you can turn that around the faster you can adjust and go forward.
I really like the famous quote usually attributed to Eisenhower: “Plans are useless, planning is essential”
> I think what they meant is you “can save 10 hours of planning with one hour of doing”
I know what they meant, and I also meant the thing I said instead. I have seen many, many people forge ahead on work that could have been saved by a bit more planning. Not overplanning, but doing a reasonable amount of planning.
Figuring out where the line is between planning and "just start trying some experiments" is a matter of experience.
>> Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself. Little of value was gained.
This seems to check out, and it's the reason why I can't reconcile the industry's claims about worker replacement with reality. I still wonder when a reckoning will come, though. It seems long overdue in the current environment.
> I still wonder when a reckoning will come, though. seems long overdue in the current environment
Never. Not until 1-10 person teams start disrupting enterprises (legacy banks, payments systems, consultancies).
“Why?”, you might ask. Because it’s a house of cards. If engineers become redundant, then we don’t need teams. If we don’t need teams, then we don’t need team leads/PMs/POs and others; if we don’t need middle management, then we don’t need VPs and others. All of those layers will eventually catch up to what’s going on and kill any productivity gains via bureaucracy.
I don't agree with this take in the article. One person with Claude Code can replace a team of devs. It resolves many issues, such as the tension between devs wanting to focus and devs wanting their peers to put aside their task to review their pull requests. Claude generates the code and the human reviews it. There's no delay in the back-and-forth unlike in a team of humans. There's no ego and there's no context switching fatigue. Given that code reviewing is a bottleneck, it's feasible that one person can do it by themselves. And Claude can certainly generate working code at least 10x faster than any dev.
You’re talking from an idealistic requirements - input - programming - output point of view. That’s not how the world operates. Egos are “important”; politics and bureaucracy are essential parts of organizations. LLMs don’t change that, and without changing that there’s no chance at all. Previously coding was maybe 0.1 of the bottleneck; now it’s 0.07.
There are examples littered around threads on HN. What happens is when people provide the examples, the goalposts get moved. So people have stopped bothering to reply to these demands.
OpenClaw! You just need to slightly change the definition of “good code”. The point of code is to ultimately bring money. The guy got hired by OpenAI and who gives a shit what happens to the “project” next. Mission accomplished.
I'm guessing a lot of the high-x productivity boost is from a cycle of generating lots of code, having bug reports detected or hallucinated from that code, and then generating even more code to close out those reports, and so on
This is one of the reasons I'm so interested in sandboxing. A great way to reduce the need for review is to have ways of running code that limit the blast radius if the code is bad. Running code in a sandbox can mean that the worst that can happen is a bad output as opposed to a memory leak, security hole or worse.
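As a minimal illustration of the idea, here is a hedged, POSIX-only sketch (the `run_sandboxed` helper is invented for this example) that runs untrusted Python in a child process with CPU and memory caps; it limits the blast radius of runaway code, but resource limits alone are not a real security boundary (no filesystem or network isolation here):

```python
import resource
import subprocess
import sys

def run_sandboxed(code: str, cpu_seconds: int = 5) -> subprocess.CompletedProcess:
    """Run untrusted code in a child process with CPU and memory caps (POSIX only)."""
    def limit() -> None:
        # Cap CPU time and address space before the child starts executing.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        cap = 512 * 2**20  # 512 MiB of address space
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
        timeout=cpu_seconds * 2,  # wall-clock backstop
        preexec_fn=limit,
    )

result = run_sandboxed("print(sum(range(10)))")
print(result.stdout.strip())
```

A real sandbox would add filesystem and network isolation (containers, seccomp, gVisor, etc.); the point is only that constraining what generated code *can* do reduces how carefully every line must be reviewed.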
Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
> Pre-LLMs correct output was table stakes
We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults. Correctness isn't even on the table, outside of a few (mostly academic) contexts
> Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
> drained your bank account to buy bitcoin and then deleted your harddrive
These are what I meant by correct output. The software does what you expect it to.
> We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults
This is not really an output issue IMO. This is a failing edge case.
LLMs are moving the industry away from trying to write software that handles all possible edge cases gracefully and towards software developed very quickly that behaves correctly on the happy paths more often than not.
If you save 3 hours building something with agentic engineering and that PR sits in review for the same 30 hours or whatever it would have spent sitting in review if you handwrote it, you’re still saving 3 hours building that thing.
So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput (good lord, we better get used to doing more code review)
This doesn’t work if you spend 3 minutes prompting and 27 minutes cleaning up code that would have taken 30 minutes to write anyway, as the article details, but that’s a different failure case imo
> So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput
Hang on, you think that a queue that drains at a rate of $X/hour can be filled at a rate of 10x$X/hour?
No, it cannot: it doesn't matter how fast you fill a queue if the queue has a constant drain rate, sooner or later you are going to hit the bounds of the queue or the items taken off the queue are too stale to matter.
In this case, filling a queue at 20 items per hour (one every 3 minutes) while it drains at 1 item every 5 hours means that after a single 8-hour day you've queued 160 PRs and reviewed barely 2 of them.
IOW, after a single day the newest PR is sitting behind roughly 158 others at 5 hours each, so its time-to-review is on the order of 790 hours - and every additional day adds about as much again.
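The fill/drain arithmetic is easy to check directly (rates taken from the comment above):

```python
# Hypothetical rates from the comment: 20 new PRs/hour vs 1 review per 5 hours.
FILL_PER_HOUR = 20
REVIEWS_PER_HOUR = 1 / 5

hours_in_day = 8
backlog = (FILL_PER_HOUR - REVIEWS_PER_HOUR) * hours_in_day  # PRs left at day's end
wait_for_newest = backlog / REVIEWS_PER_HOUR                 # hours until it's reviewed
print(f"backlog: {backlog:.1f} PRs, newest PR waits ~{wait_for_newest:.0f} hours")
```

With creation 100x faster than review, the backlog grows by about 158 PRs per working day, so the wait for the newest PR grows by roughly 790 hours per day.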
This is the fundamental issue currently in my situation with AI code generation.
There are some strategies that help: a lot of the AI directives need to go towards making the code actually easy to review. A lot of it sits around clarity and granularity (code should be committed primarily in reviewable chunks - units of work that make sense for review) rather than whatever you would have done previously when code production was the bottleneck. Similarly, AI use needs to be weighted not just more towards tests, but towards tests that concretely and clearly answer the questions that come up in review (what happens on this boundary condition? what if that variable is null? etc.). Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions. That is, if a change is evidently risk-free (in the sense of "even if this IS broken, it doesn't matter"), it should be able to be rapidly approved/merged. Only things where it actually matters if it's wrong should be blocked.
I have a feeling there are whole areas of software engineering where best practices are just operating on inertia and need to be reformulated now that the underlying cost dynamics have fundamentally shifted.
>Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions.
Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?
Lemme guess, you cargo culted some "best practices" to offload risk awareness, so now your code is organized in "too big to fail" style and matches your vendor's risk profile instead of yours.
> Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?
I guess the answer (if you're really asking seriously) is that previously when code production cost so far outweighed everything else, it made sense to structure everything to optimise efficiency in that dimension.
So if a change was implemented, the developer would deliver it as a functional unit that might cut across several lines of risk (low-risk changes like updating some CSS sitting alongside higher-risk ones like a database migration, all bundled together), because this was what made it fastest for the developer to implement the code.
Now if AI is doing it, screw how easy or fast it is to make the change. Deliver it in review chunks.
Was the original method cargo culted? I think most of what we do is cargo culted regardless. Virtually the entire software industry is built that way. So probably.
If your team's bottleneck is code review by senior engineers, adding more low quality PRs to the review backlog will not improve your productivity. It'll just overwhelm and annoy everyone who's gotta read that stuff.
Generally if your job is acting as an expensive frontend for senior engineers to interact with claude code, well, speaking as a senior engineer I'd rather just use claude code directly.
And when the PR you never even read because the AI wrote it gets bounced back to you with an obscure question 13 days later... you're not going to be well positioned to respond to that.
But you can’t just not review things!
Actually you can. If you shift the reviews far to the left, and call them code design sessions instead, and you raise problems on dailies, and you pair programme through the gnarly bits, then 90% of what people think a review should find goes away. The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build. The remaining 10% - things like variable naming, whitespace, and patterns - can be checked with a linter instead of a person. If you can get the team to that level, you can stop doing code reviews.
You also need to build a team that you can trust to write the code you agreed you'd write, but if your reviews are there to check someone has done their job well enough then you have bigger problems.
This falls for the famous "hours of planning can save minutes of coding". Architecture can't (all) be planned out on a whiteboard, it's the response to the difficulty you only realize as you try to implement.
If you can agree what to build and how to build it, and it then turns out that it actually is a working plan - then you are better than me. That hasn't happened in my 20 years of software development. Most of what's planned falls down within the first few hours of implementation.
Iterative architecture meetings will be necessary. But that falls into the pit of weekly meetings.
That's actually one thing that always prevented me from following the standard pathway of "write a design document first, get it approved, then execute" during my years in Google.
I cannot write a realistic, non-hand-wavy design document without having a proof of concept working. Even if I try, I will need to convince myself that this part and this part and that part will work, and the only way to do that is to write actual code - and then you pretty much have the code ready, so why bother writing a design doc?
Some of my best (in terms of perf consequences) design documents were either completely trivial from the code complexity point of view, so that I did not actually need to write the code to see the system working, or were written after I already had a quick and dirty implementation working.
That’s why I either started with the ports and adapters pattern or quickly refactored into it on spikes.
You don’t have to choose what flavor of DDD/Clean/… you want to drink, just use some method that keeps domains and use cases separate from implementation.
Just with shapes and domain-level tests, the first pass on a spec is easier (at least for me), and I also found the feedback was better.
I am sure there are other patterns that do the same, but the trick is to let the problem domain drive, not to choose any particular set of rules.
Keeping the core domain as a fixed point does that for me.
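As a concrete sketch of what keeping the domain separate from the implementation can look like (all names here - `OrderRepository`, `place_order`, `InMemoryOrders` - are invented for illustration):

```python
from typing import Protocol

# Port: the use case talks to this interface, never to a concrete store.
class OrderRepository(Protocol):
    def save(self, order_id: str, total: float) -> None: ...
    def total_for(self, order_id: str) -> float: ...

# Use case: pure domain logic, testable against any adapter.
def place_order(repo: OrderRepository, order_id: str, items: list[float]) -> float:
    total = sum(items)
    repo.save(order_id, total)
    return total

# Adapter: in-memory for the spike; a DB-backed one plugs in the same way.
class InMemoryOrders:
    def __init__(self) -> None:
        self._totals: dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self._totals[order_id] = total

    def total_for(self, order_id: str) -> float:
        return self._totals[order_id]

repo = InMemoryOrders()
print(place_order(repo, "A1", [10.0, 5.0]))  # 15.0
```

The use case never imports the adapter, so swapping the in-memory store for a database-backed one doesn't touch the domain code or its tests.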
It’s a muscle you can exercise, and doing so helps you learn what to focus on so it’ll be successful. IME a very successful approach is to focus on interfaces, especially at critical boundaries (critical for your use case first, then critical for your existing design/architecture).
Doing this often settles the design direction in a stable way early on. More than that, it often reveals a lot of the harder questions you’ll need to answer: domain constraints and usage expectations.
Putting this kind of work upfront can save an enormous amount of time and energy by precluding implementation work on the wrong things, and ruling out problematic approaches for both the problem at hand as well as a project’s longer term goals.
Pair programming 100% of the time also works. It's unfortunately widely unpopular, but it works.
The problem is that you can only meaningfully pair program with programmers. The people involved in architecture/design meetings might not be programmers. The questions that arise when two programmers work together might not be resolvable without involving the others.
Nonsense. I pair all the time with stakeholders. If you strip out all of the cucumber nonsense this is essentially what BDD is - fleshing out and refining specs by guiding people through concrete, written example scenarios.
I also often pair with infrastructure people on solving a problem - e.g. "I'm trying to do x as per the docs, but if you look at my screen I get a 1003 error code - any idea what went wrong?"
Or, people on a different team whose microservice talks to mine when debugging an issue or fleshing out an API spec.
It's true that this isn't possible in plenty of organizations due to the culture, but lots of organizations are broken in all sorts of ways that set piles of cash on fire. This one isn't unique.
I also think we're going to see a resurgence of either pair programming, or the buddy system where both engineers take responsibility for the prompting and review and each commit has 2 authors. I actually wrote a post on this subject on my blog yesterday, so I'm happy to see other people saying it too. I've worked on 2-engineer projects recently and it's been way smoother than larger projects. It's just so obvious that asynchronous review cycles are way too slow nowadays, and we're DDoSing our project leaders who have to take responsibility for engineering outcomes.
For anything complicated or wide in scope, we've found it much more productive to just hop on a call and pair.
I’ve started pair programming with Claude and it’s been pretty fun. We make a plan together, I type the code and Claude reviews it. Then we switch.
You’ve made the analogy but I don’t think you’re actually doing an analogous thing. I think you’re just talking about code review.
Maybe it's time to do pair agentic engineering? Have two engineers at the screen, writing the prompts together, and deciding how to verify the results.
You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.
> You are exactly correct. As to why it’s unpopular, I believe it’s just that no one has given it a fair try. Once you have done it for at least 20 hours a week for a few weeks you will understand that typing is not and has never been the bottleneck in programming. If you have not tried it then you cannot have an opinion.
I haven't tried pair programming except in very ad-hoc situations, but doing it all the time sounds utterly exhausting. You're taking programming, then layering a level of constant social interaction on top of it, and removing the autonomy to just zone out a bit when you need to (to manage stress).
Basically, it sounds like turning programming into an all-day meeting.
So I think it's probably unpopular because most software engineers don't have the personality to enjoy or even tolerate that environment.
Yeah, I’d have a mental breakdown within weeks if I had to pair more than an hour a day, max (even that much, consistently, would probably harm my quality of life quite a bit—a little every now and then is no big deal, though). No exaggeration, it’d break me in ways that’d take a while to fix.
[edit] I’m not even anti-social, but the feeling of being watched while working is extremely draining. An hour of that is like four hours without it.
Well, as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all. Also, it doesn't mean you get NO solo time. Pairs can decide to break up for a bit, and of course sometimes people aren't in, leaving your team with an odd number of people, so someone _has_ to solo (though sometimes we'd triple!)
But it's something you have to work at which is definitely part of the barrier. Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight so I feel that running is exhausting with no benefit."
> Well as the person you are replying to said, it's hard to have an opinion when you haven't actually tried it. I don't find it like that at all.
I don't need to try pair programming because I know how that level of constant social interaction makes me feel.
> Otherwise, saying it sucks without giving it a real try is akin to saying, "I went for a run and didn't lose any weight so I feel that running is exhausting with no benefit."
No, what you're doing is sort of like if you're raving about the beach, and I say I don't like bright sun, and you insist I need to try the beach to have an opinion on if I like it or not.
I wouldn't call "work" social interaction but I get ya. It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone. It is what it is, though.
> I wouldn't call "work" social interaction but I get ya.
IMHO, social interaction is anything where you interact with other people.
> It's my biggest pet peeve of this industry: it has a whole lot of people who just don't want to talk to anyone.
That's very black and white thinking. I like talking to other people, but too much of it is draining. Every day spending all-day or even a half-day working directly with someone else? No thanks.
It's not black and white because that is my whole point: you have to push through the terribleness at the beginning to start feeling the benefits, and most people aren't willing to. I'm a _massive_ introvert myself, btw. But like, I'm not trying to convince you of anything.
I agree. The main reason people give for not liking it is that they say _they_ find it exhausting. _Everyone_ finds it exhausting, at least at first. That mostly stops being the case after a while, though. It can still be tiring, but I found it to be a good kind of tiring because we were getting so much done. The team I used to pair on worked so incredibly quickly that we started doing 7-hour days and no one noticed (although eventually we came clean).
I find it depressing and dystopian that people are now excited about having a robot pair.
This might be true for tech companies, but the tech department I am in at a large government could absolutely architect away >95% of the 'problems' we are fixing at the end of the SDLC.
I've worked waterfall (defense) and while I hated it at the time I'd rather go back to it. Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks. Also, with so much up front planning the code practically writes itself. I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.
> Most of what's planned falls down within the first few hours of implementation.
Not my experience at all. We know what computers are capable of.
> I've worked waterfall and while I hated it at the time I'd rather go back to it. Today we move much faster but build the wrong thing or rewrite and refactor things multiple times.
My experience as well. Waterfall is like: let's think about where we want this product to go, and the steps to get there. Agile is like an ADHD-addled zig-zag journey to a destination, cutting corners because we're rewriting a component for the third time, to get to a much worse product slightly faster. Now we can do that part 10x faster, cool.
The thing is, at every other level of the company, people are actually planning in terms of quarters/years, so the underlying product being given only enough thought for the next 2 weeks at a time is a mismatch.
It’s possible to manage the quarterly expectations by saying “we can improve metric X by 10% in a quarter”. It’s often possible to find an improvement that you’re very confident of making very quickly. Depending on how backwards the company is you may need to hide the fact that the 10% improvement required a one line change after a month of experimentation, or they’ll fight you on the experimentation time and expect that one line to take 5 minutes, after which you should write lots more code that adds no value.
Agile isn’t a good match for a business that can only think in terms of effort and not learning+value. That doesn’t make agile the problem.
My experience in an agile firm was that they hired a lot of experienced people and then treated them like juniors. Actively allergic to thinking ahead.
To get around the problem that deliverables took more than a few days, actual tasks would be salami sliced down into 3 point tickets that simply delivered the starting state the next ticket needed. None of these tickets being completed was an actual user observable deliverable or something you could put on a management facing status report.
Each task was so time boxed, seniors would actively be upbraided in agile ceremonies for doing obvious next steps. 8 tickets sequentially like - Download the data. Analyze the data. Load a sample of the data. Load all the data. Ok now put in data quality tests on the data. OK now schedule the daily load of the data. OK now talk to users about the type of views/aggregations/API they want on the data. OK now do a v0 of that API.
It's sort of interesting because we have fully transitioned from the agile infantilization of seniors to expecting them to replace a team of juniors with LLMs.
> and then treated them like juniors
You shouldn't put juniors in a strict short time box either. At least not for long.
People don't grow if they can't think about the results of their work. And if your juniors can't grow, you might as well not hire any.
Heh, sounds like Goodhart's law gone wild at that place.
Yes - how to complete story points without actually solving any problems
I think the bigger issue is that Waterfall is often not "Waterfall".
Sure there's a 3000 row excel file of requirements but during development the client still sees the product or slides outlining how the product works and you still had QA that had to test stuff as you made it. Then you make changes based on that feedback.
While Agile often feels like it's lost the plot. We're just going to make something and iterate it into a product people like versus figuring out a product people will like and designing towards it.
Agile largely came about because we thought about where we wanted the product to go, and the steps to get there, and started building, and then it turned out that the way we thought we wanted to go was wrong, and all of that planning we did was completely wasted.
If you work in an environment where you definitely do know where you want the product to go, and the customer doesn't change their mind once they've seen the first working bits, then great. But I've never worked in that kind of environment.
It helps to at least write down requirements. And not requirements like "it must use Redis", but customer, user, performance, cost, etc. requirements.
A one page requirements document is like pulling teeth apparently.
There's an abstraction level above which waterfall makes more sense, and below which [some replacement for agile but without the rituals] makes more sense.
I think the questions to ask are: whether user-facing deliverable tasks take longer than a sprint, whether the tasks have linear dependencies, whether there are coordination concerns, etc.
Sprints are just ritual though. The others... if you're that low I'd say you're past waterfall since you have well defined tasks.
> > Most of what's planned falls down within the first few hours of implementation.
> Not my experience at all. We know what computers are capable of.
You must not work in a field where uncertainty is baked in, like Data Science. We call them “hypotheses”. As an example, my team recently had a week-long workshop where we committed to bodies of work on timelines and 3 out of our 4 workstreams blew up just a few days after the workshop because our initial hypotheses were false (i.e. “best case scenario X is true and we can simply implement Y; whoops, X is false, onto the next idea”)
Wait, are you perhaps saying that... "it depends"? ;-)
Every single reply in this thread is someone sharing their subjective anecdotal experience..
There are so many factors involved in how work pans out beyond planning. Even a single one of us could probably tell 10 different stories about 10 different projects that all went differently.
Yeah, which is also why I tried not to* speak prescriptively, unlike some other comments in this thread…
> Today we move much faster but often build the wrong thing or rewrite and refactor things multiple times. In waterfall we move glacially but what we would build sticks.
That's an interesting observation. That's one of the biggest criticisms of waterfall: by the time you finish building something the requirements have changed already, so you have to rewrite it.
there is a difference between the requirements changing and a poor-quality, quickly made implementation proving to be inadequate.
agile approaches are based on the quick implementations, redone as needed.
my favorite life cycle:
1. Start with requirements identification for the entire system.
2. Pick a subset of requirements to implement and demonstrate (or deliver) to the customer.
3. Refine the requirements as needed.
4. Go to 2.
The key is you have an idea of overall system requirements and what is needed, in the end, for the software you are writing. Thus the re-factoring, and re-design due to things not included in the sprint do not occur. (or occur less)
This approach also accounts for the truism that "the customer doesn't know what they want until they don't see it in the final product".
Comparing the same work done between agile and waterfall I can accept your experience of what sounds like an org with unusually effective long term planning.
However the value of agile is in the learning you do along the way that helps you see that the value is only in 10% of the work. So you’re not comparing 100% across two methodologies, you’re comparing 100% effort vs 10% effort (or maybe 20% because nobody is perfect).
Most of the time when I see unhappiness at the agile result it’s because the assessment is done on how well the plan was delivered, as opposed to how much value was created.
> I'm not convinced there's any real velocity gains in agile when factoring in all the fiddling, rewrites, and refactoring.
That’s not the point. The point is to end up with something actually useful in the end. If the artifact I deliver does not meet requirements, it does not really matter how fast I deliver it.
The reason waterfall methodology falls flat so often is not long delivery times, but ending up with completely the wrong thing.
> If the artifact I deliver does not meet requirements, it does not really matter how fast I deliver it.
I don’t know. The faster you deliver the wrong thing, the sooner you can discover your mistake and pivot.
You summarized agile. That is the whole point: short feedback cycles. You can view it as a series of short, self-regressive waterfalls.
I think it also depends on how people think. I can't sit in a meeting room/white board/documentation editor and come up with what the big problems are (where pain points in implementation will occur, where a sudden quadratic algorithm pops up, where cache invalidation becomes impossible, ...), even if I stare at that white board or discuss with my peers for days.
But when I hammer out the first 30 minutes of code, I have that info. And if we just spent four 2-hour meetings discussing this design, it's very common that after 30 minutes of coding I have either found 5 things that make this design completely infeasible, or maybe 2 things that would have been so good to know before the meeting that the 8 hours of meetings just should not have happened.
They should have been a single 2 hour meeting, followed by 30 minutes of coding, then a second 2 hour meeting to discuss the discoveries. Others might be much better than me of discovering these things at the design stage, but to me coding is the design stage. It's when I step back and say "wait a minute, this won't work!".
Agile is for when you don't know what you're making and you're basically improvising. People forget that.
Correct, and it was applied top-down to teams that do larger infrastructure / implementations in known areas / etc.
There are costs to pouring out a cement foundation without thinking through how many floors your building is going to be in advance.
But if you don't know what you are making, it is the only option!
“Everyone has a plan until they get punched in the mouth" - Mike Tyson
I've seen engineers I respect abandon this way of working as a team for the productivity promise of conjuring PRs with a coding agent. It blows away years of trust so quickly when you realize they stopped reviewing their own output.
Perhaps due to a FOMO outbreak[1], upper management everywhere has demanded AI-powered productivity gains; based on LoC/PR metrics, it looks like they are getting them.
1. The longer I work in this industry, the more it becomes clear that CxO's aren't great at projecting/planning, and default to copy-cat, herd behaviors when uncertain.
Software engineers are pushed to their limits (and beyond). Unrealistic expectations are set by Twitter posts like "I shipped an Uber clone in 2 hours with Claude", forcing every developer to crank out PRs, while managers are on the lookout for any kind of perceived inefficiency in tools like GetDX and Span.
If devs are expected to ship 10x faster (or else!), then they will find a way to ship 10x faster.
I always found it weird how most management would do almost anything other than ask their dev team "hey, is there any way to make you guys more productive?"
I've had metrics rammed down my throat, I've had AI rammed down my throat, Scrum rammed down my throat, and I've had various other diktats rammed down my throat.
95% of which slowed us down.
The only time I've been asked is when there is a deadline and it's pretty clear we aren't going to hit it, and even then they're interested in quick wins like "can we bring lunch to you for a few weeks?", not systemic changes.
The fastest and most productive times have been when management just set high level goals and stopped prodding.
I'm convinced that the companies which seek developer autonomy will leave the ones which seek to maximize token usage in the dust in the next tech race.
Would love to be a fly on the wall for a couple of months to see what corporate CxO's actually do.
Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin. Probably wouldn't be a massively successful one, but good enough to survive 2 years and have millions in the bank for that, or get fired and get a golden parachute.
(half) joking - most likely I'm massively trivializing the role.
Funny enough, the author of this blog post wrote another one on exactly that topic, entitled "What do executives do, anyway?"[1]. If you read it, you'll find it's written from quite an interesting perspective, not quite "fly on the wall," but perhaps as close as you're going to get in a realistic scenario.
[1]: https://apenwarr.ca/log/20190926
"Surely I could do a mediocre job as a CxO by parroting whatever is hot on Linkedin"
Having worked for a pretty decent CIO of a global business, I'd say his main job was to travel about, speak to other senior leaders, work out what business problems they had, and figure out, at a very high level, how technology would fit into addressing those problems.
Just parroting latest technology trends would, I suspect, get you sacked within a few weeks.
A charitable explanation for what CxOs do is that they figure out their strategic goals and then focus really hard on ways to herd cats en masse to achieve the goals in an efficient manner. Some people end up doing a great job, some do so accidentally, other just end up doing a job. Sometimes parroting some linkadink drivel is enough to keep the ship on course - usually because the winds are blowing in the right direction or the people at the oars are working well enough on their own.
Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
Agents are getting really good, and if you're used to planning and designing up front you can get a ton of value from them. The main problem with them that I see today is people having that level of trust without giving the agent the context necessary to do a good job. Accepting a zero-shotted service to do something important into your production codebase is still a step too far, but it's an increasingly small step.
>> Putting too much trust in an agent is definitely a problem, but I have to admit I've written about a dozen little apps in the past year without bothering to look at the code and they've all worked really well. They're all just toys and utilities I've needed and I've not put them into a production system, but I would if I had to.
I have been doing this too, and I've forgotten half of them. For me the point is that this usage scenario is really good, but it also has no added value to it, really. The moment Claude Code raises its prices 2x this won't be viable anymore, and at the same time, to scale this to enterprise software production levels you need to spend on an agent probably as much as hiring two SWEs, given that you need at least one to coordinate the agents.
Deepseek v3.2 tokens are $0.26/0.38 on OpenRouter. That model - released 4 months ago - isn't really good enough by today's standards, but it's significantly stronger than Opus 4.1, which was only released last August! In 12 months I think it's reasonable to expect there will be a model that costs less than that and is significantly stronger than anything available now.
And no, it isn't ONLY because VC capital is being burned to subsidize cost. That is impossible for the dozen smaller providers offering service at that cost on OpenRouter who have to compete with each other for every request and also have to pay compute bills.
Qwen3.5-9B is stronger than GPT-4o and it runs on my laptop. That isn't just benchmarks either. Models are getting smaller, cheaper and better at the same time and this is going to continue.
I think Claude could raise its prices 100x and people would still use it. It'd just shift to being an enterprise-only option, and companies would actually start to measure the value instead of going "Whee, AI is awesome! We're definitely going really fast now!"
100x? You think people would pay $20k per month for Claude Code?
Codex is as good (or very nearly) as Claude Code. Open source models continue to improve. The open source harnesses will also continue to improve. Anthropic is good, but it has no moat. No way could they 100x their prices.
I’m so disappointed to see the slip in quality by colleagues I think are better than that. People who used to post great PRs are now posting stuff with random unrelated changes, little structs and helpers all over the place that we already have in common modules etc :’(
> little structs and helpers all over the place that we already have in common modules
I've often wondered about building some kind of automated "this codebase already has this logic" linter
Not sure how it would actually work, otherwise I'd build it. But it would definitely be useful
Maybe an AI tool could do something like that nowadays. "Search this codebase for instances of duplicated functions and list them out" sort of thing
>this codebase already has this logic
At first glance this looks like it might be the halting problem in disguise (instead of the general function of the logic, just ask whether they both have logic that halts or doesn't halt). I think we would need to allow for false negatives for it to even be theoretically possible, so while identical-text comparison would be easy enough, anything past that can quickly become complicated, and you can probably infinitely expand the complexity by handling more and more edge cases (but never every edge case, due to the underlying halting problem/undecidability of code).
This is the part that doesn't get talked about enough. Code review was never just about catching bugs; it was how teams built shared understanding of the codebase. When someone skips reviewing their own AI-generated PR, they're not just shipping unreviewed code, they're opting out of knowing what's in their own system. The trust problem isn't really about the AI output quality, it's about whether the person submitting it can answer questions about it six months from now.
That's partly the point of the article, except the article acknowledges that this is organizationally hard:
> You get things like the famous Toyota Production System where they eliminated the QA phase entirely.
> [This] approach to manufacturing didn’t have any magic bullets. Alas, you can’t just follow his ten-step process and immediately get higher quality engineering. The secret is, you have to get your engineers to engineer higher quality into the whole system, from top to bottom, repeatedly. Continuously.
> The basis of [this system] is trust. Trust among individuals that your boss Really Truly Actually wants to know about every defect, and wants you to stop the line when you find one. Trust among managers that executives were serious about quality. Trust among executives that individuals, given a system that can work and has the right incentives, will produce quality work and spot their own defects, and push the stop button when they need to push it.
> I think we’re going to be stuck with these systems pipeline problems for a long time. Review pipelines — layers of QA — don’t work. Instead, they make you slower while hiding root causes. Hiding causes makes them harder to fix.
>The expectation that you'll discover bugs and architecture and design problems doesn't exist if you've already agreed with the team what you're going to build.
This is like saying there's not going to be any surprise on the road you take if you've already set the destination point. Though most of the time, you are just given a vague description of the kind of place you want to reach, not a precise target. And you are not necessarily starting with a map, not even an outdated one. Also, geological forces reshape the landscape at least as fast as you are able to move.
>shift the reviews far to the left, and call them code design sessions instead, and you raise problems on dailys, and you pair programme through the gnarly bits
hell in one sentence
I have seen the future, and it is a robotic boot pushing a human neck to the left.
Master planning has never worked for my side projects unless I am building the exact replica of what I've done in the past. The most important decisions are made while I'm deep in the code base and I have a better understanding of the tradeoffs.
I think that's why startups have such an edge over big companies. They can just build and iterate while the big company gets caught up in month-long review processes.
Bean counters do not like pair programming.
If we hired two programmers, the goal was to produce twice the LOC per week. Now we are doing far less than our weekly target. Does not meet expectation.
> You also need to build a team that you can trust to write the code you agreed you'd write
I tell every hire new and old “Hey do your thing, we trust you. Btw we have your phone number. Thanks”
Works like a charm. People even go out of their way to write tests for things that are hard to verify manually. And they verify manually what’s hard to write tests for.
The other side of this is building safety nets. Takes ~10min to revert a bad deploy.
> The other side of this is building safety nets. Takes ~10min to revert a bad deploy.
Does it? Reverting a bad deploy is not only about running the previous version.
Did you mess up data? Did you take actions on third-party services that need to be reverted? Did it have legal repercussions?
> Does it? Reverting a bad deploy is not only about running the previous version.
It does. We’ve tried. No, it’s not as easy as just running the previous version.
I have written about this: https://swizec.com/blog/why-software-only-moves-forward/
I read the article and to be honest I don't know where we disagree. I disagree with this quote,
> Takes ~10min to revert a bad deploy
A bad deploy can take way over that just in customer or partner management communication.
Having data model changes be a part of regular deployments would give me persistent heartburn.
It's why you always have a rollback plan. Every `up` needs a `down`.
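The up/down pairing is easiest to see in a migration runner. A minimal sketch using sqlite3 (the `audit_log` table and the `MIGRATIONS` list are hypothetical, just there to show the shape):

```python
import sqlite3

# Every `up` ships with a matching `down`; rolling back a bad deploy
# is then just replaying the downs in reverse order.
MIGRATIONS = [
    {
        "up":   "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, msg TEXT)",
        "down": "DROP TABLE audit_log",
    },
]

def migrate(conn: sqlite3.Connection, direction: str = "up") -> None:
    """Apply all migrations forward, or unwind them in reverse."""
    steps = MIGRATIONS if direction == "up" else list(reversed(MIGRATIONS))
    for step in steps:
        conn.execute(step[direction])
    conn.commit()
```

The reverse order matters: later migrations may depend on earlier ones, so the downs have to peel them off in LIFO order. (Destructive downs like `DROP TABLE` also lose data, which is what the replies below are getting at.)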
If you do that, it expands your test matrix quadratically.
So, it makes sense if you have infinite testing budgets.
Personally, I prefer exhaustively testing the upgrade path, and investing in reducing the time it takes to push out a hot fix. Chicken bits are also good.
I haven’t heard of any real world situations where supporting downgrades of persistent formats led to best of class product stability.
Would love to hear of an example.
> I tell every hire new and old “Hey do your thing, we trust you. Btw we have your phone number. Thanks”
That's cool. Expect to pay me for the availability outside work hours. And extra when I'm actually called
> Expect to pay me for the availability outside work hours.
We pay people enough to care about the software they ship.
Don’t want to be called outside of work hours? Make sure your code works. Simple.
How does the phone number help?
That's the polite version of "we know where you live". Telling someone you have their phone number is a way of saying "we'll call you and expect immediacy if you break something."
Wanna be treated like an adult? Cool. You'll also be held accountable like an adult.
Never received a phone call at 5am on a Sunday because a bug is causing a valued customer to lose $10k/minute, and by the way, the SVP is also on the line? Lucky bastard
Presumably they will be contacted if there's a problem. So the hire has an interest in not creating problems.
I've seen this mentioned a couple times lately, so I want to say I don't believe pair programming can serve in place of code review.
Code review benefits from someone coming in fresh, making assumptions and challenging those by looking at the code and documentation. With pair programming, you both take the same logical paths to the end result and I've seen this lead to missing things.
This is also the premise of pair programming/extreme programming: if code review is useful, we should do it all the time.
Anyone who talks up pair programming has either never done it or just started doing it last week.
My sense is that there is a narrow slice of software developers who genuinely do flourish in a pair programming environment. These are people who actually work through their thoughts better with another person in the loop. They get super excited about it and make the common mistake of "if it works for me, it will work for everybody" and shout it from the hilltops.
Then there are the people who program best in a fugue state and the idea of having to constantly break that to transform their thoughts into words and human interaction is anathema.
I say this as someone who just woke up in the wee hours of the morning when nobody else is around so I can get some work done (:
I hope you mean "flow state" and not actually "fugue state".
Well, I wrote what I meant, but I meant to be facetious (:
I worked for five years at a shop where a few years in we started pair programming aggressively. One of our most experienced engineers was really into XP and agile work (in the “purer” meaning of the term). He often suggested pairing when thorny problems came up, and eventually it spread. It often took half or more of the available time for programming each day. That was by far the best working environment I’ve been in. The team was excellent and it seems like we all improved in our skills when we started doing it more. We cut down on how long it took to get in while managing to produce better code. It made planning features and adjusting to unforeseen snags in plans so much quicker. I can’t emphasize enough how much of an impact it made on me as a developer or how much I miss it.
The biggest downside to me was that it forces a level of engagement exceeding the most heads down solo work I’ve done. I’d come home and feel mentally exhausted in a way I didn’t usually.
I like pair programming for certain problems: things that are genuinely hard / pushing the boundaries of both participants knowledge and abilities. In those scenarios sometimes two minds can fill in each other's gaps much more efficiently than either can work alone.
I like pair programming. Not everytime or even everyday, but to shadow a junior a few hours a week, or to work with another senior on a complex/new subject? It's fine.
Unless you're covering 100% of edge/corner cases during planning (including roughly how they're handled) then there is still value in code reviews.
You conveniently brushed this under the rug of pair programming but of the handful of companies I've worked at, only one tried it and just as an experiment which in the end failed because no one really wanted to work that way.
I think this "don't review" attitude is dangerous and only acceptable for hobby projects.
Reviews are vital for 80% of the programmers I work with but I happily trust the other 20% to manage risk, know when merging is safe without review, and know how to identify and fix problems quickly. With or without pairing. The flip side is that if the programmer and the reviewer are both in the 80% then the review doesn’t decrease the risk (it may even increase it).
"If you can get the team to that level you can stop doing code reviews."
IMHO / IME (over 20y in dev) reviewing PRs still has value as a sanity check and a guard against (slippery slope) hasty changes that might not have received all of the prior checks you mentioned. A bit of well-justified friction w/ ROI, along the lines of "slow is smooth, smooth is fast".
This seems to be a core of the problem with trying to leave things to autonomous agents .. The response to Amazons agents deleting prod was to implement review stages
https://blog.barrack.ai/amazon-ai-agents-deleting-production...
actually you don't need reviews if you have a realistic enough simulation test environment that is fully instrumentable by the AI agent. If you can simulate it almost exactly as in production and it works, there's no need to code review.
to move to the hyperspeed timescale you need reliable models of verification in the digital realm, fully accessible by AI.
I'm in a company that does no reviews and I'm a medior. The tools we make are not interesting at all, so it's probably the best position I could ask for. I occasionally have time to explore some improvements, tools, and side projects (don't tell my boss about that last one).
Then you spend all your budget on code design sessions and have nothing to show to the customer.
Linting isn't going to catch most malicious implementation patterns. You still need to sniff test what was written.
yes!
and it also works for me when working with AI. That produces much better results, too, when I first do a design session really discussing what to build, then a planning session laying out the steps to build it ("reviewability"), and then give the instruction to stop when things get gnarly and work with the hooman.
does anyone here have a good system prompt for that self observance "I might be stuck, I'm kinda sorta looping. let's talk with hooman!"?
Does anybody have an idea of how to avoid childish resistance? Any time something like this pops up, people discuss it into oblivion and teams stay in their old habits.
Okay but Claude is a fucking moron.
> your reviews are there to check someone has done their job well enough then you have bigger problems
Welcome to working with real people. They go off the rails and ignore everything you’ve agreed to during design because they get lazy or feel schedule pressure and cut corners all the time.
Sideline: I feel like AI obeys the spec better than engineers sometimes sigh.
Well we can't not review things, because the workflow demands we review things. So we hacked the process and for big changes we begin by asking people who will be impacted (no-code review), then we do a pre-review of a rough implementation and finally do a formal review in a fraction of the time.
I never review PRs, I always rubber-stamp them, unless they come from a certified idiot:
1. I don't care because the company at large fails to value quality engineering.
2. 90% of PR comments are arguments about variable names.
3. The other 10% are mistakes that have very limited blast radius.
It's just that, unless my coworker is a complete moron, then most likely whatever they came up with is at least in acceptable state, in which case there's no point delaying the project.
Regarding knowledge share, it's complete fiction. Unless you actually make changes to some code, there's zero chance you'll understand how it works.
Do people really argue about variable names? Most review comments I see are fairly trivial, but almost always not very subjective (leftover debug log, please add a comment here, etc). Maybe it helps that many of our seniors are from a team where we had no auto-formatter or style guide at all for quite a while. I think everyone should experience that a random mix of `){` and `) {` does not really impact you in any way beyond the mild irking of a crooked painting or something. There's a difference between aesthetically bothersome and actually harmful. Not to say that you shouldn't run a formatter, but just for some perspective.
>Do people really argue about variable names?
Of course they do. A program's code is mostly a graph of names; they can be cornerstones of its clarity, or sources of confusion and bugs.
The first thing I do when debugging is ensuring proper names, sometimes that's enough to make the bug obvious.
The greatest barrier to understanding is not lack of knowledge but incorrect knowledge. That's why good names matter. And naming things is hard, which is why it makes sense to comment on variable names in a review.
Unless the naming convention was written in the 90s and all variables must follow a precise algorithm, made up of only abbreviations with a maximum length of 15.
Or, for some, if it contains the value of a column in the db, it must have the same name as the column.
So yeah, instead of "UsualQuantityOrder", you get "UslQtyOrd" or "I_U_Q_O"... And you must maintain the comments to explain what the field is supposed to contain.
I have seen this mostly on teams which refuse to formalize preferences into a style guide.
I have fixed this by forcing the issue and we get together as a team, set a standard and document it. If we can use tools to enforce it automatically we do that. If not you get a comment with a link to the style guide and told to fix it.
Style is subjective but consistency is not. Having a formal style guide which is automatically enforced helps with onboarding and code review as well.
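A lot of this can be mechanized. As a hedged illustration (the rule, the regex, and the sample names here are all invented for the example, not any real style guide), a check for a snake_case convention can be a few lines of Python on top of the standard `ast` module:

```python
import ast
import re

# Hypothetical team rule: functions and assigned names use snake_case.
SNAKE_CASE = re.compile(r"^_?[a-z][a-z0-9_]*$")

def naming_violations(source: str) -> list[str]:
    """Return function and variable names in `source` that break snake_case."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not SNAKE_CASE.match(node.name):
            bad.append(node.name)
        elif (isinstance(node, ast.Name)
              and isinstance(node.ctx, ast.Store)
              and not SNAKE_CASE.match(node.id)):
            bad.append(node.id)
    return bad

print(naming_violations("def GetUser():\n    userID = 1\n"))
# ['GetUser', 'userID']
```

In practice you would reach for an off-the-shelf linter rather than writing this yourself, but the point stands: once the rule is written down and enforced by a tool, it stops being a review comment.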
Yes. 80% of comments to my PRs are "change _ to -" or something like that.
PR #467 - Reformat code from tabs to spaces
PR #515 - Reformat code from spaces to tabs
I'm very surprised by these comments...
I regularly review code that is way more complicated than it should be.
The last few days I was going back and forth on reviews of a function that originally had a cyclomatic complexity of 23. Eventually I got it down to 8, but I had to call the author into a pair programming session and show him how the complexity could be reduced.
Someone submitting work like that should either be junior enough that there is potential for training them, so your time investment is worth it, or be managed out.
Or it didn't really matter that the function was complex if the structure of what's surrounding it was robust and testable; just let it be a refactor or bug ticket later.
he is a junior yes.
I know the aggravation of getting a hairball of code to review, but I often hold my nose. At least find a better reason to send it back, like a specific bug.
If you're sure cyclomatic complexity should be minimized, I think you should put such rules in a pre-commit hook or something that runs before a reviewer ever sees the code. You should only have to help with that if someone can't figure out how to make it pass.
If you're not willing or politically able to implement that, you might be wasting time on your personal taste that the team doesn't agree with. Personally I'm pretty skeptical of cyclomatic complexity's usefulness as a metric.
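For what it's worth, the kind of check being discussed is cheap to automate. Here is a rough sketch of what a pre-commit hook could run; it approximates McCabe-style complexity by counting branch points with Python's `ast` module, and both the node list and the threshold are assumptions for illustration, not a standard:

```python
import ast

# Decision points we count (an approximation of McCabe complexity).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Rough McCabe count: 1 + branches + extra boolean-operator paths."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):  # each extra and/or adds a path
            score += len(node.values) - 1
    return score

MAX_COMPLEXITY = 10  # threshold is a team choice, not a law

src = """
def classify(x):
    if x < 0:
        return "neg"
    for i in range(x):
        if i % 2 and i % 3:
            return "hit"
    return "none"
"""
score = cyclomatic_complexity(src)
print(score)  # 5: base 1 + two ifs + one for + one extra `and` path
```

A hook would simply fail the commit when `score > MAX_COMPLEXITY`, so the reviewer never sees the 23-branch version in the first place.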
I just used it here to approximately convey the scale.
the original function was full of mutable state (not required), full of special cases (not required), full of extra return statements (not required). Also had some private helper methods that were mocked in the tests (!!!).
All of this just for a "pure" function. Just immutable object in - immutable object out.
and yes, he was a junior.
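A hypothetical before/after of the shape being described (the `Order` type, field names, and numbers are invented for illustration): the same calculation written with a mutable accumulator and special cases, and as a pure immutable-in/immutable-out function.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable in, immutable out
class Order:
    quantity: int
    unit_price: float
    discount: float = 0.0

# Before (the shape of the problem): mutable state, an unnecessary
# special case, and extra return statements.
def total_mutable(order):
    total = 0.0
    if order.quantity == 0:
        return 0.0
    total = order.quantity * order.unit_price
    if order.discount:
        total = total - total * order.discount
    return total

# After: one expression, no mutation, no special-casing of zero.
def total_pure(order: Order) -> float:
    return order.quantity * order.unit_price * (1.0 - order.discount)

o = Order(quantity=3, unit_price=10.0, discount=0.1)
print(total_pure(o))  # 27.0 (total_mutable(o) agrees)
```

The pure version also needs no mocked helpers in its tests: you construct an input and assert on the output.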
I always approve a change with comments for nits that are optional to address. I only hold back approval if there is a legitimate flaw of some sort. Generally this leads to small changes almost always getting approved on the first shot, but larger changes needing at least one back and forth. AI code review tools make it much easier to spot legitimate problems these days.
> 2. 90% of PR comments are arguments about variable names.
This sort of comment is meaningless noise that people add to PRs to pad their management-facing code review stats. If this is going on in your shop, your senior engineers have failed to set a suitable engineering culture.
If you are one of the seniors, schedule a one-on-one with your manager, and tell them in no uncertain terms that code review stats are off-limits for performance reviews, because it's causing perverse incentives that fuck up the workflow.
The most senior guy has the worst reviews because it takes multiple rounds, each round finds new problems. Manager thinks this contributes to code quality. I was denied promotion because I failed to convince half of the company to drop everything and do my manager's pet project that had literally zero business value.
Yeah, I'm afraid that's an engineering culture that is thoroughly cooked. Not much choice except keep your head down until you are ready to cut your losses
That seems a lot about the company and the culture rather than about how code review is supposed to work.
I have been involved in enough code reviews both in a corporate environment and in open source projects to know this is an outlier. When code review is done well, both the author and reviewer learn from the experience.
People always make mistakes, like forgetting to include a change. The point of PRs for me is to try to weed out costly mistakes. Automated tests should hopefully catch most of them, though.
The point of PRs is not to avoid mistakes (though sometimes this can happen). Automated tests are the tool to weed out those kinds of mistakes. The point of PRs is to spread knowledge. I try to read every PR, even if it's already approved, so I'm aware of what changes there are in code I'm going to own. They are the RSS feed of the codebase.
I used to do this! I can’t anymore, not with the advent of AI coding agents.
My trust in my colleagues is gone, I have no reason to believe they wrote the code they asked me to put my approval on, and so I certainly don’t want to be on a postmortem being asked why I approved the change.
Perhaps if I worked in a different industry I would feel like you do, but payments is a scary place to cause downtime.
As far as I'm concerned if I approved the PR I'm equally responsible for it as the author is. I never make nitpick comments and I still have to point out meaningful mistakes in around 30% of reviews. The percentage has only risen with AI slop.
These systems make it more efficient to remove the actively toxic members of your team. Belligerence can be passive-aggressively "handled" by additional layers, but at considerable time and emotional-labor cost to people who could be getting more work done without having to coddle untalented assholes.
Sounds like there was a bad hiring process.
There's no such thing as a hiring process that avoids that problem 100% of the time.
After all, most people will be on their best behavior during an interview, and even a lengthy interview process is a very short period of time compared to working with someone for weeks or months.
The issue is that every review adds a lot of delay. A lot of alignment and pair programming won't be time expensive?
> A lot of alignment and pair programming won't be time expensive?
The question is really "Will up-front design and pair programming cost more than not doing up-front design and pair programming?".
In my experience, somewhat counter-intuitively, alignment and pairing is cheaper because you get to the right answer a bit 'slower' but without needing the time spent reworking things. If rework is doubling the time it takes to deliver something (which is not an extreme example, and in some orgs would be incredibly conservative) then spending 1.5 times the estimate putting in good design and pair programming time is still waaaay cheaper.
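The arithmetic above can be made explicit with a toy cost model (all the numbers are assumptions for illustration, not measurements):

```python
def delivery_time(estimate: float, process_overhead: float,
                  rework_factor: float) -> float:
    """Wall-clock cost = time building (estimate x process overhead)
    plus time redoing work that missed the mark."""
    return estimate * process_overhead + estimate * rework_factor

# Ship fast, fix later: no extra process, but rework doubles the
# effort (which the comment above calls a conservative assumption).
fast_path = delivery_time(estimate=10, process_overhead=1.0, rework_factor=1.0)

# Up-front design + pairing: 1.5x the estimate, little rework.
aligned_path = delivery_time(estimate=10, process_overhead=1.5, rework_factor=0.1)

print(fast_path, aligned_path)  # 20.0 16.0
```

Under these assumptions the "slower" aligned path still lands 20% sooner, and the gap widens as the rework factor grows.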
Yes. This is the way. Declarative design contracts are the answer to A.I. coders. A team declares what they want, agents code it together with human supervision. Then code review is just answering the question "is the code conformant with the design contract?"
But. The design contract needs review, which takes time.
I wonder what delayed continuous release would be like. Trust folks to merge semi-responsibly, but have a two week delay before actually shipping to give yourself some time to find and fix issues.
Perhaps kind of a pain to inject fixes in, have to rebase the outstanding work. But I kind of like this idea of the org having responsibility to do what review it wants, without making every person have to coral all the cats to get all the check marks. Make it the org's challenge instead.
Code reviews are a volunteer’s dilemma. Nobody is showered with accolades by putting “reviewed a bunch of PRs” on their performance review by comparison with “shipped a bunch of features.” The two go hand-in-hand, but rewards follow marks of authorship despite how much reviewers influence what actually landed in production.
Consequently, people tend to become invested in reviewing work only once it’s blocking their work. Usually, that’s work that they need to do in the future that depends on your changes. However, that can also be work they’re doing concurrently that now has a bunch of merge conflicts because your change landed first. The latter reviewers, unfortunately, won’t have an opinion until it’s too late.
Fortunately, code is fairly malleable. These “reviewers” can submit their own changes. If your process has a bias towards merging sooner, you may merge suboptimal changes. However, it will converge on a better solution more quickly than if your changes live in a vacuum for months on a feature branch passing through the gauntlet of a Byzantine review and CI process.
Or the reviewer feels responsible for the output of the code from the person they are reviewing, or the code they are modifying. For instance, a lead on the team gets credit for the output of the team. Also, wanting to catch bugs in review before they make your on-call painful can be a large motivation.
It’s weird that the two tasks that most programmers would agree are most important (reviewing code and deleting code) are not heavily rewarded.
Unfortunately for programmers, programmers aren’t doing the rewarding
I've always encouraged everyone more junior to review everything regardless of who signs off, and even if you don't understand what's going on/why something was done in a particular way, to not be shy to leave comments asking for clarification. Reviewing others' work is a fantastic way to learn. At a lower level, do it selfishly.
If you're aiming for a higher level, you also need to review work. If you're leading a team or above (or want to be), I assume you'll be doing a lot of reviewing of code, design docs, etc. If you're judged on the effectiveness of the team, reviews are maybe not an explicit part of some ladder doc, but they're going to be part of boosting that effectiveness.
Valve is one of the only companies that appears to understand this, as well as that individual productivity is almost always limited by communication bandwidth, and communication burden is exponential as nodes in the tree/mesh grow linearly. [or some derated exponent since it doesn't need to be fully connected]
The first one to realise this was Jeff Bezos, afaik. One would think the others would have wised up in the meantime, but no.
> The first one to realise this was Jeff Bezos, afaik
I am not aware about the details - can you elaborate?
Maybe the Two Pizza rule:
No team at Amazon should be larger than what two pizzas can feed (usually about 6 to 10 people).
The ‘design everything as a publicly accessible API’ directive seems to play to this as well. If all your data / services are available and must be documented then a lot of communication overhead can be eliminated.
I have always been amazed at that rule because it implies developers either do not like pizza or they happen to be on a diet.
I wonder where the reviewer worked where PRs are addressed in 5 hours. IME it's measured in units of days, not hours.
I agree with him anyway: if every dev felt comfortable hitting a stop button to fix a bug then reviewing might not be needed.
The reality is that any individual dev will get dinged for not meeting a release objective.
I worked in a company where reviews took days. The CTO complained a lot about the speed, but we had decent code quality.
Now I work at a company where reviews take minutes. We have 5 lines of technical debt per 3 lines of code written. We spend months to work on complicated bugs that have made it to production.
My last FAANG team had a soft 4-hour review SLA, but if it was a complicated change then that might just mean someone acknowledging it and committing to reviewing it by a certain date/time. IIRC, if someone requested a review and you hadn't gotten to it by around the 3-hour mark you'd get an automated chat message "so-and-so has been waiting a while for your review".
Everyone was very highly paid, managers measured everything (including code review turnaround), and they frequently fired bottom performers. So, tradeoffs.
That sounds horrible. I don't know how people stand to work in those conditions.
Well, there's a reason I'm no longer working there :)
But some people will put up with a lot for half a million dollars a year.
Ahh, that would do it. I don't think I have it in me, but I get it.
Why does it sound horrible to have your code reviewed quickly? There is no reason for reviews to wait a long time. 4 hours is already a long time, it means you can wait to do it right before you go home or after lunch.
Why would I care if my code is reviewed quickly? If the answer is some variant of "I get punished if I don't have enough changes merged in fast enough," that's not helping. From the other side, it's having someone constantly breathe down your neck. Hope you don't get in a flow at the wrong time and need to break it so Mr. Lumbergh doesn't hit you up on Teams. It just reeks of a culture of "unlimited pto," rigid schedules, KPI hacking, and burnout.
Because it's basically async pair-programming.
You do a lot of small changes (<100 loc) that get reviewed often. If it doesn't get reviewed often then the whole idea of continuous development breaks down.
Arguably you have 8 hours of work a day. How many of them do you need to write 100 loc? After that 100 loc, or maybe 200, take a break and review other people's code.
Plus you also have random meetings and stuff so your day already fragments itself so adding a code review in the time before a meeting or after is "free" from a fragmentation standpoint.
> Arguably you have 8 hours of work a day. How many of them do you need to write 100 loc?
I have an issue at work that will likely be solved by a single-line change. But figuring out which line it is is going to take a while.
IMO code reviews are not pair programming. By the time I've raised an MR, it's already perfect. I've had multiple client calls, talked to my team about design, unit tested it, tested it on a container environment, thought about it.
So it really doesn't matter when the review gets done. I mean, even a week and it's fine.
It sounds horrible to be interrupted constantly. I can't imagine they'd be particularly thorough reviews
Constantly? Some people can take a break in the morning and review a few PR’s and some in the afternoon. No one needs to drop what they’re doing.
Sounds kind of amazing to me. 4 hours is a bit ridiculous, but I wish we had some kind of automated system to poke people about reviews so I don't have to. It's doubly bad because a) I have to do it, and b) it makes me look annoying.
My ideal system (for work) would be something like: after 2 days, ask for a review if the reviewer hasn't given it. After a week, warn them the PR will be auto-approved. After 2 weeks, auto-approve it.
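That escalation ladder is simple enough to sketch. In this hedged example the thresholds and action strings are taken from the comment above but otherwise invented; a real bot would also have to talk to the code host's API:

```python
from datetime import timedelta

def review_action(pr_age: timedelta) -> str:
    """Escalation ladder: nudge at 2 days, warn at 1 week,
    auto-approve at 2 weeks. All thresholds are a team choice."""
    if pr_age >= timedelta(weeks=2):
        return "auto-approve"
    if pr_age >= timedelta(weeks=1):
        return "warn: will auto-approve in a week"
    if pr_age >= timedelta(days=2):
        return "remind reviewer"
    return "wait"

print(review_action(timedelta(days=3)))   # remind reviewer
print(review_action(timedelta(days=15)))  # auto-approve
```

The nice property is that the nagging comes from the system, so no individual has to look annoying.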
At the bottom of the page it says he is CEO of Tailscale.
I've yet to see a project where reviews are handled seriously. Both business and developers couldn't care less.
I worked somewhere that actually had a great way to deal with this. It only works in small teams though.
We had a "support rota", i.e. one day a week you'd be essentially excused from doing product delivery.
Instead, you were the dev to deal with bug triage, any code reviews, questions about the product, etc.
Any spare time was spent looking for bugs in the backlog to further investigate / squash.
Then when you were done with your support day you were back to sprint work.
This meant there was no ambiguity of who to ask for code review, and limited / eliminated siloing of skills since everyone had to be able to review anyone else's work.
That obviously doesn't scale to large teams, but it worked wonders for a small team.
I have, and in each sprint we always had tickets for reviewing the implementation, which could take anywhere from an hour to 2 days.
The code quality was much better than in my current workplace where the reviews are done in minutes, although the software was also orders of magnitude more complex.
Bonus points: reviews are not taken seriously in the legitimate sense, but a facade of seriousness consisting of picky complaints is put forth to reinforce hierarchy and gatekeeping
I’ve worked on teams like you describe and it’s been terrible. My current team’s SDLC is more along the 5-hour line - if someone hasn’t reviewed your code by the end of today, you bring it up in standup and have someone commit to doing it.
One thing that often gets dismissed is the value/effort ratio of reviews.
A review must be useful and the time spent on reviewing, re-editing, and re-reviewing must improve the quality enough to warrant the time spent on it. Even long and strict reviews are worth it if they actually produce near bugless code.
In reality, that's rarely the case. Too often, reviewing goes down the rabbit hole of various minutiae, and the time spent reaching a compromise between what the programmer wants to ship and what the reviewer will agree to pass is not worth the effort. The time would be better spent on something else if the process doesn't yield substantial quality. Iterating a review over and over to hone it into one interpretation of perfection will only bump the change into the next 10x bracket in the wall-clock timeline mentioned in this article.
In the adage of "first make it work, then make it correct, and then make it fast", a review only needs to require that the change reaches the first step, or, in other words, to prevent breaking something or the development going in an obviously wrong direction straight from the start. If the change works, maybe with caveats but still works, then all is generally fine enough that the change can be improved in follow-up commits. For this, the review doesn't need to go into thorough detail: a few comments to point the change in the right direction are often enough. That kind of review is a very efficient use of time.
Overall, in most cases a review should be a very short part of the development process. Most of the time should be spent programming and not in review churn. A review serves as a quick check-point that things are still going the right way but it shouldn't dictate the exact path that should be used in order to get there.
> The job of a code reviewer isn't to review code. It's to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don't need their reviews at all anymore
Amen brother
> The job of a code reviewer isn't to review code. It's to figure out how to obsolete their code review comment, that whole class of comment, in all future cases, until you don't need their reviews at all anymore.
Making entire classes of issues effectively impossible is definitely the ideal outcome. But, this feels much more complicated when you consider that trust doesn't always extend beyond the company's wall and you cannot always ignore that fact because the negative outcomes can be external to the company.
What if I, a trusted engineer, run `npm update` at the wrong time and malware makes its way into production and user data is stolen? A mistake to learn from, for sure, but a post-mortem is too late for those users.
I'm certainly not advocating for relying on human checks everywhere, but reasoning about where you crank the trust knob can get very complicated or costly. Occasionally a trustworthy human reviewer can be part of a very reasonable control.
Nice piece, and rings true. I also think startups and smaller organizations will be able to capture better value out of AI because they simply don't have all those approval layers.
I do not think you have comprehended the blog.
> you can’t overcome latency with brute force
Curious what rang true to you if not the main point?
The approval tree grows logarithmically as the size of the company grows. A startup can win initially because they may have zero or one level to get to production. That's part of how they manage to get inside the OODA loop of much bigger companies.
The flip side of that, and why the software world is not a complex network of millions of tiny startups but in fact has quite a few companies where log(organization) >= 2, is that there are a lot of tasks that are just larger than a startup, and the log of the minimum size organization that can do the job becomes 2 or 3 or 4.
There is certainly at least the possibility that AI can really enhance those startups even faster, but it also means that they'll get to the point that they need more layers more quickly, too. Since AI can help much, much more with coding than it can with the other layers (not that it can't help, but at the moment I don't think there's anybody else in the world getting the advantages from AI that programmers are getting), it can also result in the amount of time that startups can stay in the log(organization)=1 range shrink.
(Pardon the sloppy "log(organization)" notation. It should not be taken too literally.)
I think this makes an assumption early on which is that things are serialized, when usually they are not.
If I complete a bugfix every 30 minutes, and submit them all for review, then I really don't care whether the review completes 5 hours later. By that time I have fixed 10 more bugs!
Sure, getting review feedback 5 hours later will force me to context switch back to 10 bugs ago and try to remember what that was about, and that might mean spending a few more minutes than necessary. But that time was going to be spent _anyway_ on that bug, even if the review had happened instantly.
The key to keeping speed up in slow async communication is just working on N things at the same time.
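A toy model of that pipelining effect, using the numbers assumed in this thread (30-minute fixes, 5-hour review turnaround):

```python
def serialized(n_tasks: int, work: float, review_latency: float) -> float:
    """Wait for each review before starting the next task."""
    return n_tasks * (work + review_latency)

def pipelined(n_tasks: int, work: float, review_latency: float) -> float:
    """Keep coding while reviews are in flight; only the last review
    sits on the critical path (assumes reviews never block new work)."""
    return n_tasks * work + review_latency

# 10 bugfixes at 0.5 h each, 5 h review turnaround:
print(serialized(10, 0.5, 5.0))  # 55.0 hours
print(pipelined(10, 0.5, 5.0))   # 10.0 hours
```

The simplification is the parenthetical assumption: once review feedback forces rework, or task N+1 depends on task N landing, latency creeps back onto the critical path.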
Excellent article. Based on personal experience, if you build cutting edge stuff then you need great engineers and reviewers.
But for anything else, you just need an individual (not a team) who's okay (not great) at multiple things (architecting, coding, communicating, keeping costs down, testing their stuff). Let them build and operate something from start to finish without reviewing. Judge it by how well their product works.
Overall, this is pretty accurate. Of course, it’s a range at every level, say 5x-15x. Large companies trend toward 15x and startups toward 5x, which is why startups out-execute large companies. Also, they just skip some levels of review because, for instance, the CEO is sitting in a code review meeting. But yea, the average is close.
about a year ago I shared this on /r/AskProgramming:
"...a Pull Request is a delivery. It's like UPS standing at your door with a package. You think, "Nice, the feature, bugfix, etc has arrived! And because it's a delivery, it's also an inspection. A Code Review. Like a freight delivery with a manifest and signoff. So you have to be able to conduct the inspection: to understand what you're receiving and evaluate if it's acceptable as-is. Like signing for a package, once you approve, the code is yours and your team's to keep."
The metaphor has limits. IRL I sign immediately and resolve issues post-hoc with customer service. The UPS guy is not going to stand on my porch while I check if there's actually a bootable MacBook in the box. The vast majority of the time, there's no issue. If that were the same with code, teams could adopt a similar "trust now and defer verification" approach.
The article has a section on Modularity but never defines it. I wrote a post a few weeks ago on modularity and LLMs which does provide a definition. [1].
[1] https://www.slater.dev/2026/02/relieve-your-context-anxiety-...
Not before coding agents nor after coding agents has any PR taken me 5 hours to review. Is the delay here coordination/communication issues, the "Mythical Mammoth" stuff? I could buy that.
The article is referring to the total time including delays. It isn’t saying that PR review literally takes 5 hours of work. It’s saying you have to wait about half a day for someone else to review it.
Which is a thing that depends very much on team culture. In my team it is perhaps 15 min for smaller fixes to get signoff. There is a virtuous feedback loop here: smaller PRs give faster reviews, but also more frequent PRs, which give more frequent occasions to actually check if there is something new to review.
If I'm deep in coding flow the last thing I'm going to do is immediately jump on to someone else's PR. Half a day to a day sounds about right from when the PR is submitted to actually getting the green light
Does your team just context switch all the time? That sounds like a terrible place to work.
Similar in my team and I don't feel like there's much context switching. With around 8 engineers there's usually at least one person not in the middle of something who can spare a few minutes.
How can everyone be familiar with everybody else's work?
Usually at most 2-3 engineer have enough context to fully understand what your code is doing.
The PR won’t take 5 hours of work, but it could easily sit that long waiting for another engineer willing to context switch from their own heads-down work.
Exactly. Even if I hammer the erstwhile reviewer with Teams/Slack messages to get it moved to the top of the queue and finished before the 5 hours are up, then all the other reviews get pushed down. It averages out, and the review market corrects.
Exactly. Can you get a lawyer on the phone now, or do you wait ~5 hours? How about a doctor appt. Or a vet appt. Or a mechanic visit.
Needing full human attention on a complex task from a pro who can only look at your thing has a wait time. It is worse when there are only 2 or 3 such people in the world you can ask!
The article specified wall-clock time. One-day turnaround is pretty typical if it's not urgent enough to demand immediate review; lots of people review incoming PRs as a morning activity.
I've had PRs that take me five hours to review. If your one PR is an entire feature that touches the database, the UI, and an API, and I have to do the QA on every part of it because as soon as I give the thumbs up it goes out the door to clients? Then it's gonna take a while, and I'm probably going to find a few critical issues, and then the loop starts again.
Some devs interrupt what they are doing when they see a PR in a Slack notification, most don't.
Most devs set aside some time at most twice a day for PRs. That's 5 hours at least.
Some PRs come in at the end of the day and will only get looked at the next day. That's more than 5 hours.
IME it's rare to see a PR get reviewed in under 5 hours.
I use a PR notifier chrome extension, so I have a badge on the toolbar whenever a PR is waiting on me. I get to them in typically <2 minutes during work hours because I tab over to chrome whenever AI is thinking. Sometimes I even get to browse HN if not enough PRs are coming and not too many parallel work sessions.
But there's more than one person that can review a PR.
If you work in a team of 5 people, and each one only reviews things twice a day, that's still less than 5 hours any way you slice it.
> "Mythical Mammoth"
Most excellent.
Man moth?
One pattern I've seen is that a team with a decently complex codebase will have 2-3 senior people who have all of the necessary context and expertise to review PRs in that codebase. They also assign projects to other team members. All other team members submit PRs to them for review. Their review queue builds up easily and average review time tanks.
Not saying this is a good situation, but it's quite easy to run into it.
Managers are expected to say that we should be productive, yet they're responsible for the framework that slows everyone down, and it's quite clear that they're perfectly fine with this framework. I'm not saying it's good or bad, because it's complicated.
A few years ago there was a thread about "How complex systems fail" here on HN[1], and one aspect of it (rule 9) is about how individuals have to balance between security and productivity, and being judged differently depending on the context (especially being judged after-the-fact for the security aspect, while being judged before the accident for the productivity aspect).
The linked page in the thread is short and quite enlightening, but here is the relevant passage:
[1] https://news.ycombinator.com/item?id=32895812

I find this to be true for expensive approvals as well.
If I can approve something without review, it’s instant. If it requires only immediate manager, it takes a day. Second level takes at least ten days. Third level trivially takes at least a quarter (at least two if approaching the end of the fiscal year). And the largest proposals I’ve pushed through at large companies, going up through the CEO, take over a year.
That’s because most teams are doing engineering wrong.
The handover to a peer for review is a falsehood. PRs were designed for open source projects to gatekeep public contributors.
Teams should be doing trunk-based development, group/mob programming and one piece flow.
Speed is only one measure, and AI is pushing this further to an extreme with the sheer volume of change and code.
The quality aspect is missing here.
Speed without quality is a fallacy and it will haunt us.
Don’t focus on speed alone, and the need to always be busy and picking up the next item. Focus on quality and throughput, keeping work in progress to a minimum (1). Deliver meaningful, reasoned changes as a team, together.
Communication overhead is the #1 schedule killer, in my experience.
Whenever we have to talk/write about our work, it slows things down. Code reviews, design reviews, status updates, etc. all impact progress.
In many cases, they are vital, and can’t be eliminated, but they can be streamlined. People get really hung up on tools and development dogma, but I've found that there’s no substitute for having experienced, trained, invested, technically-competent people involved. The more they already know, the less we have to communicate.
That’s a big reason that I have for preferring small meetings. I think limiting participants to direct technical members, is really important. I also don’t like regularly-scheduled meetings (like standups). Every meeting should be ad hoc, in my opinion.
Of course, I spent a majority of my career, at a Japanese company, where meetings are a currency, so fewer meetings is sort of my Shangri-La.
I’m currently working on a rewrite of an app that I originally worked on, for nearly four years. It’s been out for two years, and has been fairly successful. During that time, we have done a lot of incremental improvements. It’s time for a 2.0 rewrite.
I’ve been working on it for a couple of months, with LLM assistance, and the speed has been astounding. I’m probably halfway through it, already. But I have also been working primarily alone, on the backend and model. The design and requirements are stable and well-established. I know pretty much exactly what needs to be done. Much of my time is spent testing LLM output, and prompting rework. I’m the “review slowdown,” but the results would be disastrous, if I didn’t do it.
It’s a very modular design, with loosely-coupled, well-tested and documented components, allowing me to concentrate on the “sharp end.” I’ve worked this way for decades, and it’s a proven technique.
Once I start working on the GUI, I guarantee that the brakes will start smoking. All because of the need for non-technical stakeholder team involvement. They have to be involved, and their involvement will make a huge difference (like a Graphic UX Designer), but it will still slow things down. I have developed ways to streamline the process, though, like using TestFlight, way earlier than most teams.
Reviewing things is fast and smooth if things are small. If you have all the involved parties stay in the loop, review happens in real time. Review is only problematic if you split the do and review steps. The same applies to AI coding: you can choose to pair program with it, and then it's actually helpful, or you can have it generate 10k lines of code you have no way of reviewing. You just need people to understand that switching context kills productivity. If more things are happening at the same time and your memory is limited, the time spent on load/save makes it slower than just doing one thing at a time and staying in the loop.
Honestly if I'm just following what a single LLM is doing I'm arguably slower than doing it myself so I'd say that approach isn't very useful for me.
I prefer to review the plan (this is more to flush out my assumptions about where something fits in the codebase and verify I communicated my intent correctly).
I'll loosely monitor the process if it's a longer one - then I review the artifacts. This way I can be doing 2-3 things in parallel, using other agents or doing meetings/prod investigation/making coffee/etc.
This is very true in marketing and advertising as well. A campaign where the channel manager can just test ads within a general framework will do ten times better than a campaign that has to go through review processes.
https://vekthos.com/papers/cognitive-sight-theory.pdf
Solution: Feed this paper to the llm and ask it to solve your problem. Then contact me with your experience. XD
The 10x estimate tracks — I've seen it too. The underlying mechanism is queuing theory: each approval step is a single-server queue with high variance inter-arrival times, so average wait explodes non-linearly. AI makes the coding step ~10x faster but doesn't touch the approval queue. The orgs winning right now are the ones treating async review latency as a first-class engineering metric, same way they treat p99 latency for services.
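The queueing claim can be sanity-checked with a toy simulation (all numbers invented for illustration): a single reviewer whose service times are exponentially distributed, fed one finished PR every few minutes.

```python
import random

# Toy single-server review queue. PRs arrive every 3 minutes
# (AI-speed coding); each review takes 5 hours on average, with
# exponential (high-variance) service times. Numbers are
# illustrative, not measurements.
random.seed(0)

ARRIVAL_GAP = 0.05   # hours between finished PRs (3 minutes)
MEAN_REVIEW = 5.0    # mean review time, hours

def simulate(n_jobs):
    """Average wait (hours) across n_jobs arrivals."""
    server_free_at = 0.0
    total_wait = 0.0
    for i in range(n_jobs):
        arrival = i * ARRIVAL_GAP
        start = max(arrival, server_free_at)
        total_wait += start - arrival
        # Draw an exponential review time: occasional very slow reviews.
        server_free_at = start + random.expovariate(1 / MEAN_REVIEW)
    return total_wait / n_jobs

print(f"avg wait over 100 PRs: {simulate(100):.0f} hours")
```

Because arrivals outpace the reviewer roughly 100 to 1, this queue never stabilizes: average wait grows with every PR submitted, which is the non-linear explosion described above.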
Well, this all makes sense for application code, but not necessarily for infrastructure changes. Imagine a failed Terraform merge that deletes the production database or opens inbound traffic to 0.0.0.0/0, and you can't undo it for 10 minutes. In my opinion, you need to pay attention to the narrow scope specific to a given project.
Try to imagine a deployment/CI system where that isn't possible. That's what the post is asking.
* Maybe you don't have privileges to delete the database
* Maybe your CI environments are actually high fidelity, and will fail when there is no DB
* Maybe destructive actions require further review
* Maybe your service isn't exposed to the public internet, and exposing to 0.0.0.0/0 isn't a problem.
* Maybe we engineer our systems to have trivial instant undo, and deleting a DB triggers an undo
Our tooling is kind of crappy. There's a lot we can do.
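The "destructive actions require further review" idea can be sketched as a CI gate (names hypothetical): scan a `terraform show -json` plan for delete actions and refuse to apply without an explicit approval.

```python
# Hypothetical CI gate: block destructive Terraform changes unless
# explicitly approved. The plan dict mirrors `terraform show -json`,
# which lists each resource change with its planned actions.

def destructive_changes(plan: dict) -> list:
    """Return addresses of resources the plan would delete."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]

def gate(plan: dict, approved: bool) -> bool:
    """Allow the apply only if non-destructive, or explicitly approved."""
    return approved or not destructive_changes(plan)

plan = {"resource_changes": [
    {"address": "aws_db_instance.prod",
     "change": {"actions": ["delete"]}},
]}
print(gate(plan, approved=False))  # False: needs further review
```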
I broadly agree with this, it really is all about trust. Just, as a company scales it’s hard to make sure that everybody in the team remains trustworthy – it isn’t just about personality and culture, it’s also about people actually having the skill, motivation, and track record of doing good work efficiently. Maybe AI’s greatest value will be to allow teams to stay small, which reduces the difficulty of maintaining trust.
It’s also the case that someone you trust can make an honest mistake or, for example, get their laptop stolen and their credentials compromised. I do trust my team, and want that to be the foundation of our relationship, but I also recognize that humans are fallible and having guardrails (e.g. code review) is beneficial.
> only >> recenlty << started happening
can't believe I was baited into reading this slop
/jk
good post actually, and a fair point
I do think many people will argue that you can just not review things though.
That's exactly why I think vibecoding uniquely benefits solo and small team founders. For anything bigger, work is not the bottleneck, it's someone's lack of imagination.
https://capocasa.dev/the-golden-age-of-those-who-can-pull-it...
Yes, there's more red tape the larger you get, but there are also working products that stop making you money when they break.
See recent Amazon outages caused by vibe/slop/movefast coding practices with little review.
In my experience a culture where teammates prioritise review times (both by checking on updates in GH a few times a day, and by splitting changes aggressively into smaller patches) is reflected in much faster overall progress. It's definitely a culture thing; there's nothing technically or organisationally difficult about implementing it, it just requires people working together to consider team velocity more important than personal velocity.
Let's say a teammate is writing code to do geometric projection of streets and roads onto live video. Another teammate is writing code to do automated drone pursuit of cars. Let's say I'm over here writing auth code, making sure I'm modeling all the branches which might occur in some order.
To what degree do we expect intellectual peerage from someone just glancing into this problem because of a PR? I would expect that, to be a proper intellectual peer of someone studying the problem, it's quite reasonable to basically double your efforts.
If the team is that small and working on things that are that disparate, then it is also very vulnerable to one of those people leaving, at which point there's a whole part of the project that nobody on the team has a good understanding of.
Having somebody else devote enough time to being up to speed enough to do code review on an area is also an investment in resilience so the team isn't suddenly in huge difficulty if the lone expert in that area leaves. It's still a problem, but at least you have one other person who's been looking at the code and talking about it with the now-departed expert, instead of nobody.
This is an unusually low overlap per topic; probably needs a different structure to traditional prs to get the best chance to benefit from more eyes... Higher scope planning or something like longer but intermittent partner programming.
Generally if the reviewer is not familiar with the content asynchronous line by line reviews are of limited value.
> Code a simple bug fix 30 minutes
> Get it code reviewed by the peer next to you 300 minutes → 5 hours → half a day
If it takes 5 hours for a peer to review a simple bugfix, your operation is dysfunctional.
It's rare that devs are on standby, waiting for a PR to review. Usually they are working on their own PR, are in meetings, or have focus time.
We talk a lot about the costs of context switches, so it's reasonable to finish your work before switching to the review.
Hehe I'm waiting right now, should have been reviewed yesterday but I'm like alright, I'll just chill then.
People are busy, and small bugfixes are usually not that critical. If you make everyone drop everything to review everything, that is much more dysfunctional.
nobody will immediately jump on your code review
Sure, but five hours is a lot of time, and a small fix takes little to review.
So, 1 hour? Sure. Two hours? Ok. But five hours means you only look at your teammates' code once a day.
It's ok for a process where you work on something for a week and then come back for reviews, but then it's silly to complain about overhead.
This is a profound point but is review really the problem or is it the handoff that crosses boundaries (me to others, our team to other team, our org to outside our org)?
This reads like a scattered mind with a few good gems, a few assumptions that are incorrect but baked into the author’s world view, and loose coherence tying it all together. I see a lot of myself in it.
I’ll cover one of them: layers of management or bureaucracy do not reduce risk. They create inaction, which gives the appearance of reducing risk, until some startup comes and gobbles up your lunch. Upper management knows it’s all bullshit, and the game-theoretic play is to say no to things, because you’re not held accountable if you say no; so they say no and milk the money printer until the company stagnates and dies. Then they repeat at another company (usually with a new title and promotion).
Meanwhile there are people who, as we speak, say that AI will do review and all we need to do is to provide quality gates...
AI reviews? Sounds like a waste of tokens!
For all the people talking about 5 hour PR review delays... This reminds me of some teams that rotate the "fire extinguisher/emergency bug fixer" duty every day/week/sprint to a different developer. One could rotate a dedicated "first review duty" person. That developer would be in charge of focusing on rapidly starting PR reviews as their priority, with option to request other reviewers if necessary. Spreading the duty around would make people be respectful of the reviewer because if they send unreviewed slop to the reviewers, it's likely that people will send them slop too.
Waiting for a few days of design review is a pain that is easy to avoid: all we need is to be ready to spend a few months building a potentially useless system.
I don’t agree that AI can’t fix this. It is too easy to dismiss.
With AI, my task in review is to look at high-level design choices and forget about reviewing low-level details. It’s much simpler.
I think the problem is the shape of review processes. People higher up in the corporate food chain are needed to give approval on things. These people also have to manage enormous teams with their own complexities. Getting on their schedule is difficult, and giving you a decision isn't their top priority, slowing down time to market for everything.
So we will need to extract the decision making responsibility from people management and let the Decision maker be exclusively focused on reviewing inputs, approving or rejecting. Under an SLA.
My hypothesis is that the future of work in tech will be a series of these input/output queue reviewers. It's going to be really boring I think. Probably like how it's boring being a factory robot monitor.
Are we starting to need a BuSab for programming?
A lot of this goes away when the person who builds also decides what to build.
That's great, but if I hire a random person from this thread and let them decide, chances are they would build an agent orchestrator.
> Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself.
That's me. I'm the mad reviewer. Each time I ranted against AI on this site, it was after reviewing sloppy code.
Yes, Claude Opus is better on average than my juniors/new hires. But it will make the same mistakes twice. I _need_ you to fucking review your own generated code and catch the obvious issues before you submit it to me. Please.
In my experience, good mature organisations have clear review processes to ensure quality, improve collaboration and reduce errors and risk. This is regardless of field. It does slow you down - not 10x - but the benefits outweigh the downsides in the long run.
The worst places I’ve worked have a pattern where someone senior drives a major change without any oversight, review or understanding causing multiple ongoing issues. This problem then gets dumped onto more junior colleagues, at which point it becomes harder and more time consuming to fix (“technical debt”). The senior role then boasts about their successful agile delivery to their superiors who don’t have visibility of the issues, much to the eye-rolls of all the people dealing with the constant problems.
I totally agree with his ideas, but somehow he seems to just be stating the obvious: startups move faster than big orgs, and you can solve a problem by dividing it into smaller problems - if possible. And that AI experimentation is cheap.
What slows me down at the moment is the AI slop my team lead posts into reviews. I have to spend time arguing why that's not a valid comment.
As they say: an hour of planning saves ten hours of doing.
You don't need so much code or maintenance work if you get better requirements upfront. I'd much rather implement things at the last minute knowing what I'm doing than cave in to the usual incompetent middle manager demands of "starting now to show progress". There's your actual problem.
If an hour of planning always saved ten hours of work, software schedules would be a whiteboard exercise.
Instead everyone wants perfect foresight, but systems are full of surprises you only find by building and the cost of pushing uncertainty into docs is that the docs rot because nobody updates them. Most "progress theater" starts as CYA for management but hardens into process once the org is too scared to change anything after the owners move on.
> As they say: an hour of planning saves ten hours of doing.
In software it's the opposite, in my experience.
> You don't need so much code or maintenance work if you get better requirements upfront.
Sure, and if you could wave a magic wand and get rid of all your bugs that would cut down on maintenance work too. But in the real world, with the requirements we get, what do we do?
> In software it's the opposite, in my experience.
That's been my experience as well: ten hours of doing will definitely save you an hour of planning.
If you aren't getting requirements from elsewhere, at least document the set of requirements you think you're working towards, and post them for review. You sometimes get new useful requirements very fast if you post "wrong" ones.
I think what they meant is you “can save 10 hours of planning with one hour of doing”
And I think this has become even more so in the age of AI, because there are even more unknown unknowns, which are harder to discover while planning but easy while “doing”, and that “doing” itself is so much more streamlined.
In my experience no amount of planning will de-risk software engineering effort. What works is making sure that coming back, refactoring, or switching tech is less expensive, which allows you to rapidly change the approach when you inevitably discover some roadblock.
You can read all the docs during planning phases, but you will stumble on some undocumented behaviour / bug / limitation every single time, and then you are back to the drawing board. The faster you can turn that around, the faster you can adjust and go forward.
I really like the famous quote from Churchill- “Plans are useless, planning is essential”
> I think what they meant is you “can save 10 hours of planning with one hour of doing”
I know what they meant, and I also meant the thing I said instead. I have seen many, many people forge ahead on work that could have been saved by a bit more planning. Not overplanning, but doing a reasonable amount of planning.
Figuring out where the line is between planning and "just start trying some experiments" is a matter of experience.
> I really like the famous quote from Churchill- “Plans are useless, planning is essential”
I really like Churchill’s second famous quote: “What the fuck is software, lol”.
Planning includes the prototype you build with AI.
>> Now you either get to spend 27 minutes reviewing the code yourself in a back-and-forth loop with the AI (this is actually kinda fun); or you save 27 minutes and submit unverified code to the code reviewer, who will still take 5 hours like before, but who will now be mad that you’re making them read the slop that you were too lazy to read yourself. Little of value was gained.
This seems to check out, and it's the reason why I can't reconcile the industry's claims about worker replacement with reality. I still wonder when a reckoning will come, though. Seems long overdue in the current environment.
> I still wonder when a reckoning will come, though. seems long overdue in the current environment
Never. Or not until 1-10 person teams start disrupting enterprises (legacy banks, payments systems, consultancies).
“Why,” you ask? Because it’s a house of cards. If engineers become redundant, then we don’t need teams. If we don’t need teams, then we don’t need team leads/PMs/POs and others; if we don’t need middle management, then we don’t need VPs and others. All of those layers will eventually catch up to what’s going on and kill any productivity gains via bureaucracy.
I don't agree with this take in the article. One person with Claude Code can replace a team of devs. It resolves many issues, such as the tension between devs wanting to focus and devs wanting their peers to put aside their task to review their pull requests. Claude generates the code and the human reviews it. There's no delay in the back-and-forth unlike in a team of humans. There's no ego and there's no context switching fatigue. Given that code reviewing is a bottleneck, it's feasible that one person can do it by themselves. And Claude can certainly generate working code at least 10x faster than any dev.
You’re talking from an idealistic requirements → input → programming → output point of view. That’s not how the world operates. Egos are “important”; politics, bureaucracy, all of those are essential parts of organizations. LLMs don’t change that, and without changing that there’s no chance at all. Previously coding was maybe 0.1 of the bottleneck; now it’s 0.07.
These are just made up numbers. On our team, PR review always takes 1 minute -- we never review, just approve, and let production do the reviewing. /s
> I know what you're thinking. Come on, 10x? That’s a lot. It’s unfathomable. Surely we’re exaggerating.
See this rarely known trick! You can be up to 9x more efficient if you code something else when you wait for review
> AI
projectile vomits
Fuck engineering, let's work on methods to make artificial retard be more efficient!
> See this rarely known trick! You can be up to 9x more efficient if you code something else when you wait for review
Context switch alone would kill any productivity gains from this. And I’m not even touching on conflicting MRs and interdependencies yet.
from article:
1. Whoa, I produced this prototype so fast! I have super powers!
2. This prototype is getting buggy. I’ll tell the AI to fix the bugs.
3. Hmm, every change now causes as many new bugs as it fixes.
4. Aha! But if I have an AI agent also review the code, it can find its own bugs!
5. Wait, why am I personally passing data back and forth between agents
6. I need an agent framework
7. I can have my agent write an agent framework!
8. Return to step 1
the author seems to imply this is recursive when it isn't. when you have an effective agent framework you can ship more high quality code quickly.
I've been begging left and right, and I've yet to see a single example of this agent-written high-quality quickly-shipped code.
There are examples littered around threads on HN. What happens is when people provide the examples, the goalposts get moved. So people have stopped bothering to reply to these demands.
OpenClaw! You just need to slightly change the definition of “good code”. The point of code is to ultimately bring money. The guy got hired by OpenAI and who gives a shit what happens to the “project” next. Mission accomplished.
I'm guessing a lot of the high-x productivity boost is from a cycle of generating lots of code, having bug reports detected or hallucinated from that code, and then generating even more code to close out those reports, and so on
This is one of the reasons I'm so interested in sandboxing. A great way to reduce the need for review is to have ways of running code that limit the blast radius if the code is bad. Running code in a sandbox can mean that the worst that can happen is a bad output as opposed to a memory leak, security hole or worse.
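A minimal sketch of the idea, assuming a Unix-like system with CPython: run untrusted code in a separate interpreter process with a timeout, an empty environment, and Python's isolated mode. This only narrows the blast radius a little; real sandboxing needs OS-level isolation (containers, seccomp, VMs).

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute untrusted Python source in a constrained subprocess
    and return its stdout. Worst case is bad output or a timeout,
    not a trashed parent environment."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, no site dirs
            capture_output=True, text=True,
            timeout=timeout, env={},       # no inherited env vars
        )
        return result.stdout
    finally:
        os.unlink(path)

print(run_sandboxed("print(2 + 2)"))  # prints 4
```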
Isn’t “bad output” already worst case? Pre-LLMs correct output was table stakes.
You expect your calculator to always give correct answers, your bank to always transfer your money correctly, and so on.
> Isn’t “bad output” already worst case?
Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
> Pre-LLMs correct output was table stakes
We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults. Correctness isn't even on the table, outside of a few (mostly academic) contexts
> Worst case in a modern agentic scenario is more like "drained your bank account to buy bitcoin and then deleted your harddrive along with the private key"
Hence my interest in sandboxes!
> drained your bank account to buy bitcoin and then deleted your harddrive
These are what I meant by correct output. The software does what you expect it to.
> We're only just getting to the point where we have languages and tooling that can reliably prevent segfaults
This is not really an output issue IMO. This is a failing edge case.
LLMs are moving the industry away from trying to write software that handles all possible edge cases gracefully and towards software developed very quickly that behaves correctly on the happy paths more often than not.
I've seen plenty of decision makers act on bad output from human employees in the past. The company usually survives.
And if the bad output leads to a decision maker making a bad decision that takes down your company or kills your relative?
The sandbox in question was to absorb shrapnel from explosions, clearly
If you save 3 hours building something with agentic engineering and that PR sits in review for the same 30 hours or whatever it would have spent sitting in review if you handwrote it, you’re still saving 3 hours building that thing.
So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput (good lord, we better get used to doing more code review)
This doesn’t work if you spend 3 minutes prompting and 27 minutes cleaning up code that would have taken 30 minutes to write anyway, as the article details, but that’s a different failure case imo
> So in that extra time, you can now stack more PRs that still have a 30 hour review time and have more overall throughput
Hang on, you think that a queue that drains at a rate of $X/hour can be filled at a rate of 10x$X/hour?
No, it cannot: it doesn't matter how fast you fill a queue if the queue has a constant drain rate, sooner or later you are going to hit the bounds of the queue or the items taken off the queue are too stale to matter.
In this case, filling a queue at 20 items per hour (one every 3 minutes) while it drains at 1 item every 5 hours means that after a single 8-hour day there are roughly (8×20) − 1 = 159 PRs queued ahead of your last one.
IOW, at 5 hours per review, that last PR's time-to-review is close to 800 hours, and your PRs after the second day are going to take well over 1,500.
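Charging each queued PR a full 5-hour review, the fill/drain arithmetic from the example above can be checked in a few lines:

```python
# One PR lands every 3 minutes over an 8-hour day; reviews
# clear one PR every 5 hours.
HOURS_PER_DAY = 8
FILL_PER_HOUR = 20        # 60 min / 3 min
DRAIN_PER_HOUR = 1 / 5    # one review per 5 hours

def backlog_after(days):
    """PRs still unreviewed after `days` working days."""
    return (FILL_PER_HOUR - DRAIN_PER_HOUR) * HOURS_PER_DAY * days

def wait_for_last_pr(days):
    """Hours until the most recently submitted PR gets reviewed."""
    return backlog_after(days) / DRAIN_PER_HOUR

print(round(wait_for_last_pr(1)))  # about 792 hours after one day
```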
This is the fundamental issue currently in my situation with AI code generation.
There are some strategies that help: a lot of the AI directives need to go towards making the code actually easy to review. A lot of it sits around clarity and granularity (code should be committed primarily in reviewable chunks - units of work that make sense for review) rather than whatever you would have done previously, when code production was the bottleneck. Similarly, AI use needs to be weighted not just more towards tests, but towards tests that concretely and clearly answer questions that come up in review (what happens on this boundary condition? or if that variable is null? etc). Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions. That is, if a change is evidently risk free (in the sense of "even if this IS broken, it doesn't matter") it should be able to be rapidly approved/merged. Only things where it actually matters if it's wrong should be blocked.
I have a feeling there are whole areas of software engineering where best practices are just operating on inertia and need to be reformulated now that the underlying cost dynamics have fundamentally shifted.
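As a concrete (hypothetical) illustration of tests that answer review questions: a made-up `parse_limit` helper whose test names state the exact question a reviewer would otherwise have to ask.

```python
# Hypothetical example: the parse_limit function and its rules
# are invented for illustration, not taken from any real codebase.

def parse_limit(value, default=10, maximum=100):
    """Parse a user-supplied page-size limit."""
    if value is None:
        return default
    n = int(value)
    if n < 1:
        return default
    return min(n, maximum)

# Each test name states the review question it pre-answers.
def test_what_happens_if_value_is_null():
    assert parse_limit(None) == 10

def test_what_happens_on_the_upper_boundary():
    assert parse_limit("100") == 100
    assert parse_limit("101") == 100

def test_what_happens_below_the_lower_boundary():
    assert parse_limit("0") == 10
    assert parse_limit("-5") == 10
```

A reviewer scanning the test list sees the boundary behaviour spelled out, instead of having to reconstruct it line by line from the implementation.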
>Finally, changes need to be stratified along lines of risk rather than code modularity or other dimensions.
Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?
Lemme guess, you cargo culted some "best practices" to offload risk awareness, so now your code is organized in "too big to fail" style and matches your vendor's risk profile instead of yours.
> Why don't those other dimensions, and especially the code modularity, already reflect the lines of business risk?
I guess the answer (if you're really asking seriously) is that previously when code production cost so far outweighed everything else, it made sense to structure everything to optimise efficiency in that dimension.
So if a change was implemented, the developer would deliver it as a functional unit that might cut across several lines of risk (low-risk changes like updating some CSS sitting alongside higher-risk ones like a database migration, all bundled together), because this was what made it fastest for the developer to implement the code.
Now if AI is doing it, screw how easy or fast it is to make the change. Deliver it in review chunks.
Was the original method cargo culted? I think most of what we do is cargo culted regardless. Virtually the entire software industry is built that way. So probably.
> when code production cost so far outweighed everything else, it made sense to structure everything to optimise efficiency in that dimension
Oh, for sure. Those people making electro-mechanical computers at the end of the 19th century certainly did that a lot.
You are considering a good-faith environment where GP cares about throughput of the queue.
I think GP is thinking in terms of being incentivized by their environment to demonstrate an image of high personal throughput.
In a dysfunctional organization one is forced to overpromise and underdeliver, which the AI facilitates.
If your team's bottleneck is code review by senior engineers, adding more low quality PRs to the review backlog will not improve your productivity. It'll just overwhelm and annoy everyone who's gotta read that stuff.
Generally if your job is acting as an expensive frontend for senior engineers to interact with claude code, well, speaking as a senior engineer I'd rather just use claude code directly.
Linting, compiler warnings and automated tests have helped a lot with the grunt work of code review in the past.
We can use AI these days to add another layer.
Except that when you have 10 PRs out, it takes longer for people to get to them, so you end up backlogged.
And when the PR you never even read because the AI wrote it gets bounced back to you with an obscure question 13 days later... you're not going to be well positioned to respond to that.