This is self-reported productivity, in that devs are saying AI saves them about 4 hours per week. But let’s not forget the METR study that found a 20% increase in self-reported productivity but a 19% decrease in actual measured productivity.
(It used a clever and rigorous technique for measuring productivity differences, BTW, for anyone as skeptical of productivity measures as I am.)
Let's also not forget the multiple other studies that found significant boosts to productivity using rigorous methods like RCTs.
However, because these threads always go the same way whenever I post this, I'll link to a previous thread in hopes of preempting the same comments and advancing the discussion! https://news.ycombinator.com/item?id=46559254
Also, DX (whose CTO was giving the presentation) actually collects telemetry-based metrics (PRs, etc.) as well: https://getdx.com/uploads/ai-measurement-framework.pdf
It's not clear from TFA if these savings are self-reported or from DX metrics.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
That info is from mid 2025, talking about models released in Oct 2024 and Feb 2025. It predates tools like Claude Code and Codex, Lovable was at about a third of its current ARR, etc.
This might still be true but we desperately need new data.
None of those changes address the issue jdlshore is pointing out: self-assessed developer productivity increases from LLMs are not a reliable indication of actual productivity increases. It's true that modern LLMs might have less of a negative impact on productivity, or even increase it, but you won't be able to tell by asking developers whether they feel more productive.
(Also, Anthropic released Claude Code in February of 2025, which was near the start of the period the study ran.)
Yeah, new data would be great, but I feel like these tools are not substantively better, and this is becoming the new "it's different this time!"
Has the METR study been replicated?
Not a scientific study, but someone did replicate the experiment on themselves [0] and found that in their case, any effect from LLM use wasn't detectable in their sample. Notably they almost certainly had more experience with LLMs than most of the METR participants did.
[0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...
I haven’t heard about any similar studies, no. I’m planning to conduct one at my workplace but we’re still deciding exactly which uses of AI to test.
You're only as fast as your biggest bottleneck. Adding AI to an existing organization is just going to show you where your bottlenecks are, it's not going to magically make them go away. For most companies, the speed of writing code probably wasn't the bottleneck in the first place.
the amount of people that work in technology and have never heard of amdahl's law always shocks me
https://en.wikipedia.org/wiki/Amdahl's_law
a 100% increase in coding speed means I then get to spend an extra 30 minutes a week in meetings
while now hating my job, because the only fun bit has been removed
"progress"
So if I'm understanding you correctly, prior to AI tools you spent 1 hour per week coding? And now you spend 30 minutes per week?
the number of people who have heard of Amdahl's law but don't know when to use "amount of X" vs "number of Y" always shocks me as well
Agreed. The bottleneck is QA/Code review and that is never going away from most corps. I've never worked at a job in tech that didn't require code review and no, asking a code agent to review a PR is never going to be "good enough".
And here we are, the central argument for why code agents are not these job killing hype beasts that are so regularly claimed.
Has anyone seen what multi-agent code workflows produce? Take a look at openclaw, the code base is an absolute disaster. 500k LoC for something that can be accomplished in 10k.
My head of engineering spent half a day creating a complex setup of agents in opencode, to refactor a data model across multiple repositories. After a day running agents and switching between providers to work around the token limits, it dumped a -20k +30k change set we'll need to review.
If we're very lucky, we'll break even time wise compared to just running a single agent on a tight leash.
> I've never worked at a job in tech that didn't require code review
I have. Sometimes the resulting code was much worse than what you get from an LLM, and yet the project itself was still a success despite this.
I've also worked in places with code review, where the project's own code-quality architecture and process caused it to be so late to market that it was an automatic failure.
What matters to a business is ideally identical to the business metrics, which are usually not (but sometimes are) the code metrics.
The bottleneck at larger orgs is almost always decision-making.
Getting code written and reviewed is the trivial part of the job in most cases; discovering the product needs, considering/uncovering edge cases, defining business logic that is extensible or easily modifiable when conditions change, etc. are the parts that consume 80% of my time.
We in the engineering org at the company I work for have raised this flag many times during the adoption of AI-assistance tools. Now that the rollout is well underway, with most developers using the tools and changing their workflows, it has become the sore thumb sticking out: yes, we can deliver more code if it's needed, but for what exactly do you need it?
So far I haven't seen a speed-up in decision-making; the same chain of approvals, prioritisation, and definitions chugs along as before, and it is clearly the bottleneck.
I don't think that's actually the bottleneck?
The bottleneck is aligning people on what the right thing to do is, and fitting the change into everyone's mental models. It gets worse the more people are involved.
> Take a look at openclaw, the code base is an absolute disaster. 500k LoC for something that can be accomplished in 10k.
Mission accomplished: an acquihire worth probably millions and millions.
I agree with you, by the way.
It was a hire not an acquihire. There was no acquisition.
There was a big payoff on signing so to-may-to, to-mah-to.
I'm sorry but consider how many more edge cases and alternatives can be handled in 500k LoC as compared to that tiny 10k.
In the days of AGI, higher LoC is better. It just means the code is more robust, more adaptable, better suited to real world conditions.
That’s… not how software works, no matter how it is produced. Complexity is the enemy; always.
In high-performance teams it is. In bike-shedding environments of course it is not.
This. The key bottleneck in many organizations is the "socialize and align" on what to build. Or just "socialize and align" in general. :)
One thing that always slowed me down was writing JSDoc and tests.
Now I can write one example of a passing test and then get Codex to read the code and write a test for all the branches in that section. It saves time, since it can type a lot faster than I can, and it's mostly copying the example I already have while changing the input to hit all the branches.
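For what it's worth, a minimal sketch of that pattern (shown in Python/pytest rather than JS, with a made-up function, purely to illustrate the "one hand-written example, agent fills in the branches" workflow):

```python
import pytest

def shipping_cost(weight_kg: float, express: bool) -> float:
    """Made-up function with a few branches worth covering."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    base = 5.0 if weight_kg < 1 else 5.0 + 2.0 * weight_kg
    return base * 2 if express else base

# The one example written by hand:
def test_small_parcel_standard():
    assert shipping_cost(0.5, express=False) == 5.0

# What the agent produces: mostly the same shape, with inputs
# varied to reach the remaining branches.
@pytest.mark.parametrize("weight, express, expected", [
    (0.5, True, 10.0),   # light parcel, express branch
    (3.0, False, 11.0),  # heavy parcel, standard branch
    (3.0, True, 22.0),   # heavy parcel, express branch
])
def test_remaining_branches(weight, express, expected):
    assert shipping_cost(weight, express) == expected

def test_rejects_non_positive_weight():
    with pytest.raises(ValueError):
        shipping_cost(0, express=False)
```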
> let's have LLMs check our code for correctness
Lmao. Rofl even.
(Testing is the one thing you would never outsource to AI.)
Outsourcing testing to AI makes perfect sense if you assume that tests exist out of an obligation to meet some code coverage requirements, rather than to ensure correctness. Often I'll write a module and a few tests that cover its functionality, only for CI to complain that line coverage has decreased and reject my merge! AI to the rescue! A perfect job for a bullshit generator.
Outsourcing testing to the AI also gets its code connected to deterministic results, and should let the agent interact with the code to speculate about expectations and check them against the actual code.
It could still speculate wrong things, but it won't speculate that the code is supposed to crash on the first line of code.
> Testing is the one thing you would never outsource to AI
That's not really true.
Making the AI write the code, the test, and the review of itself within the same session is YOLO.
There's a ton of scaffolding in testing that can be easily automated.
When I ask the AI to test, I typically provide a lot of equivalence classes.
And the AI still surprises me with finding more.
On the other hand, it's equally excellent at saying "it tested", and when you look at the tests, they can be extremely shallow. Or there can be a fair number of unit tests for certain parts of the code, but when you run the whole program, it just breaks.
The most valuable tests when programming with AI (generated by AI, or otherwise) are near-realistic integration tests. That's true for human programmers too, but we take for granted that casual use of the program we make as we develop it serves as a poor man's test. When people who generally don't write tests start using AI, there's just nothing but fingers crossed.
I'd rather say: If there's one thing you would never outsource to AI, it's final QA.
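To make the equivalence-class point above concrete, here's a small illustrative sketch; the validator and the classes are invented for the example, and in practice the model often proposes additional classes (unicode letters, whitespace-only strings, etc.):

```python
import re
import pytest

def is_valid_username(name: str) -> bool:
    """Invented validator: 3-16 chars, starts with a letter, then letters/digits/underscore."""
    return bool(re.fullmatch(r"[A-Za-z][A-Za-z0-9_]{2,15}", name))

# Equivalence classes handed to the model up front:
VALID     = ["abc", "Alice_99", "a" * 16]
TOO_SHORT = ["", "ab"]
TOO_LONG  = ["a" * 17]
BAD_START = ["9lives", "_hidden"]
BAD_CHARS = ["has space", "dash-ed"]

@pytest.mark.parametrize("name", VALID)
def test_valid_class(name):
    assert is_valid_username(name)

@pytest.mark.parametrize("name", TOO_SHORT + TOO_LONG + BAD_START + BAD_CHARS)
def test_invalid_classes(name):
    assert not is_valid_username(name)
```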
> (Testing is the one thing you would never outsource to AI.)
I would rephrase that as "all LLMs, no matter how many you use, are only as good as one single pair of eyes".
If you're a one-person team and have no capital to spend on a proper test team, set the AI at it. If you're a megacorp with 10k full time QA testers, the AI probably isn't going to catch anything novel that the rest of them didn't, but it's cheap enough you can have it work through everything to make sure you have, actually, worked through everything.
You don't use the LLM to check your code for correctness; you use the LLM to generate tests to exercise code paths, and verify that they do exercise those code paths.
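One cheap, stdlib-only way to do that last verification step (the discount function and the generated tests here are hypothetical): trace which lines of the function actually run while the generated tests execute.

```python
import sys

def discount(price: float, is_member: bool) -> float:
    if is_member:
        return price * 0.5   # branch the "member" test should reach
    return price             # branch the "non-member" test should reach

def lines_executed(code, fn):
    """Return the line numbers of `code` that run while fn() executes."""
    hit = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn()
    finally:
        sys.settrace(None)
    return hit

# LLM-generated tests:
def test_member():     assert discount(100.0, True) == 50.0
def test_non_member(): assert discount(100.0, False) == 100.0

# Verify the tests exercise both branches, not merely pass:
covered = (lines_executed(discount.__code__, test_member)
           | lines_executed(discount.__code__, test_non_member))
first = discount.__code__.co_firstlineno
assert {first + 2, first + 3} <= covered, "a branch was never exercised"
print("both branches reached")
```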
And that test will check the code paths are run.
That doesn't tell you that the code is correct. It tells you that the branching code can reach all the branches. That isn't very useful.
Apparently "AI is speeding up the onboarding process", they say. But isn't that because the onboarding process is about learning, and by having an AI regurgitate the answers you can complete the process without learning anything, which might speed it up but completely defeats the purpose?
Yes, that's how I'd interpret it, too.
According to the article, onboarding speed is measured as “time to the 10th Pull Request (PR).”
As we have seen on public GitHub projects, LLMs have made it really easy to submit a large number of low-effort pull requests without having any understanding of a project.
Obviously, such a kind of higher onboarding speed is not necessarily good for an organization.
I think there's definite scope for that being true; not because you can start doing stuff before you understand it (you can), but because you can ask questions of a codebase you're unfamiliar with and learn about it faster.
I'd guess the time till first being able to make useful changes has dropped to near zero, but the time to get mastery of the code base has gone towards infinity.
Is that mastery still useful as time goes on, though? It's always felt a bit unhealthy for code to have people with mastery over it; it's a sign of a bad bus factor. Every effort I've ever seen around code quality and documentation improvement has been about making that code mastery and full understanding irrelevant.
Correct. Reading code is important. The details are in the minutiae, and the way code works is that the minutiae matter.
Summarizing this with AI makes you lose that context.
This has been my experience as a dev, and it always confuses me when people say they prefer to work at a “higher level”. The minutiae are often just as important as some of the higher level decisions. Not everything, but not an insignificant portion either. This applies to basic things like correctness, performance, and security - craft, style, and taste are not involved.
> This has been my experience as a dev, and it always confuses me when people say they prefer to work at a “higher level”.
> The minutiae are often just as important as some of the higher level decisions.
Frankly, a failure to understand this is a tell that someone is not equipped to evaluate code quality.
I think that over time people will start looking at AI-assisted coding the same way we now look at loosely typed code, or at (heavy) frameworks: it saves time in the short term, but may cause significant problems down the line. Whether or not this tradeoff makes sense in a specific situation is a matter of debate, and there's usually no obviously right or wrong answer.
Once the free money runs out, the AI cos may shift to making heavily verified code snippets with more direct language control. This will heavily simplify a lot of boilerplate instead of fairytales of some AGI coding wiz.
Isn't the boilerplate that "AI" is capable of generating becoming more and more dated with each passing day?
Are the AI firms capable of retraining their models to understand new features in the technologies we work with? Or are LLMs going to be stuck generating ca. 2022 boilerplate forever?
No to the first question, and maybe with a lot of money for the second question.
In the 20 years I've been in the industry, boilerplate has dropped dramatically in the backend.
Right now, the front end has tons of boilerplate. It's one of the reasons AI has such a wow factor for FE: trivial tasks require a lot of code.
But even that is much better than it was 10 years ago.
That was a long way of saying I disagree with your no.
FE has a lot of boilerplate only if you’re starting from scratch every single time. That’s why we had template systems and why we invented view libraries. Once you’ve defined your libraries, you just copy-paste stuff.
It seems like they should be able to “overweight” newer training data. But the risk is the newer training data is going to skew more towards AI slop than older training data.
There won't ever be newer training data.
The OG data came from sites like Stackoverflow. These sites will stop existing once LLMs become better and easier to use. Game over.
Every time claude code runs tests or builds after a change, it's collecting training data.
Has Anthropic been able to leverage this training data successfully?
I can't pretend to know how things work internally, but I would expect it to be involved in model updates.
You need human language programming-related questions to train on too, not just the code.
That's what the related chats are for?
It really depends on the situation. I think there's an argument for generating in a lower level strongly typed language, where most of the work of writing the pointlessly verbose parts is eliminated, any errors are found by the compiler immediately, but it still leaves the option for handwritten optimizations when needed. Sort of how one can drop down to C in python for the parts that need more performance.
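As a concrete (and entirely illustrative) sketch of that drop-down pattern: assuming a hypothetical hot_path.c compiled to a shared library, the Python side via ctypes might look something like this.

```python
# hot_path.c -- the performance-critical part, kept tiny and hand-optimizable:
#
#   double dot(const double *a, const double *b, long n) {
#       double s = 0.0;
#       for (long i = 0; i < n; i++) s += a[i] * b[i];
#       return s;
#   }
#
# Built with: cc -O3 -shared -fPIC hot_path.c -o hot_path.so

import ctypes

lib = ctypes.CDLL("./hot_path.so")
lib.dot.restype = ctypes.c_double
lib.dot.argtypes = (ctypes.POINTER(ctypes.c_double),
                    ctypes.POINTER(ctypes.c_double),
                    ctypes.c_long)

def dot(a: list[float], b: list[float]) -> float:
    """Python-facing wrapper; the inner loop runs in C."""
    n = len(a)
    ArrayType = ctypes.c_double * n
    return lib.dot(ArrayType(*a), ArrayType(*b), n)

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```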
Unsurprising for multiple reasons. Most organizations have other bottlenecks and limiting factors than “how fast can you develop”.
Regardless, if you’re a dev who can now do in half the time what used to take a full day, and quality remains stable, why should this translate to 2x the output? Most people are paid by the hour and not for outcomes.
And yes, I am suggesting that if you complete in 4 hours that which took you 8 hours in 2019, that you should consider calling it a day.
The real takeaway here -- also corroborated by the DORA 2025 report https://dora.dev/research/2025/ -- is that more than anything, AI amplifies your current development culture. Organizations with strong quality control discipline enjoy more velocity, those with weak practices suffer more outages.
Expecting AI to magically overcome your development culture is like expecting consultants to magically fix your business culture.
Furthermore, by various estimates, engineers only spend 10-60% of their time on actual code. So, given that AI is currently used largely for coding activities, a 10% overall gain is actually a considerable saving.
Also, this is the result of retrofitting AI into existing workflows; actual "AI-native" workflows would probably look very different, likely with other parts of software engineering refactored as well. Spotify's "Honk" workflow is probably just a starting point.
I'm pretty sure it has to do with the individual as well as the culture. Juniors/new hires use AI to double their wrong/unsafe output, and seniors then have to spend more time correcting it.
I'll be honest: I write piss-poor code; each time I come back to an old project I see where I could have done better. New hires are worse, but before AI (and especially Opus) they didn't produce that much code before spending something like 6 months learning (I'm on a netsec tooling team). Now they start producing code after two weeks or less, and every line has to be checked because they don't understand what they are doing.
I think my personal output increased by 15% on average (maybe 5% on difficult projects), but our team output decreased overall.
Yes, we as a society urgently have to figure out how to learn and educate with AI. There are even studies showing that students who use AI to do their work do not learn the necessary skills.
And I'm also hearing grumblings about entry level talent that is absolutely clueless without AI, which does not help the junior hiring scene at all.
At this point it seems clear that people wishing to learn a discipline should restrict their usage of AI until they have "built the muscles", but none of our educational, testing, recruitment and upskilling practices are conducive to that.
I found the title for this post misleading. To clarify it a bit, AI has only improved productivity by 10% even though 93% of devs are using it.
Yeah, the title may suggest that productivity is still at 10% out of 100% after CEOs fired half their developers, believing the rest would do all the work with the help of AI.
I think some AI companies are just now starting to feel the pressure to profit.
Soon, I predict we will see a pretty significant jump in price that will make a 10% productivity gain seem tiny compared to the associated bills.
For now, these companies are trying to reach critical mass so their users are so dependent on their tech that they have to keep paying, at least in the short term.
My biggest roadblocks as an engineer have almost never been the authorship of code but everything else around it:
* Getting code reviewed
* Making sure it's actually solving the problem
* Communicating to the rest of the team what's happening
* Getting tests to pass
* Getting it deployed
* Verifying that the fix is implemented in production
* Starting it all over when there is a misunderstanding
Slinging more code faster is great, and getting unit testing more-or-less for free is awesome, but what separates a good engineer from a great one is communication and management.
AI is causing us to regress to thinking that code velocity is a good metric to use when comparing engineers.
As far as I can tell from my workplace the total impact on productivity is neutral to negative.
I read this article as the CTO being the bottleneck if he's only seeing a 10% productivity boost at his organization.
I don't think this is purely an AI problem; it's more about the legacy costs of maintaining many minds, which can't be solved by just giving people AI tools, at least until the AI comes for the CTO role too (but not the CEO or revenue-generating roles) and for whichever manager is the bottleneck.
I imagine a future where we have Nasdaq-listed companies run by just a dozen people, with AI agents running and talking to each other so fast that text becomes a bottleneck and they need another medium, one that can only be understood by an AI that holds humans' hands.
This shift would also be reflected in new hardware: perhaps photonic chips, or anything that lets AI scale up dramatically without the energy cost.
Exciting times are ahead for AI, but it's also accelerating the push toward digital UBI... which could be good and bad.
> it's also accelerating digital UBI
Do you have sources for this claim?
A 10% uplift in productivity for the cost of probably 0.001% of the salary budget is an incredible success.
This is exactly right. And assuming organizations use the gains to cut headcount rather than boost total productivity, a 10% reduction in white collar employment would still be an era-defining systemic shock to the economy.
Productivity improvements from automation actually result in an increase in jobs, not fewer jobs. Basic economics.
How are CTOs so out of touch, and yet so loud and proud about it?
The title is misleading. Productivity isn't at 10%, it's at 110%.
I can see where productivity could be higher if all I did was type in programs to some spec, or bootstrapping new apps all day - but that's like not the reality of "programming", at least for me past 25 years. Sorting through what to even make and interpreting "requirements" is what takes the most time
AI adoption has reduced productivity at my workplace, and by a noticeable amount!
This will lead to natural selection. As AI becomes increasingly integrated into all areas, companies that manage it less effectively than others will face greater selection pressure.
Or, AI will turn out to just not be that useful.
It's such a weird effect.
At a personal level, AI has made non-trivial improvements to my life. I can clearly see the value in there.
At an organizational level, it tends to get in the way much more than it helps. I do not yet see the value there.
That's expected for any new "low-code" solution du jour.
Blunt opinion: Most devs are not that good and really only execute what they are told to do.
The threat of AI for devs, and the way to drastically improve productivity, is this: keep the better devs who can think systemically, who can design solutions, who can solve issues themselves, and give them all the AI help available; cut the rest.
That’s how I feel too. When I was an architect at a ~300-person company, a big chunk of my job shifted to reviews, technical design docs, and guidance. I’m getting great results by feeding context like that into Claude Code, then reviewing and steering what it produces.
It really does feel like a multiplier on me and I understand things enough to get my hands dirty where Claude struggles.
Lately I’ve been wondering if that role evolves into a more hierarchical review system: senior engineers own independent modules end-to-end, and architects focus on integration, interfaces, and overall coherence. Honestly, the best parts of our product already worked like that even before AI.
Yeah, industry has told them that devs aren't valuable and AI can do their job. Who TF has motivation after that?
People getting paid >$400k TC
No motivation? I'm sorry buddy but your ass is getting replaced by Claude Code in the next 3-6 weeks.