I run a small data operation, mostly background job processing, with a thin layer of CRUD on top (could be Django or Rails).
I've found AI to be good when given guardrails, e.g. I can write the initial models and functions by hand, then the AI (smarter autocomplete) completes the rest.
Also good: after the initial setup, if I prompt it, I no longer have to spend time reading docs for some library I'm unlikely to use again.
Bad: it can create bad code or go off the rails (hence you have to be a skilled operator, i.e. a skilled engineer, first).
Bad: vibe coding doesn't work when you're dealing with external data whose APIs are mostly undocumented (writing types/dataclasses is useful for the smarter autocomplete).
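As a concrete illustration of that last guardrail, here's a minimal Python sketch of pinning down an undocumented payload with a dataclass so the "smarter autocomplete" has a concrete shape to work against. The `VendorInvoice` fields and key names are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VendorInvoice:
    # Hypothetical shape for an undocumented vendor payload, pinned down by hand.
    invoice_id: str
    amount_cents: int
    currency: str = "USD"
    memo: Optional[str] = None

def parse_invoice(raw: dict) -> VendorInvoice:
    # Fail loudly on missing keys instead of letting bad data drift downstream.
    return VendorInvoice(
        invoice_id=str(raw["id"]),
        amount_cents=int(raw["amount"]),
        currency=raw.get("currency", "USD"),
        memo=raw.get("memo"),
    )
```

Once a shape like this exists, the model tends to stay inside it instead of guessing field names.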
Haven't seen this mentioned yet, but the worst part for me is that a lot of management LOVES to use Claude to generate 50 page design documents, PRDs, etc., and send them to us to "please review as soon as you can". Nobody reads it, not even the people making it. I'm watching some employees just generate endless slide decks of nonsense and then waffle when asked any specific questions. If any of that is read, it is by other peoples' Claude.
It has also enabled a few people to write code or plan out implementation details who haven't done so in a long (sometimes decade or more) time, and so I'm getting some bizarre suggestions.
Otherwise, it really does depend on what kind of code. I hand write prod code, and the only thing that AI can do is review it and point out bugs to me. But for other things, like a throwaway script to generate a bunch of data for load testing? Sure, why not.
I've been tasked with code reviews of Claude chat bot written code (not Claude code that has RAG and can browse the file system). It always lacks any understanding of our problem area, 75% of the time it only works for a specific scenario (the prompted case), and almost 100% of the time, when I comment about this, I'm told to take it over and make it work... and to use Claude.
I've kind of decided this is my last job, so when this company folds or fires me, I'm just going to retire to my cabin in the rural Louisiana woods, and my wife will be the breadwinner. I only have a few 10s of thousands left to make that home "free" (pay off the mortgage, add solar and batteries, plant more than just potatoes and tomatoes).
Though, post retirement, I will support my wife's therapy practice, and I have a goal of silly businesses that are just fun to do (until they arent), like my potato/tomato hybrid (actually just a graft) so you can make fries and ketchup from the same plant!
That sounds lovely. I think too many people get attached to the structure of life as they've lived it for the last n years and resist natural phase transitions for far too long. Good luck with retirement and your dream of being the botanical equivalent of the mean kid from Toy Story:p
I noticed that what previously took 30 minutes now takes a week. For example, we had a performance issue with a DB; previously I'd just create a GSI (global secondary index). Now there's a 37-page document with explanation, mitigation, planning, steps, reviews, risks, a deployment plan, obstacles, and a bunch of comments. But sure, it looks cool and very professional.
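For scale, the 30-minute fix being described is essentially a single DynamoDB `update_table` call. A hedged sketch that just builds the request payload (table, index, and attribute names are made up; the real call would be boto3's `client.update_table(**payload)`):

```python
def build_gsi_update(table: str, index: str, hash_key: str) -> dict:
    # Builds the one-step request the comment alludes to: adding a GSI.
    # Names are illustrative; pass the result to boto3's update_table.
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index,
                    "KeySchema": [
                        {"AttributeName": hash_key, "KeyType": "HASH"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }
```

That's the whole change; everything beyond it in the 37-page document is process.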
I'm now out of the workforce and can't even imagine the complexity of these systems as management and everyone else communicate plans and execution through Claude. It must already be the case that some codebases are massive behemoths few devs understand. Is Claude good enough to help maintain them and keep devs on top of the codebase?
I quit my last job because of this. I’m pretty sure manager was using free chatgpt with no regard for context length too, because not only was it verbose it was also close to gibberish. Being asked to review urgently and estimate deadlines got old real fast
Jump straight to the second option. You have to presume that the content they sent you has no relation whatsoever to their actual understanding of the matter.
We all use Claude at my work and I have a very strict rule for my boss and my team: we don’t say “I asked Claude”. We use it a lot, but I expect my team to own it.
I actually think there’s almost an acceptable workflow here of using LLMs as part of the medium of communication. I’m pretty much fine with someone sending me 500 lines of slop with the stated expectation that I’ll dump it into an LLM on my end and interact with it.
It’s the asymmetric expectations—that one person can spew slop but the other must go full-effort—that for me personally feels disrespectful.
I also don't mind that. Summarized information exchange feels very efficient. But for sure, it seems like a societal expectation is emerging around these tools right now - expect me to put as much effort into consuming data as you did producing it. If you shat out a bunch of data from an LLM, I'm going to use an LLM to consume that data as well. And it's not reasonable for you to expect me to manually parse that data, just as well as I wouldn't expect you to do the same.
However, since people are not going to readily reveal that they used an LLM to produce said output, it seems like the most logical way to do this is just always use an LLM to consume inputs, because there's no easy 100% way to tell whether it was created by an LLM or a human or not anymore.
This kinda risks the broken telephone problem, or when you translate from one language to another and then again to another - context and nuance is always lost.
Just give me the bullet points, it's more efficient anyway. No need to add tons of adjectives and purple prose around it to fluff it up.
Some day someone brilliant will discover the idea of "sharing prompts" to get around this issue. So, instead of sending the clean and summarized LLM output, you'll just send your prompt, and then the recipient can read that, and in response, share their prompt back to the original sender.
I think we'll eventually move away from using these verbose documents, presentations, etc for communication. Just do your work, thinking, solving problems, etc while verbally dumping it all out into LLM sessions as you go. When someone needs to be updated on a particular task or project, there will be a way to give them granular access to those sessions as a sort of partial "brain dump" of yours. They can ask the LLM questions directly, get bullet points, whatever form they prefer the information in.
That way, thinking is communication! That's kind of why I loved math so much - it felt like I could solve a problem and succinctly communicate with the reader at the same time.
If you write 3 bullet points and produce 500-pages of slop why would my AI summarise it back to the original 3 bullet points and not something else entirely?
It won't, and that's the joke. They will write three bullet points, but their AI will only focus on the first two and hallucinate two more to fill out the document. Your AI will ignore them completely and go off on some unrelated tangent based on one of the earlier hallucinations. Anthropic collects a fee from both of you and is the only real winner here.
> It’s the asymmetric expectations—that one person can spew slop but the other must go full-effort—that for me personally feels disrespectful.
This has always been the case. Have some junior shit out a few thousand lines of code and then leave, leaving the senior cleanup crew to figure out what the fuck just happened...
If you shove content at me that I even suspect was AI generated I will summarily hit the delete button and probably ban you from sending me any form of communication ever again.
It's a breach of trust. I don't care if you're my friend, my boss, a stranger, or my dog - it crosses a line.
I value my time and my attention. I will willingly spend it on humans, but I most certainly won't spend it on your slop when you didn't even think me worth a human effort.
I've found in my (admittedly limited) use of LLMs that they're great for writing code I don't foresee needing to review or edit myself, but if I'm going to be editing the code later, I need to be the one writing it. Also, LLMs are bad at design.
For me it's throwaway scripts and tools. Or tools in general. But only simple tools that it can somewhat one-shot. If I ever need to tweak it, I one-shot another tool. If it works, it's fine. No need to know how it works.
If I'm feeling brave, I let it write functions with very clear and well defined input/output, like a well established algorithm. I know it can one-shot those, or they can be easily tested.
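The kind of "well-established algorithm with clear input/output" being described might look like this: a standard binary search, the sort of function a model can plausibly one-shot precisely because its contract is tiny and easy to test:

```python
def binary_search(items: list, target) -> int:
    # Sorted input; returns the index of target, or -1 if absent.
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

The point is that a spec this crisp makes verification cheap: a few assertions cover the contract, so trusting generated code is far less risky than in open-ended design work.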
But when doing something that I know will be further developed, maintained, I mainly end up writing it by hand. I used to have the LLM write that kind of code as well, but I found it to be slower in the long run.
Definitely a lot of one-shot scripts for a given environment... I've started using a run/ directory for shell scripts that will do things like spin up a set of containers defined in a compose file.. build and test certain sub-projects, initialize a database, etc.
For the most part, many of them work the first time and just continue to do so to aid a project. I've done similar in terms of scaffolding a test/demo environment around a component that I'm directly focused on... sometimes similar for documentation site(s) for gh pages, etc.
I've found that SoTA LLMs sometimes implement / design differently (in the sense that "why didn't I think of that"), and that's always refreshing to see. I may run the same prompt through Gemini, Sonnet, and Codex just to see if they'd come up with some technique I didn't even know to consider.
> don't foresee a need to review it myself
On the flip side, SoTA LLMs are crazy good at code review and bug fixes. I always use "find and fix business logic errors, edge cases, and api / language misuse" prompt after every substantial commit.
Obviously you should also use Claude to consume those 50 pages. It sounds cynical, but it's not. It's practical.
What I've learned in 2 years of heavy LLM use - ChatGPT, Gemini, and Claude, is that the significance is on expressing and then refining goals and plans. The details are noise. The clear goals matter, and the plans are derived from those.
I regularly interrupt my tools to say, "Please document what you just said in ...". And I manage the document organization.
At any point I can start fresh with any AI tool and say, "read x, y, and z documents, and then let's discuss our plans". Although I find that with Gemini, despite saying, "let's discuss", it wants to go build stuff. The stop button is there for a reason.
I use an agents.md file to guide Claude, and I include a prominent line that reads UPDATE THIS FILE WITH NEW LEARNINGS. This is a bit noisy -- I have to edit what is added -- but works well and it serves as ongoing instruction. And as you have pointed out, the document serves as a great base if/when I have to switch tools.
Similarly, managers at my workplace occasionally use LLMs to generate jira tickets (with nonsense implementation details), which has led junior engineers astray, leaving senior engineers to deal with the fallout.
Getting similar vibes from freelance clients sending me overly-articulated specs for projects, making it sound like they want sophisticated implementations. Then I ask about it and they actually want like a 30 row table written in a csv. Huge whiplash.
I instituted a simple “share the inputs” along with the outputs rule which prevents people doing exactly this. Your only value contribution is the input and filtering the output but for people with equal filtering skill, there’s no value in the output
The first point is so true. How do people expect me to work with their 20-page "deep research" document, built from a crappy prompt, that they didn't even bother to proofread?
The best thing to do is to schedule meetings with those people to go over the docs with them. Now you force them to eat their own shit and waste their own time the more output they create.
Love the intent, but isn't that wishful if you don't have any leverage? e.g., the higher up will trade you for someone who doesn't cause friction or you waste too much of your own time?
I've had this experience too. In the case of vibe code, there is at least some incentive from self-preservation that prevents things from getting too out of hand, because engineers know they will be on the hook if they allow Claude to break things. But the penalties for sloppy prose are much lower, so people put out slop tickets/designs/documentation, etc. more freely.
It makes my work suck, sadly. Team dynamics also contributes to that, admittedly.
Last year I was implementing a pretty big feature in our codebase; it required a lot of focus to get the business logic right, and at the same time you had to be very creative to make it feasible to run without hogging too many resources.
When I was nearly done and working on catching bugs, team members grew tired of waiting and started taking my code from x weeks earlier (I have no idea why), feeding it to Claude or whatever, and then coming back with a "solution". So instead of finishing my code, I had to go through their versions of my code.
Each of the proposals had one or more business requirements wrong and several huge bugs. Not one was any closer to a solution than mine.
I would have appreciated any contribution to my code, but the assumption that it would be so easy to just take my code and finish it by asking Claude was rather insulting.
We're in a phase where founders are obsessed with productivity, so everything seems to work just fine and as intended, with only a bit of slop.
They're racing to be as productive as possible so we can get who knows where.
There are times when I honestly don't even know why we're automating certain tasks anymore.
In the past, we had the option of saying we didn't know something, especially when it was an area we didn't want to know about. Today, we no longer have that option, because knowledge is just a prompt away. So you end up doing front-end work for a backend application you just built, even though your role was supposed to be completely different.
This feels similar to the slow encroachment of devops onto everything. We're making so much shit nowadays that there is nobody left but developers to shepherd things into production, with all the extra responsibility and none of the extra pay commensurate with being a sysadmin too.
That works when it's humans you can talk to. The same problem happens with AI agents though and "no, use the latest code" doesn't really help when multiple agents have each compacted their own version of what "latest" means.
I'm running Codex on a Raspberry Pi, and Claude Code CLI, Gemini CLI, and Claude in Chrome all on a Mac, all touching the same project across both machines. The drift is constant. One agent commits, the others don't know about it, and now you've got diverged realities. I'm not a coder so I can't just eyeball a diff and know which version is right.
Ended up building a mechanical state file that sits outside all the context windows. Every commit, every test run, every failed patch writes to it. When a new session starts, the agent reads that file first instead of trusting its own memory. Boring ops stuff really, but it's the only thing that actually stopped the "which version is real" problem.
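A minimal sketch of such a state file, assuming an append-only JSON-lines log that every agent writes to and reads on startup (the filename and event fields here are invented for illustration):

```python
import json
import time
from pathlib import Path

STATE = Path("agent_state.jsonl")  # hypothetical shared state file

def record(event: str, detail: str) -> None:
    # Append-only: every commit, test run, or failed patch gets one line.
    entry = {"ts": time.time(), "event": event, "detail": detail}
    with STATE.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def latest(event: str):
    # What a fresh agent session reads first, instead of trusting its own memory.
    if not STATE.exists():
        return None
    entries = [json.loads(line) for line in STATE.read_text().splitlines() if line]
    matches = [e for e in entries if e["event"] == event]
    return matches[-1] if matches else None
```

Because the file sits outside every context window, "which version is real" becomes a lookup rather than a negotiation between compacted memories.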
It has made my job an awful slog, and my personal projects move faster.
At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up. It is painful and time consuming, the code base is a mess. In one case I had to merge a feature from one team into the main code base, but the feature was AI coded so it did not obey the API design of the main project. It also included a ton of stuff you don’t need in the first pass - a ton of error checking and hand-rolled parsing, etc, that I had to spend over a week unrolling so that I could trim it down and redesign it to work in the main codebase. It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly. AI tools are not good at this kind of design deconflicting task, so while it’s easy to get the initial concept out the gate almost instantly, you can’t just magically fit it into the bigger codebase without facing the technical debt you’ve generated.
In my personal projects, I get to experience a bit of the fun I think others are having. You can very quickly build out new features, explore new ideas, etc. You have to be thoughtful about the design because the codebase can get messy and hard to build on. Often I design the APIs and then have Claude critique them and implement them.
I think the future is bleak for people in my spot professionally – not junior, but also not leading the team. I think the middle will be hollowed out and replaced with principals who set direction, coordinate, and execute. A privileged few will be hired and developed to become leaders eventually (or strike gold with their own projects), but everyone in between is in trouble.
If you dont take a stand and refuse to clean their mess, aren't you part of the problem? No self respecting proponent of AI enabled development should suggest that the engineers generating the code are still not personally responsible for its quality.
Ultimately that's only an option if you can sustain the impact to your career (not getting promoted, or getting fired). My org (publicly traded, household name, <5k employees) is all-in on AI with the goal of having 100% of our code AI generated within the next year. We have all the same successes and failures as everyone else, there's nothing special about our case, but our technical leadership is fundamentally convinced that this is both viable and necessary, and will not be told otherwise.
People who disagree at all levels of seniority have been made to leave the organization.
Practically speaking, there's no sexy pitch you can make about doing quality grunt work. I've made that mistake virtually every time I've joined a company: I make performance improvements, I stabilize CI, I improve code readability, remove compiler warnings, you name it: but if you're not shipping features, if you're not driving the income needle, you have a much more difficult time framing your value to a non-engineering audience, who ultimately sign the paychecks.
Obviously this varies wildly by organization, but it's been true everywhere I've worked to varying degrees. Some companies (and bosses) are more self-aware than others, which can help for framing the conversation (and retaining one's sanity), but at the end of the day if I'm making a stand about how bad AI quality is, but my AI-using coworker has shipped six medium sized features, I'm not winning that argument.
It doesn't help that I think non-engineers view code quality as a technical boogeyman and an internal issue to their engineering divisions. Our technical leadership's attitude towards our incidents has been "just write better code," which... Well. I don't need to explain the ridiculousness of that statement in this forum, but it undermines most people's criticism of AI. Sure, it writes crap code and misses business requirements; but in the eyes of my product team? That's just dealing with engineers in general. It's not like they can tell the difference.
Hi thanks for this brilliant feature. It will really improve the product. However it needs a little bit more work before we can merge it into our main product.
1) The new feature does not follow the existing API guidelines found here: see examples a and b.
2) The new feature does not use our existing input validation and security checking code, see example.
Once the following points have been addressed we will be happy to integrate it.
All the best.
The ball is now in their court, and the feature should come back better.
This is a politics problem. Engineers were sending each other crap long before AI.
..so they copy/paste your message into Claude and send you back a +2000, -1500 version 3 minutes later. And now you get to go hunting for issues again.
There is an alternative way to make the necessary point here. Let it go through, with comments to the effect that you cannot attest to the quality or efficacy of the code, and let the organization suffer the consequences of this foray into LLM usage. If they can't use these tools responsibly and are unwilling to listen to the people who can, then they deserve to hit the inevitable quality wall, where endless passes through the AI still can't deliver working software and their token budget goes through the ceiling trying to make it work.
I am absolutely certain the world isn't just. I'm also absolutely certain the world can't become just unless you let people suffer the consequences of their decisions. It's the only way people learn.
> ... I make performance improvements, I stabilize CI, I improve code readability, remove compiler warnings, you name it ...
These are exactly the kind of tasks that I ask an AI tool to perform.
Claude, Codex, et al are terrible at innovation. What they are good at is regurgitating patterns they've seen before, which often mean refactoring something into a more stable/common format. You can paste compiler warnings and errors into an agentic tool's input box and have it fix them for you, with a good chance for success.
I feel for your position within your org, but these tools are definitely shaking things up. Some tasks will be given over entirely to agentic tools.
> These are exactly the kind of tasks that I ask an AI tool to perform.
Very reasonable nowadays, but those were things I was doing back in 2018 as a junior engineer.
> Some tasks will be given over entirely to agentic tools.
Absolutely, and I've found tremendous value in using agents to clean up old tech debt with one-line prompts. They run off, make the changes, modify tests, then put up a PR. It's brilliant and has fully reshaped my approach... but in a lot of ways expectations of my efficiency are much worse now, because leadership thinks I can rewrite our tech stack in another language over a weekend. It almost doesn't matter that I can pass all this tidying off to an LLM, because I'm now expected to have 3x the output I had a year ago.
> My org [...] is all-in on AI with the goal of having 100% of our code AI generated within the next year.
> People who disagree at all levels of seniority have been made to leave the organization.
So either they're right (100% AI-generated code soon) and you'll be out of a job or they'll be wrong, but by then the smart people will have been gone for a while. Do you see a third future where next year you'll still have a job and the company will still have a future?
"100% AI-generated code soon" doesn't mean no humans, just that the code itself is generated by AI. Generating code is a relatively small part of software engineering. And if AI can do the whole job, then white collar work will largely be gone.
Unfortunately, not many companies require engineers to cycle between "feature" and "maintainability" work. So those who chase low-hanging fruit and know how to virtue-signal build their careers on "features", while engineers passionate about correct solutions are left to pay for it and get labelled "inefficient" by management. It's all a clown show, especially now with vibe coding; no wonder big companies have had multiple incidents since vibing took off.
> Shipping “quality only” work for a long time can be stressful for your colleagues and the product teams.
I buried the lede a bit, but my frustration has been feeling like _nobody_ on my team prioritizes quality and instead optimizes for feature velocity, which then leaves some poor sod (me) to pick up the pieces to keep everything ticking over... but then I'm not shipping features.
At the end of the day if my value system is a mismatch from my employer's that's going to be a problem for me, it just baffles me that I keep ending up in what feels like an unsustainable situation that nobody else blinks at.
Employees, especially ones as well-leveraged and overpaid as software engineers, are not victims. They can leave. They _should_ leave. Great engineers are still able to get better-paying jobs all the time.
> Great engineers are still able to get better-paying jobs all the time
I know a lot of people who tried playing this game frequently during COVID, then found themselves stuck in a bad place when the 0% money ran out and companies weren't eager to hire someone whose resume had a dozen jobs in the past 6 years.
Came here to say this. The right solution to this is still the same as it always was - teach the juniors what good code looks like, and how to write it. Over time, they will learn to clean up the LLM’s messes on their own, improving both jobs.
You can and should speak up when tasks are poorly defined, underestimated, or miscommunicated.
Try to flat out “refuse” assigned work and you’ll be swept away in the next round of layoffs, replaced by someone who knows how to communicate and behave diplomatically.
If they're handing you broken code, call them out on it. Say, "This doesn't do what it says it does; do you want me to create a story for redoing all this work?"
That is definitely one tell: the hand-rolled input parsing or error handling that people would never have written at their own discretion. The bigger issue is that we already do the error checking and parsing at the points of abstraction where it makes the most sense. So it's bespoke, and redundant.
That is on the people using the AI and not cleaning up/thinking about it at all.
> At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up.
This has to be the most thankless job for the near future. It's hard and you get about as much credit as the worker who cleans up the job site after the contractors are done, even though you're actually fixing structural defects.
And god forbid you introduce a regression bug cleaning up some horrible redundant spaghetti code.
Near future being the key term here imo. The entire task I mentioned was not an engineering problem, but a communication issue. The two project owners could have just talked to each other about the design, then coded it correctly in the first pass, obviating the need for the code janitor. Once orgs adapt to this new workflow, they’ll replace the code janitors with much cheaper Claude credits.
We’ve had this too, and changed our code review guidelines to allow rejection if code is clearly just AI slop. We’ve let four contractors go so far over it. Sure, they get work done fast, but when it comes to making it production-ready they’re completely incapable. Last time we merged it anyway to hit a budget; it set everyone back and we’re still cleaning up the mess.
> It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly.
Why the hell are you playing hero? Delegate the choice to your manager: ruin the codebase, or allocate two weeks for cleanup. Their choice. If the magical AI team claims they can do the integration faster, let them.
IME one thing that makes this choice a very difficult one is oncall responsibilities. The thing that incentivizes code owners to keep their house in order is that their oncall experience will be a lot better. And you're the only one who is incentivized to think this way. Management certainly doesn't care. So by delegating the choice to management you're signing up for a whole bunch of extra work in the form of sleepless oncall shifts.
If someone is making the kind of mistakes that cause oncall issues to increase, put that person on call. It doesn't matter if they can't do anything, call them every time they cause someone else to be paged.
IME too many don't care about on call unless they are personally affected.
> If someone is making the kind of mistakes that cause oncall issues to increase
the problem is that identifying the root cause can take a lot of time, and often the "mistakes" aren't clearly sourced down to an individual.
So someone oncall just takes the hit (ala, waking up at 3am and having to do work). That someone may or may not be the original progenitor of said mistake(s).
Framed less blamefully, that's basically the central thesis of "devops". That is the notion that owning your code in production is a good idea because then you're directly incentivized to make it good. It shouldn't be a punishment, just standard practice that if you write code you're responsible for it in production.
I've heard of human engineers who are like that. "10x", but it doesn't actually work with the environment it needs to work in. But they sure got it to "feature complete" fast. The problem is, that's a long way from "actually done".
I know my mind fairly well, and I know my style of laziness will result in atrophying skills. Better not to risk it.
One of my co-workers already admitted as much to me around six months ago, and that he was trying not to use AI for any code generation anymore, but it was really difficult to stop because it was so easy to reach for. Sounded kind of like a drug addiction to me. And I had the impression he only felt comfortable admitting it to me because I don't make it a secret that I don't use it.
Another co-worker did stop using it to generate code because (if I'm remembering right) he can tell what it generates is messy for long-term maintenance, even if it does work and even though he's new to React. He still uses it often for asking questions.
A third (this one a junior) seemed to get dumber over the past year, opening merge requests that didn't solve the problem. In a couple of these cases my manager mentioned either seeing him use AI while they were pairing (it looked good enough, so the problems just slipped by) or seeing hints in the merge request in how the AI names or structures the code.
I've been using ChatGPT to teach myself all sorts of interesting fields of mathematics that I've wanted to learn but never had the time previously. I use the Pro version to pull up as many actual literature references as I can.
I don't use it at all to program despite that being my day job for exactly the reason you mentioned. I know I'll totally forget how to program. During a tight crunch period, I might use it as a quick API reference, but certainly not to generate any code. (Absolutely not saying it's not useful for this purpose—I just know myself well enough to know how this is going to go haha)
How do you get ChatGPT to teach you well? I feel like no matter how dense and detailed I ask it to be, or how much I ask it to elaborate and contextualize topics with adjacent ones to give me a holistic understanding, it just sucks at it and always falls short of helping me truly understand and intuit the subject matter.
The atrophy of manually writing code is certainly real. I'd compare it to navigating with a paper map and compass versus Google Maps. I don't particularly care to lose the skill, even though being good at (and enjoying) the programming part of making software was my main source of income for more than a decade. I just can't escape being significantly faster with Claude Code.
> he can tell what it generates is messy for long-term maintenance, even if it does work and even though he's new to React.
When one can generate code in such a short amount of time, logically it is not hard to maintain. You could just re-generate it if you didn't like it. I don't believe this style of argument where it's easy to generate with AI but then you cannot maintain it after. It does not hold up logically, and I have yet to see such a codebase where AI was able to generate it, but now cannot maintain it. What I have seen this year is feature-complete language and framework rewrites done by AI with these new tools. For me the unmaintainable code claim is difficult to believe.
We use it daily in our org, and what you're talking about is not happening. That said, we have a fairly decent monorepo structure and a bunch of guides/skills to keep it from doing that often, plus the whole plan + implement phases.
If it was July 2025, I would have agreed with you. But not anymore.
Yes, all the time. Yes, those go to production. AI has improved significantly the past 2 years, I highly recommend you give it another try.
I don't see the behaviour you describe; maybe your impression comes from online articles, or from using a local llama model or ChatGPT from two years ago. Claude regularly finds and resolves duplicated code, in fact. Let me give you a counter-example: for adding dependencies we run an internal whitelist for AI agents, since we had similar concerns; new dependencies go through this system. In the half year or so that we've run the service, I have never seen any agent used in our organisation or at a client hallucinate a dependency.
So where does your responsibility for this code end? Do you just push to the repo, merge, and that's it, or do you also deploy, monitor, and maintain the production systems? Who handles outages on a Saturday night, you or someone else?
FWIW I mainly use Opus 4.6 on the $100/mo Max plan and rarely run into these issues. They certainly occur with lower-tier models, with increased frequency the cheaper the model is. As someone using it for a significant portion of my professional and personal work, I don't really understand why this continues to be a widespread issue. Thoroughly vetting Plan Mode output also seems like an easy resolution, which most devs should be doing anyway IMO (e.g. catching a stray `npm install random-auth-package`).
I used to experience those issues a lot. I haven't in a while. Between having good documentation in my projects, well-defined skills for normal things, simple to use testing tools, and giving it clear requirements things go pretty smoothly.
I'd say it still really depends on what you're doing. Are you working in a poorly documented language that few people use solving problems few people have solved? Are you adding yet another normal-ish kind of feature in a super common language and libraries? One will have a lot more pain than the other, especially if you're not supplying your own docs and testing tools.
There's also just a difference of what to include in the context. I had three different projects which were tightly coupled. AI agents had a hard time keeping things straight as APIs changed between them, constantly misnaming them and getting parameters wrong and what not. Combining them and having one agent work all three repos with a shared set of documentation made it no longer make mistakes when it needed to make changes across multiple projects.
LLMs rarely if ever proactively identify cleanup refactors that reduce the complexity of a codebase. They do, however, still happily duplicate logic or large blocks of markup, defer imports rather than fixing dependency cycles, introduce new abstractions for minimal logic, and freely accumulate a plethora of little papercuts and speed bumps.
These same LLMs will then get lost in the intricacies of the maze they created on subsequent tasks, until they are unable to make forward progress without introducing regressions.
You can at this point ask the LLM to rewrite the rat’s nest, and it will likely produce new code that is slightly less horrible but introduces its own crop of new bugs.
All of this is avoidable, if you take the wheel and steer the thing a little. But all the evidence I’ve seen is that it’s not ready for full automation, unless your user base has a high tolerance for bugs.
I understand Anthropic builds Claude Code without looking at the code. And I encounter new bugs, some of them quite obvious and bad, every single day. A Claude process starts at 200MB of RAM and grows from there, for a CLI tool that is just a bundle of file tools glued to a wrapper around an API!
I think they have a rat's nest over there, but they're the only game in town so I have to live with this nonsense.
I’m the same way. But I took a bite and now I’m hooked.
I started using it for things I hate, ended up using it everywhere. I move 5x faster. I follow along most of the time. Twice a week I realize I’ve lost the thread. Once a month it sets me back a week or more.
I repeatedly tried to use LLMs for code but god they suck. I've tried most tools and models and for me it's still way faster to write things by hand.
Turns out I'm the magical tool: it's almost as if I know what I want to do! I don't have to spend time explaining and correcting.
Also, a good part of the value of me writing code is that I know the code well and can fix things quickly. In addition, I've come to realize that while I'm coding, I'm mostly thinking about the project's code architecture and technical future. It's not something I'll ever want to delegate I think.
I use AI to discuss and possibly generate ideas and tests, but I make sure I understand everything and type it in except for trivial stuff. The main value of an engineer is understanding things. AI can help me understand things better and faster. If I just setup plans for AI and vibe, human capital is neglected and declines. I don't think there's much of a future if you don't know what you're doing, but there is always a future for people with deep understanding of problems and systems.
I think you are right: deep understanding of systems and domains will not become obsolete. I foresee some types of developers moving into a more holistic systems design and implementation role if coding itself becomes routinely automated.
I'm an engineer at Amazon - we use Kiro (our own harness) with Opus 4.6 underneath.
Most of my gripes are with the harness, CC is way better.
In terms of productivity I'm definitely 2-4x more productive at work, and >10x more productive on my side business. I used to work overtime to deliver my features. Now I work 9-5 and am job hunting on the side while delivering relatively more features.
I think a lot of people are missing that AI is not just good for writing code. It's good for data analysis and all sorts of other tasks like debugging and deploying. I regularly use it to manage deployment loops (e.g. make a code change and then deploy the changes to gamma and verify they work by making a sample request and verifying output from CloudWatch logs, etc.). I have built features in 2 weeks that would otherwise take me a month, just because I'd have to learn some nitty-gritty technical details that I'd never use again in my life.
For data analysis I have an internal glue catalog, I can just tell it to query data and write a script that analyzes X for me.
AI and agents particularly have been a huge boon for me. I'm really scared about automation but also it doesn't make sense to me that SWE would be automated first before other careers since SWE itself is necessary to automate others. I think there are some fundamental limitations on LLMs (without understanding the details too much), but whatever level of intelligence we've currently unlocked is fundamentally going to change the world and is already changing how SWE looks.
I saw somewhere that you guys had All Hands where juniors were prohibited from pushing AI-assisted code due to some reliability thing going on? Was that just a hoax?
> Much of the coverage of the service incidents has focused on a weekly Amazon Stores operations meeting and a planned discussion of recent outages. Reviewing operational incidents is a routine part of these meetings, during which teams discuss root causes with the goal of continuing to improve reliability for customers.
This is something that's a part of every FAANG afaik. I know for a fact that there's no prohibition on pushing AI-assisted code. How would that even technically work? It'd basically mean banning Kiro/CC from the company.
> Only one of the incidents involved AI-assisted tooling, which related to an engineer following inaccurate advice that an AI tool inferred from an outdated internal wiki, and none involved AI-written code.
so this doesn't seem as much like "AI caused an outage" as it was portrayed.
Not a hoax; I saw it in the news. I'm not at Amazon but can confirm massive productivity gains. The issue is reviewing code: with a firehose of PRs, we need to be more careful and mindful. Don't vibe-code a massive PR, slap it on your coworkers, and expect a review. The same PR etiquette exists today as it did years ago.
> make a code change and then deploy the changes to gamma and verify they work by making a sample request and verifying output from cloudwatch logs etc
This has been a godsend over the past week while deploying a couple services. One is a bridge between Linear and our Coder.com installation so folks can assign the work to an agent. Claude Code can do most of the work while I sleep since it has access to kubectl, Linear MCP, and Coder MCP. I no longer have to manually build, deploy, test, repeat. It just does it all for me!
How do you deal with the risk of the LLM generating malicious code and then running it? I suspect it's a bit more difficult to set it up tailored to your needs in a big corp.
> I have built features in 2 weeks that would take me a month just because I'd have to learn some nitty-gritty technical details that I'd never use again in my life.
In the bucket of "really great things I love about AI", that would definitely be at the top. So often in my software engineering career I'd have to spend tons of time learning and understanding some new technology, some new language, some esoteric library, some cobbled-together build harness, etc., and I always found it pretty discouraging when I knew that I'd never have reason to use that tech outside the particular codebase I was working on at the time. And far from being rare, working in a fairly large company I found that was a pretty frequent occurrence. E.g. I'd look at a design doc or feature request and think to myself "oh, that's pretty easy and straightforward", only to go into the codebase and see the original developer/team decided on some extremely niche transaction handling library or whatever (or worse, homegrown with no tests...), and figuring out that esoteric tech turned into 85% of the actual work. AI doesn't reduce that to 0, but I've found it has been a huge boon to understanding new tech, and especially for getting my dev environment and build set up well, much faster than I could do manually.
Of course, AI makes it a lot easier to generate exponentially more poorly architected slop, so not sure if in a year or two from now I'll just be ever more dependent on AI explaining to me the mountains of AI slop created in the first place.
It’s too bad, really. While it’s easy to get discouraged about such things, over the course of my career all that learning of “pointless” tech has made me a much better programmer, designer, architect, and troubleshooter. The only way you build intuition about systems is learning them deeply.
Quite. On the face of it: possible career faux pas.
I own (with two other folk) my own little company and hire other people. I actively encourage my troops to have a bash, but I suspect a firm like AMZ would have differing views about what used to be called moonlighting. Mind you, we only turn over a bit more than £1M, and that is loose change down the back of a sofa for AMZ ...
Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find, such as "where is this team's code that does X?"
I am yet to successfully prompt it and get a working commit.
Further, I will add that I also don't personally know any ICs who have successfully used it. There are endless posts from people talking about how they're now 10x more productive and how everyone needs to do x, y, and z now; I just don't know any of these people.
Non-professionally, it's amazing how well it does on a small greenfield task, and I have seen that 10x improvement in velocity. But, at work, close to 0 so far.
Of the posts I've seen at work, they typically tend to be teams doing something new / greenfield-ish or a refactor. So I'm not surprised by their results.
I’ve probably prompted 10,000 lines of working code in the last two months. I started with Terraform, which I know backwards and forwards. It works perfectly 95% of the time, and I know where it will go wrong so I watch for that (working greenfield, in other existing repos, and with other collaborators).
Moved on to a big data processing project, works great, needed a senior engineer to diagnose one small index problem which he identified in 30s. (But I’d bonked on for a week because in some cases I just don’t know what I don’t know)
Meanwhile a colleague wanted a sample of the data. Vibe coded that (extracting from a zip without decompressing it). He wanted it randomized: one shot, done. Then he wanted it randomized across 5 categories. Then he wanted 10x the sample size. The data request was completed before the conversation was over. I would have worked on that for three hours before, and bonked if I hit the limit of my technical knowledge.
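For the curious, the "sample from a zip without decompressing it" trick can be sketched in a few lines of Python (the archive and member names here are made up): the stdlib `zipfile` module streams the member, and reservoir sampling keeps memory flat regardless of file size.

```python
import io
import random
import zipfile

def sample_lines(zip_path, member, k, seed=None):
    """Reservoir-sample k lines from a file inside a zip archive,
    streaming so the archive is never extracted to disk."""
    rng = random.Random(seed)
    reservoir = []
    with zipfile.ZipFile(zip_path) as zf:
        # zf.open() yields a streaming file object over the member
        with zf.open(member) as raw:
            for i, line in enumerate(io.TextIOWrapper(raw, encoding="utf-8")):
                if i < k:
                    reservoir.append(line)
                else:
                    # each later line replaces a reservoir slot with prob k/(i+1)
                    j = rng.randrange(i + 1)
                    if j < k:
                        reservoir[j] = line
    return reservoir
```

Stratifying across categories would just mean keeping one reservoir per category key instead of a single list.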
Built a monitoring stack. Configured servers, used it to troubleshoot dozens of problems.
For stuff I can’t do, now I can do.
For stuff I could do with difficulty now I can do with ease.
For stuff I could do easily now I can do fast and easy.
Your vastly different experience is baffling and alien to me. (So thank you for opening my eyes)
I don’t find it baffling at all and both your experiences perfectly match mine.
Asking AI to solve a problem for you is hugely non-linear. Sometimes I win the AI lottery and its output is a reasonable representation of what I want. But mostly I lose the AI lottery and get something hopeless. Now I have a conundrum.
Do I continue to futz with the prompt and hope if I wiggle the input then maybe I get a better output, or have I hit a limit and AI will never solve this problem? And because of the non linear nature I just never know. So these days I basically throw one dart. If it hits, great. If I miss I give up and do it the old fashioned way.
My work is in c++ primarily on what is basically fancy algorithms on graphs. If it matters.
Also at FAANG. I think I am using the tools more than my peers based on my conversations. The first few times I tried our AI tooling, it was extremely hit and miss. But right around December the tooling improved a lot, and is a lot more effective. I am able to make prototypes very quickly. They are seldom check-in ready, but I can validate assumptions and ideas. I also had a very positive experience where the LLM pointed out a key flaw in an API I had been designing, and I was able to adjust it before going further into the process.
Once the plan is set, using the agentic coder to create smaller CLs has been the best avenue for me. You don't want to generate code faster than you and your reviewers can comprehend it. It'll feel slow, but check ins actually move faster.
I will say it's not all magic and success. I have had the AI lead me down some dark corners, assuring me one design would work when actually it is a bit outdated or not quite the right fit for the system we are building for because of reasons. So, I wouldn't really say that it's a 10x multiplier or anything, but I'm definitely getting things done faster than I could on my own. Expertise on the part of the user is still crucial.
One classic issue I used to run into, is doing a small refactor and then having to manually fix a bunch of tests. It is so much simpler to ask the LLM to move X from A to B and fix any test failures. Then I circle back in a few minutes to review what was done and fix any issues.
The other thing is, it has visibility into the wider codebase, including some of the infrastructure we depend on. There have been a couple of times in the past quarter where our build was busted by an external team, and I was able to ask the LLM, given the timeframe and a description of the issue, to pinpoint the exact external failure that caused it. I don't really know how long it would have taken to resolve the issue otherwise, since the issues were missed by their testing. That said, I gotta wonder if those breakages were introduced by LLM use.
My job hasn't been this fun in a long, long time and I am a little uneasy about what these tools are going to mean for my personal job security, but I don't know how we can put the genie back into the bottle at this point.
I can second this. I’ve never had a problem writing short scripts and glue code in stuff ive mastered. In places I actually need help, I’m finding it slows me down.
Wow, that's such a drastically different experience from mine. May I ask what toolset you're using? Are you limited to a home-grown "AcmeCode", or do you have full access to Claude Code / Cursor with the latest and greatest models, 1M context size, and full repo access?
I see it generating between 50% and 90% accuracy on both small and large tasks; the PRs it generates range from 50% usable code that a human can tweak to a 90% solution (with the occasional 100% "wow, it actually did it, no comments, let's merge").
I also found it to be a skill set: some engineers find it easier to articulate what they want, and some find it easier to think while writing code.
I used to think that the people who keep saying (in March 2026) that AI does not generate good code are just not smart and ask stupid prompts.
I think I've amended that thought. They are not necessarily lacking in intelligence. I hypothesize that LLMs pick up on optimism, pessimism, and other sentiments in the incoming prompt: someone prompting with no hope that the result will be useful ends up with useless garbage output, and vice versa.
That sounds a lot more like confirmation bias than any real effect on the AI's output.
Gung-ho AI advocates overlook problems and seem to focus more on the potential they see for the future, giving everything a nice rose tint.
Pessimists will focus on the problems they encounter and likely not put in as much effort to get the results they want, so they likely see worse results than they might have otherwise achieved and worse than what the optimist saw.
That's a valid-sounding argument. However, many people with no strong view either way are producing functional, good code with AI daily, and the original context of this thread is someone who has never been able to produce anything committable. Many, many real-world experiences show something excellent and ready to go from a simple one-shot.
This is kinda like that thing about how psychic mediums supposedly can't medium if there's a skeptic in the room. Goes to show that AI really is a modern-day ouija board.
Don’t know why you’re getting downvoted, this is a fascinating hypothesis and honestly super believable. It makes way more sense than the intuitive belief that there’s actually something under the human skin suit understanding any of this code.
It's probably more to do with the intelligence required to know when a specific type of code will yield poor future coding integrations and large scale implementation.
It's pretty clear that people think greenfield projects can constantly be slopified and that AI will always be able to dig them another logical connection, so it doesn't matter which abstraction the AI chose this time; it can always be better.
This is akin to people who think we can just keep using oil to fuel technological growth because it'll somehow improve the ability of technology to solve climate problems.
It's akin to the techno capitalist cult of "effective altruism" that assumes there's no way you could f'up the world that you can't fix with "good deeds"
There's a lot of hidden context in evaluating the output of LLMs, and if you're just looking at today's success, you'll come away with a much different view than if you're looking at next year's.
Optimism, in this case, is just the belief that the AI will keep getting more powerful, so it'll always clean up today's mess.
I call this techno magic, indistinguishable from religious 'optimism'
The FAANG codebases are very large, date back years, and may not necessarily use open-source frameworks but rather in-house libraries and frameworks, none of which are available to Anthropic or OpenAI; hence these models have zero visibility into them.
Combined with the fact that these are not reasoning or thinking machines but probabilistic (image/text) generators, they can't generate what they haven't seen.
That's why coding agents usually scan various files to figure out how to work in a particular codebase. I work on a very large, old project, and Codex manages to work with our frameworks most of the time.
No, it doesn't check out. I think it's becoming abundantly clear that LLMs learn in real time as they speak to you. There's a lot of denial, with people claiming they don't learn and that their knowledge is fixed to the training data; this is not even remotely true.
LLMs learn dynamically through their context window, and this learning happens at a rate much faster than in humans, sometimes with capabilities greater than humans and often much worse.
For a codebase as complex and as closed-source as Google's, the problems an LLM faces are largely the same as a human's: how much can it fit into the context window?
It checks out if you take into account that most developers are actually rather mediocre, outside of places that spend an insane amount of time and money to get good devs (including but not limited to FAANG).
You're observing this "paradox", because what you call learning here is not learning in the ML sense; it's deriving better conclusions from more data. It's true for many ML methods, but it doesn't mean any actual learning happens.
Huh? I have over a hundred services/repos checked out locally, ranging from 10+ years old to new. I have no problem leveraging AI to work in this large distributed codebase.
Even internal stuff is usable by the model because it’s a pattern matching machine and there should be documentation available, or it can just study the code like a human.
Can you elaborate on the shortcomings you find in professional setting that aren't coming up on personal projects? With it handling greenfield tasks are you perhaps referring to the usual sort of boilerplate code/file structure setup that is step 0 with using a lot of libraries?
Not a FAANG engineer but also working at a pretty large company and I want to say you're spot on 1000%. It's insane how many "commenters" come out of the woodwork to tell you you're doing x or y wrong. They may not even frame it that way, but use a veneer of questions "what is your process like? Have you tried this product, etc." as a subtle way of completely dismissing your shared experience.
I work at aws and generally use Claude Opus 4.6 1M with Kiro (aws’s public competitor to Claude Code). My experience is positive. Kiro writes most of my code. My complaints:
1. Degraded quality over longer context window usage. I have to think about managing context and agents instead of focusing solely on the task.
2. It’s slow (when it’s “thinking”). Especially when it’s tasked with something simple (e.g., I could ask Claude Opus to commit code and submit for review but it’s just faster if I run the commands myself and I don’t want to have to think about conditionally switching to Haiku / faster models mid task execution).
3. It often requires a lot of upfront planning and feedback loop set up to the extent that sometimes I wonder if it would’ve been faster if I did it myself.
A smarter model would be great but there are bigger productivity gains to be had with a good set up, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good set up. Something with the speed of Haiku with the reasoning of Opus without the overhead of having to think about the management of agents or context would be sweet.
The context degradation problem gets much worse when you have multiple agents or models touching the same project. One agent compacts, loses what it knew, and now the human is the only source of truth for what actually happened vs what was reported done. If that human isn't a coder, they can't verify by reading the source either.
I've been working on this and landed on a pattern I call a "mechanical ledger", basically a structured state file that sits outside any context window and gets updated as a side effect of work, not as a step anyone remembers to do. Every commit writes to it, every failed patch writes to it, every test run writes to it. When a session starts (or an agent compacts), it reads the ledger and rebuilds context from ground truth instead of from memory.
It's not a novel idea really; it's basically what ops teams do with runbooks and state files, applied to the AI agent handoff problem. The interesting bit is making the updates mechanical so no agent can forget to do them.
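A minimal sketch of that ledger pattern in Python (the file name and event shapes are mine, not from any real tool): every action appends a JSON line as a side effect, and a fresh session replays the tail instead of trusting its own memory.

```python
import json
import time
from pathlib import Path

LEDGER = Path("agent-ledger.jsonl")  # hypothetical path; one JSON object per line

def record(event: str, **detail):
    """Append an event to the ledger. Intended to be called mechanically as
    a side effect (e.g. from a git post-commit hook or a test-runner
    wrapper), never as a step an agent has to remember."""
    entry = {"ts": time.time(), "event": event, **detail}
    with LEDGER.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def rebuild_context(last_n: int = 50) -> list[dict]:
    """On session start (or after a compaction), replay the tail of the
    ledger so the agent resumes from ground truth, not from memory."""
    if not LEDGER.exists():
        return []
    lines = LEDGER.read_text().splitlines()[-last_n:]
    return [json.loads(line) for line in lines]
```

The append-only JSONL shape matters: it survives crashed sessions, and any agent or human can tail it to see what actually happened versus what was reported done.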
> A smarter model would be great but there are bigger productivity gains to be had with a good set up, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good set up. Something with the speed of Haiku with the reasoning of Opus without the overhead of having to think about the management of agents or context would be sweet.
I was thinking about this recently. This kind of setup is the Holy Grail everyone is searching for: make the damn tool produce the right output more of the time. And yet, despite testing the methods provided by the people who claim they get excellent results, I still reach the point where it goes off the rails. Nevertheless, since practically everybody is working on resolving this particular issue, and huge amounts of money have been poured into getting it right, I hope that in the next year or so we will finally have something we can reliably use.
Could you say more on how the tasks where it works vs. doesn't work differ? Just the fact that it's both small and greenfield in the one case and presumably neither in the other?
Can you provide an example of how you actually prompt AI models? I get the feeling the difference among everyone's experiences has to do with prompting and expectation.
Biggest difference I've noticed is giving the model constraints upfront rather than letting it freestyle. Something like "use only the standard library, no new files, keep it under 50 lines" produces dramatically better results than open-ended "build me X." It's less about clever prompting and more about narrowing the solution space so it can't wander off.
I find that the default Claude Code harness deals with the ambiguity best right now with the questionnaire system. So you can pose the core of the problem first and then specify only those implementation details that matter.
I wasn't implying that clever prompting needed to be used. I'm just trying to confirm that the person I was replying to isn't just saying what essentially amounts to "build me X".
When I write my prompts, I literally write an essay. I lay out constraints, design choices, examples, etc. If I already have a ticket that lays out the introduction, design considerations, acceptance criteria, and other important information, then I'll include that as well. I then take the prompt I've written and ask the model to improve it. I also try to put the most important bits at the end, since right now models seem to focus more on things referenced at the end of a prompt than at the beginning.
Once I do get output, I then review each piece of generated code as if I'm doing an in-depth code review.
No one is saying "build X" and getting good results unless they didn't have any expectations to begin with. What you describe is precisely right. Using the agents requires a short leash, clear requirements, and good context management (docs, skills, rules).
Some people (like me) still think that’s a fantastic tool, some people either don’t know how to do this or think the fact you have to do this means the tools are bunk. Shrug.
Around a year ago I started a new position at a very large tech company that I won't name, working on a pre-existing web project there. The code base isn't terrible - though not very good either, by-and-large - but it's absolutely massive, often over-engineered, pretty unorthodox, and definitely has some questionable design decisions; even after more than a year of working with it I still feel like a beginner much of the time.
This year I grudgingly bit the bullet and began using AI tools, and to my dismay they've been a pretty big boon for me, in this case. Not just for code generation - they're really good at probing the monolith and answering questions I have about how it works. Before, I'd spend days poring over code before starting work to figure out the right way to build something or where to break in, pinging people over in India or eastern Europe with questions and hoping they'd reply to me overnight. AI's totally replaced that, and it works shockingly well.
When I do fall back on it for code generation, it's mostly just to mitigate the tedium of writing boilerplate. The code it produces tends to be pretty poor - both in terms of style and robustness - and I'll usually need to take at least a couple of passes over it to get it up to snuff. I do find this faster than writing everything out by hand in the end, but not by a lot.
For my personal projects I don't find it adds much, but I do enjoy rubber ducking with ChatGPT.
Using these tools for understanding seems to be one of the best use cases - lots of pros, less dangerous cons (worst case scenario is a misleading understanding, but that can be minimized by directly double checking the claims being made).
In fact it looks like an arising theme is that whenever we use these tools it's valuable to maintain a human understanding of what's actually going on.
As a veteran freelance developer - aside from some occasional big wins, I'd say it's been net neutral or even net negative to my productivity. When I review AI-generated code carefully (and if I'm delivering it to clients I feel that's my responsibility) I always find unnecessary complexity, conceptual errors, performance issues, looming maintainability problems, etc. If I were to let it run free, these would just compound.
A couple "win" examples: add in-text links to every term in this paragraph that appears elsewhere on the page, plus corresponding anchors in the relevant page parts. Or, replace any static text on this page with any corresponding dynamic elements from this reference URL.
Lose examples: constant small edit-format glitches (search text not matching; even the venerable Opus 4.6 constantly screws this up), unnecessary intermediate variables, ridiculously over-cautious exception handling, failing to see opportunities to isolate repeated code into a function, or to use an existing function that exactly implements those N lines of code, etc.
It can only result in more work if you freelance, because if you disclose that you used LLMs, then you did it faster than usual and presumably at lower quality, so you have to deliver more to retain the same income. Except now you're paying all the providers for all the models because you start hitting usage limits, and Claude sucks on the weekends, and your drive is full of 'artifacts', which incurs mental overhead that is exacerbated by your crippling ADHD.
And then all of a sudden you're just arguing with the terminal all day: the specs are written by GPT, delivered in the email written by GPT. Sometimes they don't even take the time to trim their prompt from the edges of the paste, but the only thing I can think is "I need to make the most of 0.5x off-peak Claude rates".
I found it’s great for:
1. Exploring new code bases or understanding PRs.
2. Prototyping new ideas.
3. Second opinion on problems and troubleshooting.
4. Rubber ducking.
5. Parallelizing rote/boilerplate work while my deep focus is elsewhere.
6. First draft of documentation and reviews.
What I don’t understand is how some people can parallelize 5-10 engineering changes at once, and expect to support and maintain that code in the future.
The difficulty has always been in support and maintenance, not building something new, and that requires a deeper understanding.
The majority of code I've written since November 2025 has been created using agents, as opposed to me typing code into a text editor. More than half of that has been done from my iPhone via Claude Code for web (bad name, great software.)
I'm enjoying myself so much. Projects I've been thinking about for years are now a couple of hours of hacking around. I'm readjusting my mental model of what's possible as a single developer. And I'm finally learning Go!
The biggest challenge right now is keeping up with the review workload. For low stakes projects (small single-purpose HTML+JS tools for example) I'm comfortable not reviewing the code, but if it's software I plan to have other people use I'm not willing to take that risk. I have a stack of neat prototypes and maybe-production-quality features that I can't ship yet because I've not done that review work.
I mainly work as an individual or with one other person - I'm not working as part of a larger team.
Usually it's specification mistakes - I spot cases I hadn't thought to cover, or the software not behaving as usefully as if I had made a different design decision.
Occasionally I'll catch things it didn't implement at all, or find things like missing permission checks.
I have 10 years of experience. I am a reasonable engineer. I can tell you that about half of the hype on twitter is real. It is a real blessing for small teams.
We have 100k DAU for a consumer CRUD app. We built and maintain everything in-house with 3 engineers. This would have taken at least 10 engineers 3-4 years back.
We don't have a bug list. We are not "vibe coding"; 2 of us understand almost all of the codebase. We have processes to make sure the core integrity of the codebase doesn't go for a toss.
No one has touched the editor in months.
Even the product folks can raise a PR for small config changes from slack.
Velocity is through the roof and code quality is as good if not better than when we write by hand.
We refactor A LOT more than before because we can afford to.
AI is already better at understanding code than 99.99% of humans; the more I use it, the more I believe this is true. It can draw connections between dots far more quickly and accurately than a human ever could.
At the very least, AI is going to be a must, even if only as a co-supervisor for your project.
What's in doubt right now is whether AI can manage a codebase fully autonomously without bringing it down, which I doubt it can at the moment. Be it 4.6 or 5.4, they almost always add code instead of removing it; the sheer complexity will explode at a certain point.
But that is my assessment of the models TODAY; who knows where they will end up in 6 months. AI is entering the recursive self-improvement phase, that roadmap is lying in front of our eyes, and what it can and would unlock is truly, truly unpredictable.
The RAG models are very competent at programming. I am worried about my job as a SWE in the near future, but didn't the MIT paper from about a week ago pretty much confirm that width-scaling the model is about to stop (or has already stopped) giving any measurable increase in quality, because the training data no longer overfills the model?
Any authentic pre-LLM training data is assumed to have been used already, and synthetic or generated data gives worse-performing models, so the path of increasing the training data seems to be a dead end as well?
What is the next vector of training? Maybe data curation? Remove the low quality entries and accept a smaller, but more accurate data set?
I think the AI companies are starting to sweat a little, considering the promises they have made, their inability to deliver and turn a profit at its current state and the slowing improvements.
Interesting times! We are either all out of jobs or a massive market crash is imminent, awesome...
In possibility, yes - in reality, often no, due to the cost that would have to be incurred to make this happen. The "how they got those users" is the easy part if your offering is the same(ish) at a fraction of the cost.
Most developers are still in denial. Many are afraid of job loss, or the corporations are forcing AI without clear scopes and proper implementation, which results in a mess. Small teams building small-to-medium products are productive as hell with AI.
Net negative. I do find it genuinely useful for code review, and "better search engine" or snippets, and sometimes for rubber ducking, but for agent mode and actual longer coding tasks I always end up rewriting the code it makes. Whatever it produces always looks like one of those students who constantly slightly misunderstands and only cares about minor test objectives, never seeing the big picture. And I waste so much time on the hope that this time it will make me more productive if only I can nudge it in the right direction, maybe I'm not holding it right, using the right tools/processes/skills etc. It feels like javascript frameworks all over again.
Same. I vacillate between thinking our profession will soon be over to thinking we’re perfectly safe. Sometimes, it’s brilliant. It is very good at exploring and explaining a codebase, finding bugs, and doing surgical fixes. It’s sometimes good at programming larger tasks, but only if you really don’t care about code quality.
The one thing I’m not sure about is: does code quality and consistency actually matter? If your architecture is sufficiently modular, you can quickly and inexpensively regenerate any modules whose low quality proves to be problematic.
I've had quite a bit of luck with using AI-assisted tooling for some specific workflows, and very little luck with others. To the extent that there's a trend[^1], it seems to be that tasks where I would spend a lot of time to produce a very small amount of output which is easy to evaluate objectively[^2] are sped up considerably, tasks where I would produce a large amount of output quickly (e.g. boilerplate) are sped up slightly, and most other tasks are unaffected or even slowed down (if I try to use AI tooling for them and decide it's not good enough yet).
As always, my views are my own and do not necessarily reflect the views of my employer.
[^1]: There's less of a trend than I'd expect. There are some quite difficult-to-me tasks that AI nails (e.g. type system puzzles) and some trivial-to-me tasks that AI struggles with (e.g. "draw correct conclusions when an image is uploaded of an ever-so-slightly nonstandard data visualization like a stacked bar chart").
[^2]: My favorite example of this is creating a failing test with a local reproduction of a bug reported on production - sure I _could_ write this myself, but usually these tests are a little finicky to write; once written, they are either obviously testing the right thing or obviously testing the wrong thing, and the code quality doesn't really matter, so there's not much benefit in having human-written code, while there's a substantial benefit in having any tests like this vs. not having them.
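A hypothetical sketch of what such a bug-reproduction test looks like (the `Item`/`compute_total` names and the tax bug are invented for illustration, not taken from the thread):

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float
    taxable: bool

def compute_total(items, tax_rate=0.1):
    # The (fixed) behavior under test: tax applies only to taxable items.
    subtotal = sum(i.price for i in items)
    tax = sum(i.price for i in items if i.taxable) * tax_rate
    return round(subtotal + tax, 2)

def test_mixed_taxable_order():
    # Reproduction of the reported bug: a mixed order used to return 165.0
    # because tax was applied to the whole subtotal, exempt items included.
    items = [Item(100.0, True), Item(50.0, False)]
    assert compute_total(items) == 160.0
```

The test is cheap to review - it obviously pins down the reported scenario - and its internal code quality barely matters, which is exactly the profile described above.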
It's completely inconsistent for me, and any time I start to think it is amazing, I am quickly proven wrong. It has definitely done some useful things for me, but as it stands, any sort of "one-shot" or vibecoding where I expect the AI to complete a whole task autonomously is still a long way off.
Copilot completions are amazingly useful. Chatting with the chatbot is a super useful debugging tool. Giving it a function or database query and asking the AI to optimize it works great. But true vibe coding is still, IMHO, more of a party trick than an actual productivity multiplier. It can do things that look useful, and it can do things that solve immediate self-contained problems. But it can't create launchable products that serve the needs of multiple users.
I work at a very prominent AI company. We have access to every tool under the sun. There are various levels of success for all levels — managers, PMs, engineers.
We have cursor with essentially unlimited Opus 4.6 and it’s fundamentally changed my workflow as a senior engineer. I find I spend much more time designing and testing my software and development time is almost entirely prompting and reviewing AI changes.
I'm afraid my coding skills are atrophying, in fact I know they are, but I'm not sure coding was the part of my job I truly enjoyed. I enjoy thinking at a higher level: architecture, connecting components, focusing on the user experience. But I think using these AI tools is a form of golden handcuffs. If I go work at a startup without the money to pay for these models, I think for the first time in my career I would be less able to successfully code a feature than I was last year.
So professionally there are pros and cons. My design and architecture skills have greatly improved as I am spending more time doing this.
Personally it’s so much fun. I’ve made several side projects I would have never done otherwise. Working with Claude code on greenfield projects is a blast.
I think people get a bit paranoid about coding skills atrophying. I had a period where I stopped programming for multiple years and it really only took a month to get back into the swing of things when I returned, and most of that was just re-jogging my memory on the syntax and standard library classes (C++ at the time).
A month is quite a long time compared to "I can just do this at-will from neutral at any time".
...particularly in situations where you might have to navigate a change in jobs and get back to the point where you can reasonably prove that you can program at a professional level (will be interesting to see how/if the interviewing process changes over time due to LLMs).
I also worry, but I'm also shocked how far a single $20 sub gets me on side projects. I pay for 3 (CC, Codex, Gemini) but almost never go beyond CC, even when I'm merging several PRs a day.
I have a lot of experience, low and high level. These AI tools allow me to "discuss" possibilities, research approaches, and test theories orders of magnitude faster than I could in the past.
I would roughly estimate that my ability to produce useful products is at least 20x. A good bit of that 'x' is because of the elimination of mental barriers. There have always been good ideas I had which I knew could work, but I also knew that to prove that they could work would take a lot of focus and research (leveling up on specific things). And that takes human energy - while I'm busy also trying to do good things in my day job.
Now I have immensely powerful minions and research assistants. I can test any theory I have in an hour or less.
While these minions are being subsidized in the wonderful VC way, I can get a lot done. If the real costs start to bleed through, I'll have to scale back my explorations. (Because at some point, I'll have to justify testing my theories against spending $200-300.)
To your questions, I'm usually a solo builder anyway. I've built serious things for serious companies, but almost always solo. So that's quite a burden. And now I'm weary of all that corporate stuff, so I build for myself. And what a joy it is, having these powertools.
If I were in a company right now, I could absolutely replace a team of 5 people with me + AI... assuming the CTO wasn't the (usual) limiting factor.
It's really interesting how delusional people here can get when their livelihood depends on it. It's a game changer, guys. I've been working professionally for 12 years: big companies, small companies, freelance, startup CTO nowadays. It's a multiplier. It gives me superpowers. If you don't feel the superpowers, you're either missing out or in denial. Embrace agentic coding.
We got broad and wide access to AI tools maybe a month ago now. AI tools meaning claude code, codex, cursor and a set of other random AI tools.
I use them very often. They've taken away a lot of the fun and relaxing parts of my job and have overall increased my stress. I am on the product side of the business, and it feels necessary for me to have 10 new ideas, since now the ones with the most ideas will be rewarded, which I am not as good at. I've tried having the agents identify opportunities for infra improvements and had no good luck there. I haven't tried it for product suggestions, but I think it would be poor at that too.
I get sent huge PRs and huge docs now that I wasn't sent before, with pressure to accept them as-is.
I write code much faster but commit it at the same pace due to reviews taking so long. I still generate single-task PRs to keep them reviewable and do my own thorough review beforehand. I always have an idea in my head about how it should work before getting started, and I push the agent to use my approach. The AI tools are good at catching small bugs, like mutating things across threads. I like to use it to generate implementation plans (that only I and the bots read; I still handwrite docs that are broadly shared and referenced).
Overall, AI has me nervous, primarily because it does the parts that I like very well and has me spending a higher portion of my job on the things I don't like or find more tiresome.
I built an ERP system called PAX ERP mostly solo for small manufacturers in the USA.
Stack is React, Express, PostgreSQL, all on AWS with a React build pipeline through GitHub Actions. It handles inventory, work orders (MRP), purchasing, GAAP accounting, email campaigns, CRM, shipping (FDX or UPS), etc.
AI has been useful (I use Claude Code, mainly the Haiku model), but only if I'm steering it carefully and reviewing everything. It is obviously not great at system design, so I still need to know exactly what I'm trying to do. If I don't, it'll often make things overly complicated or focus on edge cases that don't really exist.
It helps a lot with:
- Writing/refactoring SQL
- Making new API routes based on my CRUD template
- Creating new frontend components from specs and existing templates
- Debugging and explaining unexpected behavior
- Integrating third-party APIs (FedEx for shipping, Resend for emails). It understands their documentation easily and helps connect the pieces.
In practice, it feels like a fancy copy/paste (new routes, components) or a helpful assistant. With careful guidance and review, it's a real efficiency boost, especially solo.
I'm somewhat optimistic... I think a lot of companies and managers are in a wait and see mode. The AI tooling can be genuinely helpful, but IMO definitely needs manual review and testing for function and security.
Personally, mostly just using it for personal projects that I've been sitting on for years... it's been pretty motivating and I'm actually making progress, though I'm split across half a dozen things so it's slower going. I'm also far more interactive than vibe coding one and done efforts. I'm finicky when it comes to UX for user facing apps, and DevEx for developer APIs on libraries I work on.
It's nice, far from perfect... I still liken it to managing foreign developer teams... the more you specify ahead of time, the better the results. The difference is a turn around in minutes instead of the next business day. Iteration is very fast by comparison. That said, sometimes the agent will make toddler decisions like rewriting all the broken tests and updating the docs to match the behavior instead of correcting the behavior to match the api and tests.
I foresee that the AI blindness at the CEO/CFO level, and the general hype (from the technical and non-technical press and media) in our society that software engineering is over, will result in a severe talent shortage in 5-7 years, leading to bidding wars for talent that drive salaries 3x upwards or more.
Before the advent of widespread LLM usage, and more particularly, before LLM-using coworkers began submitting large PRs against codebases I am the primary maintainer of, my velocity had never been greater.
I do not like the current culture around LLMs. I do not use LLMs, I shall continue to resist any peer pressure to use them, much as I have resisted IDEs in favour of CLI tooling, vim, tmux and the like. I feel my output is incredibly devalued compared to the before times.
On the one hand, my passion for personal projects has never been greater. It helps me feel as though I am bettering myself, pushing the boundaries of my capabilities without resorting to LLM usage. However, I no longer release my code openly.
On the other hand, on top of building resentment towards being treated even more interchangeably than before, I resent the asymmetry of my LLM-using peers submitting large pull requests I am obliged to review, on codebases I have never touched before, applications I have never had the occasion to use; my team is managed in such a way that everybody is more or less working on independent projects, due to pressures to deliver at a much faster pace.
I really wish the fervour would die down for the sake of my own sanity.
This feels like a pretty negative take on what seems like impactful technology that is not going away, and will (and already has had) big impacts on how people work and build. Do you completely reject the idea of using them, ever? Do they have absolutely 0 utility for you?
Currently in my third year working full time and sort of realizing two things.
1. AI (specifically Claude Code and Codex) is really good and can do quite a bit of the work that, when I started, I had to do myself.
2. AI can't do all of the work, and the last 10% (or 5, or whatever percent it can't do) is something I need to do, and the only way I can do it properly is if I have a good mental model from the other 90%, which doesn't happen if I use Claude Code.
So far this year I've realized I'm better off not using it except for simple questions I would otherwise Google. Not necessarily because it's bad, but because it makes me worse.
Edit: this also sort of applies to my side projects as well. I’m realizing more the value of side project wasn’t the end result ( since those are mostly my own personal apps ) but the learnings gleaned from them I don’t get if I use Claude code.
I use it purely as a search engine. It is pretty good at de-HTML/de-JS/de-CSS, or as distraction-free access to articles/docs. I had dabbled with Claude purely for writing test cases, which I'm very lazy at.
But man, people really oversell LLMs' coding capabilities. It is good at pattern matching and replicating patterns elsewhere. That's about it.
I asked Opus 4.5 a simple question:
> Python redis asyncio sample code.
All 3 attempts at generation failed at the import statement. I checked whether such an import structure had ever existed in older versions. Never.
But it looked convincing enough, though!?
Is it because there is not much asyncio (released in 2014) code available for Claude to train on?
I'm a senior engineer at a mid-size, publicly traded company.
My team has largely avoided AI; our sister team has been quite gung-ho about it. I recently handed off a project to them that I'd scoped at about one sprint of work. They returned with a project design that involved four microservices, five new database tables, and an entirely new orchestration and observability layer. It took almost a week of back-and-forth to pare things down.
Since then, they've spent several sprints delivering PRs that I now have to review. There are lots of things that don't work, don't make sense, or reinvent things we already have from scratch. Almost half the code is dedicated to creating 'reusable' and 'modular' classes (read: boilerplate) for a project that was distinctly scoped as a one-off. As a result, reviewing takes hours, and it's cut into my own sprint velocity. I'm doing all the hard work but receiving none of the credit.
Management just told me that every engineer is now required to use AI. I'm tired.
It has definitely made me more productive. That said, that productivity isn't coming from using it to write business logic (I prefer to have an in-depth understanding of the logical parts of the codebases that I'm working on. I've also seen cases in my work codebases where code was obviously AI generated before and ends up with gaping security or compliance issues that no one seemed to see at the time).
The productivity comes from three main areas for me:
- Having the AI coding assistance write unit tests for my changes. This used to be by far my least favorite part of my job of writing software, mostly because instead of solving problems, it was the monotonous process of gathering mock data to generate specific pathways, trying to make sure I'm covering all the cases, and then debugging the tests. AI coding assistance allows me to just have to review the tests to make sure that they cover all the cases I can think of and that there aren't any overtly wrong assumptions
- Research. It has been extraordinarily helpful in giving me insight into how to design some larger systems when I have extremely specific requirements but don't necessarily have the complete experience to architect them myself - I know enough to understand if the system is going to correctly accomplish the requirements, but not to have necessarily come up with architecture as a whole
- Quick test scripts. It has been extremely useful for generating quick SQL data for testing things, along with quick one-off scripts to test things like external provider APIs
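As an illustration of that last point, a throwaway generator for SQL seed data is exactly the kind of script that's trivially delegated (the table and column names here are invented for the example):

```python
import random

def seed_orders_sql(n: int, seed: int = 42) -> str:
    """Emit a single INSERT statement with n rows of randomized test data."""
    rng = random.Random(seed)  # fixed seed so reruns produce identical data
    rows = []
    for order_id in range(1, n + 1):
        amount = round(rng.uniform(5.0, 500.0), 2)
        status = rng.choice(["pending", "shipped", "cancelled"])
        rows.append(f"    ({order_id}, {amount}, '{status}')")
    return (
        "INSERT INTO orders (id, amount, status) VALUES\n"
        + ",\n".join(rows)
        + ";"
    )

print(seed_orders_sql(3))
```

Nothing here needs review beyond a glance at the output, which is why delegating it costs so little.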
> Research. It has been extraordinarily helpful in giving me insight into how to design some larger systems when I have extremely specific requirements but don't necessarily have the complete experience to architect them myself - I know enough to understand if the system is going to correctly accomplish the requirements, but not to have necessarily come up with architecture as a whole.
I agree, this is where coding agents really shine for me. Even if they get the details wrong, they often pinpoint where things happen and how quite well.
They're also great for rapid debugging, or assisted bug fixing. Often, I will manually debug a problem, then tell the AI, "This exception occurs in place Y because thing X is happening, here's a stack trace, propose a fix", and then it will do the work of figuring out where to put the fix for me. I already usually know WHAT to do, it's just a matter of WHERE in context. Saves a lot of time.
Likewise, if I have something where I want thing X to do Y, and X already does Z, then I'll say, "Implement a Y that works like Z but for A B C", and it'll usually get it really close on the first try.
I've only recently begun using copilot auto-complete in Visual Studio using Claude (doing C# development/maintenance of three SaaS products). I've been a coder since 1999.
The suggestions are correct about 40% of the time, so I'm actually surprised when they're right, rather than becoming reliant on them. It saves me maybe 10 minutes a day.
The only part of AI autocomplete I've found I really like is when I have a function call that takes a dozen arguments, and the autocomplete can just shove them all together for me. Such a nice little improvement.
I have been begging Claude not to write comments at all since day 1 (it's in the docs, CLAUDE.md, I say the words every session, etc.) and it just insists anyway. Then it started deleting comments I wrote!
Yeah, it usually gets the required args right based on various pieces of context. It varies a lot between extensions, though. If the extension can't pull context from the entire project (or at least parts of it), it becomes almost useless.
IntelliJ platform (JetBrains IDEs) has this functionality out of the box without "AI" using regular code intelligence. If all your parameters are strings it may not work well I guess but if you're using types it works quite well IME.
Can't use JetBrains products at work. I also unfortunately do most of my coding at work in Python, which I think can confound things since not everything is typed
... you can't use JetBrains? What logic created a scenario where you can't use arguably the best range of cross platform IDEs, but you can somehow use spicy autocomplete to imitate some of their functionality, poorly?
I work in an extremely security minded industry. There are strict guidelines about what we can and can't use. JetBrains isn't excluded for technical reasons, but geopolitical ones.
The AI models we use are all internally hosted, and any software we use has to go through an extensive security review.
Context: micro (5 person) software company with a mature SaaS product codebase.
We use a mix of agentic and conversational tools, just pick your own and go with it.
For Unity development (our main codebase and source of value) I give current-gen tools a C- for effectiveness. For solving confined, well-modularisable problems (e.g. refactor this texture loader; implement support for this material extension) it's good. For most real day-to-day problems it's hopelessly confused by the large codebase full of state, external dependencies on chunks of Unity, implicit hardware-dependent behaviours, etc. It has no idea how to work meaningfully with Unity's scene graph or component model. I tried using MCP to empower it here: on a trivial test project it was fine. In a real project it got completely lost and broke everything after eating 30k tokens and 40 minutes of my time, mostly because it couldn't understand the various (documented) patterns that straddle code files and scene structure.
For web and API development I give it an A, with just a little room for improvement. In this domain it's really effective all the way down the logical stack, from architectural and deployment decisions to implementation details and debugging, including digging really deep into package version incompatibilities and figuring out in seconds problems that would take me hours. My one criticism would be the now-familiar "junior developer" effect, where it'll often run ahead with an over-engineered lump of machinery without spotting a simpler, more coherent pattern. As long as you keep an eye on it, it's fine.
So in summary: if what you’re doing is all in text, nothing in binary, doesn’t involve geometric or numerical reasoning, and has billions of lines of stack overflow solutions: you’ll be golden. Otherwise it’s still very hit and miss.
I have good success using Copilot to analyze problems for me, and I have used it in some narrow professional projects to do implementation. It's still a bit scary how off track the models can go without vigilance.
I have a lot of worry that I will end up having to eventually trudge through AI generated nightmares since the major projects at work are implemented in Java and Typescript.
I have very little confidence in the models' abilities to generate good code in these or most languages without a lot of oversight, and even less confidence in many people I see who are happy to hand over all control to them.
In my personal projects, however, I have been able to get what feels like a huge amount of work done very quickly. I just treat the model as an abstracted keyboard-- telling it what to write, or more importantly, what to rewrite and build out, for me, while I revise the design plans or test things myself. It feels like a proper force multiplier.
The main benefit is actually parallelizing the process of creating the code, NOT coming up with any ideas about how the code should be made or really any ideas at all. I instruct them like a real micro-manager giving very specific and narrow tasks all the time.
I’ve been an overt AI hater but have found very recently that, though I still hate a great many things about AI, it has become useful for coding.
In 10 minutes, Gemini correctly diagnosed and then fixed a bug in a fairly subtle body of code that I was expecting to have to spend a couple of hours working on.
I spent much of the past week using Gemini to build a prototype of a clean new (greenfield) system involving RPCs, static analysis, and sandboxing. I give it very specific instructions, usually after rounds of critical design discussion, and it generates structurally correct code that passes essentially valid tests. Error handling is a notable weakness. I review the code by hand after each step and often make changes, and I expect to go over the whole thing very carefully at the end, but it has saved me many hours this week.
Perhaps more valuable than the code has been the critical design conversation, in which it is mostly fluent at the level of an experienced engineer and has been able to explain, defend, and justify design choices quite coherently. This saved time I would otherwise have spent debating with coworkers. But it's not always right, and it is easily led astray (and will lead astray), so you need a clear idea in mind, a firm hand, and good judgment.
> This saved time I would otherwise have spent debating with coworkers. But it’s not always right and it is easily led astray (and will lead astray), so you need a clear idea in mind, a firm hand, and good judgment.
The “will lead astray” part is concerning. If you already have a clear idea in mind, you probably don’t need to have the debate with coworkers.
If you are having a debate with coworkers or AI, you would rather that they be knowledgeable enough to not lead you astray.
In cases where I don’t have a clear understanding of some area, yet I don’t have someone knowledgeable to talk to, I have found myself having to discuss the same point with multiple LLMs from multiple angles to tease out the probable right way.
In summary: obviate experts, receive correct guidance, save time - pick any two.
Most of the things I've used LLMs for is scripting code for integrations between systems, or scripts that extract and transform data from APIs.
For this specific use case, LLMs and their integrations with tools like VSCode have been excellent. A simple instruction file dictating what libraries to use, and lines about where to look for up-to-date API docs, increases the chances of one-shots significantly.
My favorite part has been that I'm able to use libraries I wouldn't have used previously, like openpyxl. A use case like "get data from an API, transform it, and output it to an Excel file with these columns" is super fast, and puts the data in a stakeholder-friendly/non-techy format.
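A minimal sketch of that pattern (assuming openpyxl is installed; the records below stand in for a real API response, and the column names are made up):

```python
from openpyxl import Workbook

def rows_to_xlsx(rows, columns, path):
    """Write a list of dicts to an .xlsx file with a header row."""
    wb = Workbook()
    ws = wb.active
    ws.append(columns)  # header row
    for row in rows:
        ws.append([row.get(col, "") for col in columns])
    wb.save(path)

# Stand-in for transformed API data:
records = [
    {"customer": "Acme", "orders": 12, "revenue": 3400.50},
    {"customer": "Globex", "orders": 7, "revenue": 1250.00},
]
rows_to_xlsx(records, ["customer", "orders", "revenue"], "report.xlsx")
```

The whole "API to spreadsheet" loop is a few lines once you know the library exists, which is the point being made.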
It made me chuckle when Claude etc. released Excel integrations, since working with Excel files has already been in a great state for people who've worked with Excel/CSV libraries.
The number 1 suggestion I'd have for people eager to work with text is to use the models to learn about old Unix tools like grep/sed etc. With these powerful tools + modern tools + code, you can build quite complex integration code for many uses. Don't sleep on the classic Unix CLI commands, downloading stuff from GitHub to achieve things that were already solved 40 years ago :)
I just started a new job. A mix of JS, WASM (C++) and Python. It's been a blessing to help understand and explain an unfamiliar codebase to me. Sometimes the analysis isn't deep enough but I've been able to create enough guiding documents to get it right most of the way through, and for the rest I can continue and dive deeper on my own with further prompting.
I've started using it to write some code, which I then use further prompting to review before my own final review. I feel a lot more productive, I can focus on high level ideas and not think about tiny implementation details.
Having instrumentation code magically created in minutes and being able to validate assumptions before/after making changes by doing manual testing and feeding AI logs has been a great use for me - this kind of stuff is boring and would kill my motivation and productivity in the past. AI helps here so I can move on to the fun stuff, helps me stay engaged and interested.
It's great for writing unit tests and doing log analysis. The usual AI pitfalls apply like going into loops that lead nowhere and hallucinating things, but I've gotten better at spotting it and steering it away. I try not to take what it gives me at face value and use follow up prompts to challenge assumptions or verify things.
So overall, it's been an immense help for me. I've got some interesting projects coming up that are more greenfield work, we'll see if this holds up compared to an existing codebase.
I’m not a professional developer but I can find my way around several languages and deployment systems. I have used Claude to migrate a medium-sized Laravel 5 app to Laravel 11 in about 2-3 days. I would not have dared to touch it otherwise.
In my day job I’m currently a PM/operations director at a small company. We don’t have programmers. I have used AI to build about 12 internal tools in the past year. They’re not very big, but provide huge productivity gains. And although I do not fully understand the codebase, I know what is where. Three of these tools I’m now recreating based on our usage and learnings.
I have learned a ton about all kinds of development concepts in a ridiculously short timeframe.
Am using Claude to attempt refactoring and find bugs. Sometimes it's fantastic, finding issues instantly that would otherwise take a lot of trawling or insider knowledge. Other times it gets obsessed with irrelevant things, or makes suggestions that for some obscure but non-obvious reason don't work in practice. The generated code sometimes has excellent ideas I wouldn't have thought of. Other times it has places for bugs to lurk, e.g. "if a directory isn't there, make it". Er, no thanks, I want you to blow up if the dir isn't there, because if it isn't, something else major went wrong. The trick is knowing when it's going to be good and when it's hopeless and will take you down a rabbit hole. Perhaps that is a meta-skill on the part of the human developer. But I'm not optimistic about things improving; it's the nature of how it is. The AI doesn't personally know the previous devs on the team, their programming tastes, the discussions they had at planning, etc. It's got no context.
I am working on a sub 100KLOC Rust application and can't productively use the agentic workflows to improve that application.
On the other hand, I have tried them a number of times in greenfield situations with Python and the web stack and experienced the simultaneous joy and existential dread of others. They can really stand new projects up quick.
As a founder, this leaves me with what I describe as the "generation ship" problem. Is it possible that the architecture we have chosen for my project is so far out of the training data that it would be faster to ditch the project and reimplement it from scratch in a Claude-yolo style? So far, I'm convinced not because the code I've seen in somewhat novel circumstances is fairly mid, but it's hard to shake the thought.
I do find chatting with the models incredibly helpful in all contexts. They are also excellent at configuring services.
If what you are doing is novel then I don't think yolo'ing it will help either. Agents don't do novel.
I've even noticed this in meeting summaries produced by AI:
A prioritisation meeting? AI's summary is concise, accurate, useful.
A software algorithm design meeting, trying to solve a domain-specific issue? AI did not understand a word of what we discussed and the summary is completely garbled rubbish.
If all you're doing is something that already exists but you decided to architect it in a novel way (for no tangible benefit), then I'd say starting from scratch and making it look more like existing stuff is going to help AI be more productive for you.
Otherwise you're on your own unless you can give AI a really good description of what you are doing, how things are tied together etc.
And even then it will probably end up going down the wrong path more often than not.
I’m a UX designer not a coder, but this is so bizarre to me because shouldn’t every project be doing something novel? Otherwise why does it exist? If this industry is so full of people independently writing the same stuff that AI can replicate it…then it was a vast misallocation of resources to begin with.
I'm surprised there aren't more attempts to stabilize around a base model, like in Stable Diffusion, and then augment those models with LoRAs for various frameworks and other routine patterns. There's so much going into trying to build these omnimodels, when the technology is there to mold the models into more useful paradigms around frameworks and coding patterns.
Especially now that we do have models that can search through code bases.
I work in an R&D team as research scientist/engineer.
Cursor and Claude Code have undoubtedly accelerated certain aspects of my technical execution. In particular, root causing difficult bugs in a complicated codebase has been accelerated through the ability to generate throwaway targeted logging code and just generally having an assistant that can help me navigate and understand complex code.
However, overall I would say that AI coding tools have made my job harder in two other ways:
1. There’s an increased volume of code that requires more thorough review and/or testing or is just generally not in keeping with the overall repo design.
2. The cost is lowered for prototyping ideas so the competitive aspect of deciding what to build or which experiment to run has ramped up. I basically need to think faster and with more clarity to perform the same as I did before because the friction of implementation time has been drastically reduced.
I got insanely more productive with Claude Code since Opus 4.5. Perhaps it helps that I work in AI research and keep all my projects in small prototype repos. I imagine that all models are more polished for the AI research workflow, because that's what frontier labs do, but yeah, I don't write code anymore. I don't even read most of it; I just ask Claude questions about the implementation, and sometimes ask it to show me the important bits verbatim. Obviously it makes mistakes sometimes, but so do I and everyone I have ever worked with. What scares me is that it makes fewer mistakes overall than I do. Plan mode helps tremendously; I skip it only for small things. Insisting on a strict verification suite is also important (kind of like an autoresearch project).
I am working on getting my sailing captain's license (I started sailing when I was 8) and moving my life in that direction. I hate how things work nowadays. I feel like a police officer policing my friends' and coworkers' AI code, and I don't want to do that.
I'm always skeptical of new tech. I don't like how AI companies have reserved all the memory-chip capacity for X years; that is definitely going to cause problems in society when regular businesses, like those in the health care sector, can't scale or repair their infra. The environmental impact is also a discussion that I am not qualified to get into.
All I can say for sure is that it is absolutely useful, it has improved my quality of life without a doubt. I stick to the principle that it's here to improve my work life balance, not increase output for our owners.
And that it has done, so far. I can accomplish things that would have taken me weeks of stressful and hyperfocused work in just hours.
I use it very carefully, and sparingly, as a helpful tool in my toolbox. I do not let it run every command and look into every system, just focused efforts to generate large amounts of boilerplate code that would require me to have a lot of docs open if I were to do it myself.
I definitely don't let it read or write my e-mails, or write any text. Because I always loved writing, and will never stop loving it.
It's here to stay, because I'm not alone in feeling this way about it. So the staunch AI-deniers are just wasting their time. Just like any other tech, it's going to be used against humans, against the already oppressed.
I definitely recognize that the tech has made some people lose their minds. Managers and product owners are now vibe coding thinking they can replace all their developers. But their code base will rot faster than they think.
A guy on the team passes the issues he gets directly to Copilot, and holy crap, it shows. He hasn't admitted to doing it, but the full code rewrites whenever he's asked to change something are telling.
I'm getting tired, honestly. I'd prefer the simpler "I don't know" of old to six pages of bullshit I have to review.
I run a small game studio. I use Cursor to write features that I don’t want to hand code, but wouldn’t ask a teammate to do. Usually that is because describing the idea to a person would take about as much effort and the result would take longer.
These are usually internal tools, workflow improvements, and one off features. Anything really central to the game’s code gets human coded.
I think the further you are from the idea part, the less fun AI coding will be for you. Because now you need to not just translate some spec to code, you have to translate it to a prompt, which ups the chances of playing the telephone game. At least when you write the code yourself you are getting real with it and facing all the ambiguities as a matter of course. If you just pass it to an LLM you never personally encounter the conflicts, and it might make assumptions you would not… but you don’t even realize it because they are assumptions!
Using claude-code for fixing bugs in a rather huge codebase. I review the fixes, and if I think it wrote something I would make a PR of myself, I use it. Understanding is key, I think, and so is giving it the right context. I have about 20 years of programming experience and I'm letting it code in a domain and language I know very well. It saves me a lot of time when the bug requires finding a needle in a haystack.
It's good. I use Codex right now. I purposefully slow down to at least read/review the code it generates, unless I'm creating something intentionally throwaway. It helps me most when dealing with languages and frameworks I'm not familiar with. I also use ChatGPT as a rubber duck, and although it's often too verbose, I enjoy it. There are still many times where it will not provide the key insight to a problem, but once you supply it, it instantly agrees like it was always obvious. On the other hand, it has helped me grok many subjects, especially academic ones.
For my specific niche (medical imaging) all current models still suck. The amount of expert knowledge required to understand the data and display it in the right way - probably never was in the training set.
We have this one performance-critical 3D reconstruction engine part that just has to go FAST through billions of voxels. From time to time we try to improve it, by just a bit. I have probably wasted at least 2 full days with various models, trying out their suggestions for optimizations and benchmarking on real-world data. NONE produced an improvement. The suggested changes look promising programming-wise, but all failed with real-world data.
These models just always want to help. Even if there is no way forward, they will suggest something, just for the sake of it. I would like the model to simply say "I do not know", or "this is the best thing that I can come up with"... Niche/expert positions are still safe IMHO.
On the other hand - for writing REST with some simple business logic - it's a real time saver.
Most replies here are about writing code faster. But there's a gap nobody's talking about: AI agents are completely blind to running systems.
When you hit a runtime bug, the agent's only tool is "let me add a print statement and restart". That works for simple cases but it's the exact same log-and-restart loop we fall back to in cloud and containerized environments, just with faster typing.
Where it breaks down: timing-sensitive code, Docker services, anything where restarting changes the conditions you need to reproduce.
I've had debugging sessions where the agent burned through 10+ restart cycles on a bug that would've been obvious if it could just watch the live values.
We've given agents the ability to read and write code. We haven't given them the ability to observe running code. That's a pretty big gap.
Timestamps aren't the issue. The problem is the cycle itself: stop the process, add the log line, restart, wait for the right conditions to hit that code path again. For anything timing-sensitive or dependent on external state, each restart changes what you're trying to observe.
I've used agents to look at traces, stack dumps, and have used them to control things like debuggers. I've had them exec into running containers and poke around. I've had them examine metrics, look into existing logs, look at pcaps, and more. Any kind of command I could type into a console they can do, and they can reason about the outputs of such a command.
In fact, last night I had it hacking away at a WordPress template. It was making changes and then checking screenshots from a browser window, automatically confirming its changes worked as planned.
That's close to what I'm thinking about. Curious what debugger setup you're using with agents - are you giving them access via MCP or just having them run CLI commands?
I'm mostly really enjoying it! While it's not my main job, I've always been a tool builder for teams I work on, so if I see a place where a little UI or utility would make people's life easier, I'd usually hack something together in a few hours and evolve it over time if people find it useful. That process is easily 10x faster than before.
My main work is training Text-to-Speech models, and the friction of experimenting with model features or ideas has dropped massively. If I want to add a new CFG implementation, or conditioning vector, 90% of the time Opus can one-shot it. It generally does a good job of making the model, inference and training changes simultaneously so everything plays nicely. Haven't had any major regressions or missed bugs yet, but we'll see!
The downside is reviewing shitty PRs where it's clear the engineer doesn't fully understand what they're doing, and just a general attitude of "I dunno, Claude suggested it" that's getting pretty exhausting.
For my job, which is mostly YAML engineering with some light Go coding (platform work), I'm finding it useful. We're DRY-ing out a bunch of YAML with CUE at the moment and it's sped that work up tremendously.
When it comes to personal projects I'm feeling extremely unmotivated. Things feel more in reach and I've probably built ten times the number of throwaway projects in the past year than I have in previous years. Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them. I have a feeling of 'what's the point' publishing these projects when the same code is only a few prompts away for someone else too. And publishing them under my name only cheapens the rest of my work which I put real cognitive effort into.
I think I want to focus more on developing knowledge and skills moving forward. Whatever I can produce with an LLM in a few hours is not actually valuable unless I'm providing some special insight, and I think I'm coming to terms with that at the moment.
> Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them
For me, this is a key differentiator between “AI-assisted” and “vibe-coded”. With the former, I may use AI in many ways: some code generation, review, bouncing ideas, or whatever. But I engage in every step, review and improve the generated code, disagree with the reviews (and still contribute a good proportion of hand-written code, at least in the core business logic). In this way I retain sufficient ownership over the output to feel it is my own.
With vibe-coding, I feel exactly as you describe it.
I use ChatGPT to give me an overview of unfamiliar topics, suggest architecture patterns, learn the common approach to solving X, or refresh on some syntax. Sometimes there’s a repetitive task like applying the same edit to a list of like 40 strings (e.g. surrounding them in a struct init), and I found it useful to make ChatGPT do this.
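That kind of mechanical "wrap each string in a struct init" edit can also be made reproducible with a short Python snippet, if you'd rather not re-prompt; the struct and field names here are invented:

```python
# Hypothetical identifiers; imagine ~40 of these pasted from the codebase.
fields = ["user_id", "created_at", "region"]

# Wrap each one in an (invented) struct initializer.
wrapped = [f'Field(name="{name}")' for name in fields]

print(",\n".join(wrapped))
# Field(name="user_id"),
# Field(name="created_at"),
# Field(name="region")
```

The trade-off is the same one the commenter notes: the LLM is faster for a one-off, but the script version is deterministic and re-runnable.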
Summarising diffs in OpenAPI specs, and highlighting bugs and patterns in logs and documents, also works pretty okay.
In my domain (signal processing, high-load systems, embedded development, backend in Go) it doesn’t do great on coding tasks, and I’m very opposed to giving it the lead to create files, do mass edits, et cetera. I found it failing even on recent versions of Go: imagining interfaces, not knowing about changes in some library interfaces (pre-2024 changes, at least). Both ChatGPT and Claude failed to create a proper application for me (parsing incoming messages and drawing real-time graphics), both getting stuck at some point. The application worked more or less, but with bugs and huge performance issues.
I found it useful to quickly create skeletons for scripts/tools, that I can then fill up with actual logic, or making example of how a library is used.
So there is usability for me, it replaced stackoverflow and sometimes reading actual documentation.
I own a few repositories of our system, and contribution guides I create explicitly forbid use of LLMs and agents to create PRs. I had some experience with developers submitting vibe coded PRs and I do not want to waste my time on them anymore.
I had a couple of nice moments, like claude helping me with rust (which I don't understand) and claude finding a bug in a python library I was using
Also some not-so-nice moments (small rust changes were OK, but claude fumbled a big one, and I couldn't really verify that it worked, so I didn't merge the code to master even though it seemingly worked)
I think it really helps to break the ice, so to speak. You no longer feel the tension, the pain of an empty page. You ask claude to write something, and improving something is mentally so much easier
Also I mostly use claude as a spell checker / linter for the projects I'm too lazy to install proper tools for that. vim + claude, what else would you need
Luckily my company pays for the subscription; spending personal money on LLMs (especially on US LLMs) would feel strange for some reason. Ideally I want to own an LLM, have it at home, but I am too lazy
For asking quick questions that would normally send me to a search engine, it’s pretty helpful. It’s also decent (most of the time) at throwing together some regex.
For throw away code, I might let the agent do some stuff. For example, we needed to test timing on DNS name resolution on a large number of systems to try and track down if that was causing our intermittent failures. I let an agent write that and was able to get results faster than if I did it myself, and I ultimately didn’t have to care about the how… I just needed something to show to the network team to prove it was their problem.
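A throwaway DNS-timing script of the sort described might look roughly like this; the host list and attempt count are placeholders, not the actual agent-written tool:

```python
import socket
import time

def time_resolution(host: str, attempts: int = 5) -> list[float]:
    """Time DNS lookups for one host, returning seconds per attempt."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            pass  # record the elapsed time even on failure; slow NXDOMAINs matter too
        samples.append(time.perf_counter() - start)
    return samples

if __name__ == "__main__":
    for host in ["localhost", "example.com"]:  # placeholder host list
        s = time_resolution(host)
        print(f"{host}: min={min(s) * 1000:.1f}ms max={max(s) * 1000:.1f}ms")
```

Exactly the kind of thing where the output (evidence for the network team) matters and the code itself is disposable.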
For larger projects that need to plugin to the legacy code base, which I’ll need to maintain for years, I still prefer to do things myself, using AI here and there as previously mentioned to help with little things. It can also help finding bugs more quickly (no more spending hours looking for a comma).
I had an agent refactor something I was making for a larger project. It did it, and it worked, but it didn’t write it in a way that made sense to my brain. I think others on my team would have also had trouble supporting it too. It took something relatively simple and added so many layers to it that it was hard to keep all the context in my head to make simple edits or explain to someone else how it worked. I might borrow some of the ideas it had, but will ultimately write my own solution that I think will be easier for other people to read and maintain.
Borrowing some of these ideas and doing it myself also allows me to continue to learn and grow, so I have more tools in my tool belt. With the DNS thing that was totally vibe coded, there were some new things in there I hadn’t done before. While the code made sense when I skimmed through it, I didn’t learn anything from that effort. I couldn’t do anything it did again without asking AI to do it again. Long-term, I think this would be a problem.
Other people on my team have been using AI to write their docs. This has been awful. Usually they don’t write anything at all, but at least then I know they didn’t write anything. The AI docs are simply wrong, 100% hallucinations. I have to waste time checking the doc against the code to figure that out, and then go to the person who did it to make them fix it. Sometimes no doc is better than a bad doc.
I’m a newer full stack engineer, previously did mostly web dev. It’s been useful in the areas that I’m not super interested in. We’re working on a 700KLOC legacy monolithic CRUD app with 0 documentation, it’s essentially the Wild West. We’ve found it very difficult to apply AI in a meaningful way (not just code output, reviews, documentation writing, automation). For a small team with lots to do on what is essentially a “keep the lights on” we’re in an interesting place, as it feels the infrastructure / codebase isn’t set up to handle newer tools.
I use the code generation heavily in my day to day, though verification is a priority for me, as is gaining an understanding of the business logic + improving my skills as a developer. There’s a healthy balance between deploying 100% generated code and not using the tools at all.
It’s useful for research tasks, identifying areas I’ll be working in when developing a feature. However, this team has a gigantic backlog and there are TONS of things we are behind on, so it does feel like AI isn’t moving the needle for us, though it is helpful. I’d like to apply it in different areas, but my senior engineer is very anti-AI, so he doesn’t find the tools useful and is actively against using them. Like I said, there’s surely a balance…
I see us using / relying on them more in the future, due to pressure from above, along with the general usefulness of them.
Like a lot of things, it’s neither and somewhere in the middle. It’s net useful even if just for code reviews that make you think about something you may have missed. I personally also use it to assist in feature development, but it’s not allowed to write or change anything unless I approve it (and I like to look at the diff for everything)
had a new one. one of my nontechnical managers generated some ESRI Arcade (toy language) code in ms copilot, then called me for 3 hours so i could help them debug it in a paired programming session. we're consultants (im aware. lets not unpack that now), so it was a nice way to score 3 hours of billable work i might otherwise have had to generate from a client. honestly wouldn't mind being the "ai debugging guy" at my office. easy and mildly entertaining billable work
funny the manager thought they could shoot from the hip like that. wonder if they think it's an effective pattern
I develop prototypes using Claude Code. The dead boring stuff.
"Implement JWT token verification and role checking in Spring Boot. Secure some endpoints with Oauth2, some with API key, some public."
C# and Java are so old that whatever solutions you find are 5 years out of date. Having an agent implement and verify the foundation is the perfect fit. There's no design, just ever-changing framework magic. I'd do the same "Google and debug" cycle, but 10 times slower.
It's kind of funny to see you saying "whatever solution you find are 5 years out of date", while at the same time saying that the tool that was taught using those same 5 years out of date solutions as a part of its training data is actually good.
Terrible idea if you ask me. I'd suggest checking the official docs next time around, or at the very least copying them into the context window.
First, good agents do that themselves. Second, specifying an exact and current version also works. Third, I'm mostly concerned with having a working example. I'm talking about breaking changes and APIs not existing in newer framework versions. As long as it compiles, it's clear the approach still works.
Well then your experience is not really relevant in this thread when the prompt is specifically asking for professional coding work now, is it?
You're not an LLM (at least I don't think you are), you're not obliged to respond with an answer even when that answer is only tangentially related to the prompt.
I am required to maximise my use of AI at work, and so I do. It's good enough at simple, common stuff. Throw up a web page, write some python, munge some data in C++ - all great as long as the scale is small. If I'm working on anything cutting-edge or niche (which I usually am) then it makes a huge mess and wastes my time. If you have a really big code base, in the ~50 million LOC range, it also makes a huge mess.
I really liked writing code, so this is all a big negative for me. I genuinely think we have built a really bad thing, that will take away jobs that people love and leave nothing but mediocrity. This thing is going to make the human race dumber and it's going to hold us back.
I work at a company that maintains one of the largest Rails codebases in the world (their claim, but believable). My experience has been the opposite: Claude and Cursor have done a wonderful job of helping me understand and implement new features in this gigantic codebase. I actually found out through AI that while I enjoy writing code, I enjoy building great software more; the coding was just a means to the end.
At my FAANG, there was a team of experienced engineers that proved they could deliver faster and more performant code than a complete org that was responsible for it earlier.
So now a lot of different parts of the company are trying to replicate their workflow. The process shows what works: you need AI-first documentation (a readme with one line for each file to help manage context), plus skills and steering docs for your codebase, code style, etc. And it mostly works!
For me personally, it has drastically increased productivity. I can pick up something from our infinitely huge backlog, provide some context and let the agent go ham on fixing it while I do whatever other stuff is assigned to me.
1. Generate unit tests beyond the best-case scenario. Analogy: Netflix's Chaos Monkey
2. Incremental cleanup: I also use it as a fancier upgrade of Visual Studio's Code Analysis feature and aid me in finding areas to refactor.
3. Treating the model as a corpus of prior knowledge and discussions, I can form a 'committee of agents' (Security, Reliability, UX engineer POVs) to help me view my work at a more strategic level.
My additional twist to this is to check against my organization's mission statement. That way, I hope I can help reduce mission drift that I observe was a big issue behind dysfunctional companies.
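On point 1, a toy sketch of what "beyond the best-case scenario" can mean in practice; `parse_port` and its inputs are invented for the example:

```python
def parse_port(value: str) -> int:
    """Toy function under test: parse a TCP port from user input."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

# The best-case test a first pass (human or LLM) tends to write:
assert parse_port("8080") == 8080

# The Chaos Monkey-style cases worth asking the model to generate:
for bad in ["", "abc", "-1", "0", "65536"]:
    try:
        parse_port(bad)
    except ValueError:
        pass  # expected: each of these must be rejected
    else:
        raise AssertionError(f"expected failure for {bad!r}")
```

The value of the model here is enumerating the hostile inputs, not writing the happy path.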
I use it mostly to explore the information space of architectural problems, but the constant "positive engagement feedback" (the first line of each generation: "brilliant insight") starts to feel deeply insincere, and also false when it regularly claims "this is the mathematically best solution - ready to implement?" only for that not to hold up when you truly consider it.
I have moved away from using an LLM before I have figured out the specifications myself; otherwise it's very risky that you go down a wrong rabbit hole the LLM seduced you into via its "user engagement" training.
It’s a fantastic performance booster for a lot of mundane tasks like writing and revising design docs, tests, debugging (using it like a super smart and active rubber duck), and system design discussions.
I also use it as a final check on all my manually written code before sending it for code review.
With all that said, I have this weird feeling that my ability to quickly understand and write code is no longer noticeable, nor necessary.
Everyone now ships tons of code and even if I do the same without any LLM, the default perception will be that it has been generated.
I am not depressed about it yet, but it will surely take a while to embrace the new reality in its entirety
I use Opus 4.6 with pi.dev (one agent). I give it detailed instructions on what to do. I essentially use it to implement things the same way I would manually: small commits, every commit tested and verified. I don’t use plan mode, I just let the agent code; code review is faster than reading a plan. This approach only works if you make small changes. I build a mental model of the code the same way as when writing it manually.
Some people on my team code with AI without reading the code. That’s mostly a mess: no mental model, lower quality. They are really proud of it, though, and think they are really smart. Not sure how this will turn out.
I write stuff for free. It's definitely "professional grade," and lots of people use the stuff I ship, but I don't earn anything for it.
I use AI every day, but I don't think that it is in the way that people here use it.
I use it as a "coding partner" (chat interface).
It has accelerated my work 100X. The quality seems OK. I have had to learn to step back, and let the LLM write stuff the way that it wants, but if I do that, and perform minimal changes to what it gives me, the results are great.
It's great. I'd guess 80-90% of my code is produced in Copilot CLI sessions since the beginning of the year. Copilot CLI is worse than Claude Code, but not by a huge amount. This is mostly working in established 100k+ LOC codebases in C# and TypeScript, with a couple greenfield new projects. I have to write more code by hand in the greenfield projects at their formative stage; LLMs do better following conventions in an existing codebase than being consistent in a new one.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query.
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an oauth token to call some api with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from this session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from the implementation.
It churns through boring stuff, but at times it's like what I imagine breaking in a wild horse to be, intellectually: so capable, so fast, but easy to find yourself in a pile on the floor.
I'm learning all the time and it's fun, exasperating, tremendously empowering and very definitely a new world.
What works:
-Just pasting the error and asking what's going on here.
-"How do I X in Y considering Z?"
-Single-use scripts.
-Tab (most of the time), although that doesn't seem to be Claude.
What doesn't:
-Asking it to actually code. It's not going to do the whole thing, and even if it does, it will take shortcuts, occasionally removing legitimate parts of the application.
-Tests. Obvious cases it can handle, but once you reach a certain threshold of coverage, it starts producing nonsense.
Overall, it's amazing at pattern matching, but doesn't actually understand what it's doing. I had a coworker like this - same vibe.
Opus 4.5 max (1m tokens) and above were the tipping point for me, before that, I agree with 100% of what you said.
But even with Opus 4.6 max / GPT 5.4 high it takes time. You need to provide the right context, add skills/subagents, include tribal knowledge, and have a clear workflow, just like when you onboard a new developer. But once you get there, you can definitely get it to do larger and larger tasks, and you definitely get (at least the illusion) that it "understands" what it's doing.
It's not perfect, but it can definitely code entire features that pass rigorous code review (by more than one human, plus security scanners and several AI code reviewers that review every single line and ensure the author also understands what they wrote).
I don’t use AI to generate any code, but I have used a few tools sparingly as such:
1. Gemini as a replacement for Stack Overflow, but I always have to check the source because it sometimes gives examples that are 10 or even 15+ years old, as if that’s a definitive answer. We cannot and should not trust that anything AI produces is correct.
2. Co-Pilot to assist in code snippets and suggestions, like a better Intellisense. Comes in handy for CLI tools such as docker compose, etc.
3. Co-Pilot to help with comprehension of a code base. For example, to ask how a particular component works, or to search for the meaning of a term or references to it, especially if the term is vague or known by another name.
Believe it or not, we have just recently received guidance on AI-assisted work in general, and it’s mostly “it’s OK to use AI, but always verify it”, which of course seems completely reasonable, as you should do with any work you didn’t do yourself.
On 1: Gemini (et al.) is not replacing Stack Overflow; it's just regurgitating content it ingested from Stack Overflow.
While SO allowed new answers to show up, for any new Next.js bug I ask about that isn't yet commonplace on SO, I get some hallucination telling me to use a made-up code API based on a GitHub issue discussion.
Like many others I started feeling it had legs during the past few months. Tools and models reached some level where it suddenly started living up to some of the hype.
I'm still learning how to make the most of it but my current state is one of total amazement. I can't believe how well this works now.
One game-changer has been custom agents and agent orchestration, where you let agents kick off other agents and each one is customized and keeps a memory log. This lets me build several 1,000-LOC features in large existing codebases without reaching context limits, and with documentation that lets me review the work with some confidence.
I have delivered several features in large legacy codebases that were implemented while I attended meetings. Agents have created greenfield dashboards, admin consoles and such from scratch that would have taken me days to do myself, during daily standups. If it turned out bad, I tweaked the request and made another attempt over lunch. Several useful tools have been made that save me hours per week but I never took the time to make myself.
For now, I love it. I do feel a bit of "mourning the craft" but love seeing things be realized in hours instead of days or weeks.
On two greenfield web apps using straightforward stuff (Preact, Go, PostgreSQL) Claude Code has been very helpful. Especially with Claude Code and Opus >= 4.5, adding an incremental feature mostly just works. One of these is sort of a weird IDE, and Opus even does OK with obscureish things like CodeMirror grammars. I literally just write a little paragraph describing what I want, have it write the code and tests, give it a quick review, and 80% of the time it’s like, great, no notes.
To be clear, this is not vibecoding. I have a strong sense of the architecture I want, and explicitly keep Claude on the desired path much like I would a junior programmer. I also insist on sensible unit and E2E test coverage with every incremental commit.
I will say that after several months of this the signalling between UI components is getting a bit spaghetti-like, but that would’ve happened anyway, and I bet Claude will be good at restructuring it when I get around to that.
I also work in a giant Rails monolith with 15 years of accumulated cruft. In that area, I don’t write a whole lot, but CC Opus 4.6 is fantastic for reading the code. Like, ask “what are all the ways you can authenticate an API endpoint?” and it churns away for 5 minutes and writes a nice summary of all four that it found, what uses them, where they’re implemented, etc.
For personal projects and side company, I get to join in on some of the fun and really multiply the amount of work I can get through. I tend to like to iterate on a project or code base for awhile, thinking about it and then tearing things down and rebuilding it until I arrive at what I think is a good implementation. Claude Code has been a really great companion for this. I'd wager that we're going to see a new cohort of successful small or solo-founder companies that come around because of tools like this.
For work, I would say 60% of my company's AI usage is probably useless. Lots of churning out code and documents that generate no real value or are never used a second time. I get the sense that the often claimed "10x more productive" is not actually that, and we are creating a whole flood of problems and technical debt that we won't be able to prompt ourselves out of. The benefit I have mostly seen myself so far is freeing up time and automating tedious tasks and grunt work.
At work I mostly use claude code and chatgpt web for general queries, but cursor is probably the most popular in our company. I don’t think we are "cooked" but it definitely changes how development will be done.
I think the process of coming up with solutions will still be there but implementation is much faster now.
My observations:
1. What works for me is the usual, work iteratively on a plan then implement and review. The more constraints I put into the plan the better.
2. The biggest problem for me is LLM assuming something wrong and then having to steer it back or redoing the plan.
3. Exploring and onboarding to new codebases is much faster.
4. I don’t see the 10x speedup but I do see that now I can discard and prototype ideas quickly. For example I don’t spend 20-30 minutes writing something just to revert it if I don’t like how it looks or works.
5. Mental exhaustion when working on multiple different projects/agent sessions is real, so I tend to only have one. Having to constantly switch mental model of a problem is much more draining than the “old” way of working on a single problem. Basically the more I give in into vibing the harder it is to review and understand.
I'm lucky enough to have upper management not pressuring me to use it this way or that, and I'm using it mostly to assist with programming languages/frameworks I'm not familiar with. Also for test cases (these sometimes come out wrong and I need to review them thoroughly), updating documentation, rubber ducking, and some other repetitive/time-consuming tasks.
Sometimes, if I have a simple, self-contained bug scenario where extensive debug won't be required, I ask it to find the reason. I have a high rate of success here.
However, it will not help you with avoiding anti-patterns. If you introduce one, it will indulge instead of pointing the problem out.
I did give it a shot on full vibe-coding a library into production code, and the experience was successful; I'm using the library - https://youtu.be/wRpRFM6dpuc
I had automation set up for anything I needed at work; gen AI made me feel like I had to babysit a dumb junior developer, so I lost interest.
Management uses it to make mock websites, then doesn't listen when we point out flaws, so nothing new there.
Some in digital marketing are using it for data collection/analysis, but it reaches wrong conclusions 50% of the time (their words), so they are slowly dropping it and using it for menial tasks and simple automations.
In design we had a trial period, but it has the same issue as coding: either it makes something a senior designer could have made in 2 minutes, or it introduces errors that take a long time to fix, only to do it again on the next prompt.
We are a senior dev team, although relatively small, and to me it seems like it only really works as a substitute for junior devs... but the point of junior devs is to grow someone into a senior with the knowledge you need in the company, so I don't really get the use case overall.
I just moved to a new team in my company that prides itself on being "AI-First". The work is a relatively new project that was stood up by a small team of two developers (both of whom seem pretty smart) in the last 4 months. Both acknowledged that some parts of their tech stack, they just don't at all understand (next.js frontend). The backend is a gigantic monorepo of services glued together.
On my first day the manager and a senior dev told me, "Don't try to write code yourself, you should be using AI". I got encouraged to use spec-driven development and frameworks like superpowers, gsd, etc.
I'm definitely moving faster using AI in this way, but I legitimately have no idea what the fuck I am doing. I'm making PRs I don't know shit about, and I don't understand how the code works because there is an emphasis on speed. So instead of ramping up in languages/technologies I've never used, I'm just shipping a ton of code I didn't write and have no real way to vet like someone who has been working with it regularly and actually has mastered it.
This time last year, I was still using AI, but as a pair-programming utility, where I got help learning things I didn't know, probed topics/concepts I needed exposure to, and reasoned through problems that arose.
I can't control the direction of how these tools are going to evolve and be used, but I would love it if someone could explain to me how I can continue to grow if this actually is the future of development. Because while I am faster, the hope seems to be that AI/agents/LLMs will only ever get better, and I will never need to have an original thought or use critical thinking.
I have just about 4 years of professional experience. I had about 10-12 months at the start of my career where I used Google to learn things before LLMs became the sole, singular focus.
I wake up every day with existential dread of what the future looks like.
A new way of operating is forced down your throat due to expectations of how the technology will evolve. What actually happens is highly variable - on the spectrum between a huge positive and negative surprise.
The people forcing it down your throat do not care about the long-term ramifications.
Tools: Claude Code and various VS Code derivatives, and Cursor at work. Generally Opus 4.6 now.
I feel it made me better and other people worse.
GOOD:
I feel that I’m producing more and better code even with unfamiliar and tangled codebases. For my own side projects, it’s brought them from vague ideas to shipped.
I can even do analyses I never could otherwise. On Friday I converted my extensive unit test suite into a textual simulation of what messages it would show in many situations and caught some UX bugs that way.
Cursor’s Bugbot is genuinely helpful, though it can be irritatingly inconsistent. Sometimes on round 3 with Bugbot it suddenly notices something that was there all along. Or because I touch a few lines of a library suddenly all edge cases in that library are my fault.
NOT GOOD:
The effect on my colleagues is not good. They are not reading what they are creating. I get PRs that include custom circular dependency breakers because the LLM introduced a circular dependency, and decided that was the best solution. The ostensible developer has no idea this happened and doesn’t even know what a circular dependency breaker is.
Another colleague does an experiment to prove that something is possible and I am tasked to implement it. The experiment consists of thousands of lines of code. After I dig into it I realize the code is assuming that something magically happened and reports it’s possible.
I was reflecting on this and realized the main difference between me and my current team is that I won’t commit code I don’t understand. So I even use the LLMs to do refactors just for clarity. while sometimes my colleagues are creating 500-line methods.
Meanwhile our leaders are working on the problem of code review because they feel it’s the bottleneck. They want to make some custom tools but I suspect they are going to be vastly inferior to the tools coming from the major LLM providers. Or maybe we’ll close the loop and we won’t even be reviewing code any more.
For professional work, I like to offload some annoying bug fixes to Claude and let it figure it out. Then, perusing the changes to make sure nothing silly is being added to the codebase. Sometimes it works pretty well. Other times, for complicated things I need to step in and manually patch. Overall, I'm a lot less stressed about meeting deadlines and being productive at work. On the other hand, I'm more stressed about losing my employment due to AI hype and its effectiveness.
For my side projects, I do like to offload the tedious steps like setup, scaffolding or updating tasks to Claude. Things like weird build or compile errors that I usually would have to spend hours Googling to figure out I can get sorted in a matter of minutes. Other than that, I still like to write my own code as I enjoy doing it.
Overall, I like it as a tool to assist in my work. What I dislike is how much peddling is being done to shove AI into everything.
As crazy as this seems, it's unlocking another variation of software engineering I didn't think was accessible. Previously, super entrenched and wicked expensive systems that might have taken years of engineering effort, appear to be ripe for disruption suddenly. The era of software systems with deeply engineered connectivity seem to be on the outs...
1. Workplace, where I work on a lot of legacy code for a crusty old CRM package (Saleslogix/Infor), and a lot of SQL integration code between legacy systems (System21).
So far I've avoided using AI generated code here simply because the AI tools won't know the rules and internal functions of these sets of software, so the time wrangling them into an understanding would mitigate any benefits.
In theory where available I could probably feed a chunk of the documentation into an agent and get some kind of sensible output, but that's a lot of context to have to provide, and in some cases such documentation doesn't exist at all, so I'd have to write it all up myself - and would probably get quasi hallucinatory output as a reward for my efforts.
2. Personally where I've been working on an indie game in Unity for four years. Fairly heavy code base - uses ECS, burst, job system, etc. From what I've seen AI agents will hallucinate too much with those newer packages - they get confused about how to apply them correctly.
A lot of the code's pretty carefully tuned for performance (thousands of active NPCs in game), which is also an area I don't trust AI coding at all, given it's a conglomeration of 'average code in the wild that ended up in the training set'.
At most I sometimes use it for rubber ducking or performance. For example at one point I needed a function to calculate the point in time at which two circles would collide (for npc steering and avoidance), and it can be helpful to give you some grasp of the necessary math. But I'll generally still re-write the output by hand to tune it and make sure I fully grok it.
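The circle-collision math mentioned above has a closed form worth sketching: in circle 1's frame, with relative position p and relative velocity v, circles of radii r1 and r2 touch when |p + v*t| = r1 + r2, which is a quadratic in t. A hedged Python sketch (illustrative only, not the tuned game code):

```python
import math

def time_of_circle_collision(p1, v1, r1, p2, v2, r2):
    """Earliest time t >= 0 at which two moving circles touch, or None.

    Works in circle 1's frame: relative position p, relative velocity v.
    Collision when |p + v*t| == r1 + r2, i.e. the quadratic
    (v.v) t^2 + 2 (p.v) t + (p.p - (r1+r2)^2) = 0.
    """
    px, py = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    rr = r1 + r2
    a = vx * vx + vy * vy
    b = 2 * (px * vx + py * vy)
    c = px * px + py * py - rr * rr
    if c <= 0:
        return 0.0          # already overlapping
    if a == 0:
        return None         # no relative motion
    disc = b * b - 4 * a * c
    if disc < 0:
        return None         # paths never close to within rr
    t = (-b - math.sqrt(disc)) / (2 * a)  # smaller root = first contact
    return t if t >= 0 else None
```

For example, a unit-radius circle moving right at speed 1 toward a stationary unit-radius circle 10 units away first touches it at t = 8 (the centers need to close from 10 to 2).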
Also tried to use it recently to generate additional pixel art in a consistent style with the large amount of art I already have. Results fell pretty far short unfortunately - there's only a couple of pixel art based models/services out there and they're not up to snuff.
I am having the greatest time professionally with AI coding. I now have the engineering team I’ve always dreamed of. In the last 2 months I have created:
- a web-based app for a F500 client for a workflow they’ve been trying to build for 2 years; won the contract
- built an iPad app for same client for their sales teams to use
- built the engineering agent platform that I’m going to raise funding for
I see a lot of people in this thread struggling with AI coding at work. I think my platform is going to save you. The existing tools don’t work anymore, we need to think differently. That said, the old engineering principles still work; heck, they work even better now.
I can't comment about the quality of the code you delivered for your client so I checked your side project. Unfortunately it looks like there is only a landing page (very nice!) but the way from a vibe-coded project to production is usually quite long.
Not wrong at all, that’s why I’m building my own platform for this. That’s also why I haven’t publicly done much on First Cut yet. I’m using my platform to actually build the product, so the intent is that I use my expertise and oversight to ensure it’s not just slop code. So most of the effort has gone into building that platform, which has made building First Cut itself slower. But I’ve actually got my platform running well-enough that now my team is able to get involved, and I can start to work on First Cut again, which means that I should be able to answer your “concern” definitively. I share it.
I’ve been a web dev for 10+ years, and my professional pivot in 2026 has been moving away from "content-first" sites to "tool-led" content products. My current stack is Astro/Next.js + Tailwind + TypeScript, with heavy Python usage for data enrichment.
What’s working:
Boilerplate & Layout Shifting: AI (specifically Claude 4.x/5) is excellent for generating Astro components and complex Tailwind layouts. What used to take 2 hours of tweaking CSS now takes 15 minutes of prompt-driven iteration.
Programmatic SEO (pSEO) Analysis: I use Python scripts to feed raw data into LLMs to generate high-volume, structured analysis (300+ words per page). For zero-weight niche sites, this has been a massive leverage point for driving organic traffic.
Logic "Vibe Checks": When building strategy engines (like simulators for complex games), I use AI to stress-test my decision-making logic. It’s not about writing the core engine—which it still struggles with for deep strategy—but about finding edge cases in my "Win Condition" algorithms.
The Challenges:
The "Fragment" Syntax Trap: In Astro specifically, I’ve hit issues where AI misidentifies <> shorthand or hallucinates attribute assignments on fragments. You still need to know the spec inside out to catch these.
Context Rot: As a project grows, the "context window" isn't the problem; it's the "logic drift." If you let the AI handle too many small refactors without manual oversight, the codebase becomes a graveyard of "almost-working" abstractions.
The Solution:
I treat AI as a junior dev who is incredibly fast but lacks a "mental model" of the project's soul. I handle the architecture and the "strategy logic," while the AI handles the implementation of UI components and repetitive data transformations.
Stack: Astro, TypeScript, Python scripts for data.
Experience: 10 years, independent/solo.
I only just started using it at work in the last month.
I am a data engineer maintaining a big data Spark cluster as well as a dozen Postgres instances - all self hosted.
I must confess it has made me extremely productive if we measure in terms of writing code. I don't even do a lot of special AGENTS.md/CLAUDE.md shenanigans, I just prompt CC, work on a plan, and then manually review the changes as it implements it.
Needless to say this process only works well because:
A) I understand my code base.
B) I have a mental structure of how I want to implement it.
Hence it is easy to keep the model and me in sync about what's happening.
For other aspects of my job I occasionally run questions by GPT/Gemini as a brainstorming partner, but it seems a lot less reliable. I only use it as a sounding board. It does not seem to make me any more effective at my job than simply reading documents or browsing GitHub issues/Stack Overflow myself.
I use Gemini, and rarely ChatGPT (usually once or twice a day). I ask very narrow, pointed questions about something specific I would like an answer to. I typically will verify that the solution is good/accurate because I've been burned in the past by receiving what I'd characterize as a bad solution or "wrong" answer.
I think it's a useful tool, but whenever I have an LLM attempt to develop an entire feature for me, the solution becomes a pain to maintain (because I don't have the mental model around it, or the solution has subtle issues).
Maybe people who are really deep into using AI are using Claude? Perhaps it's way better, I don't know.
What it has done is replace my Googling, asking people, and looking up stuff on Stack Overflow.
It's also good for generating small boilerplate code.
I don't use the whole agents thing; there are so many edge cases that I always need to understand and be aware of, and I honestly think the AI cannot capture them.
Solo dev, working on a native macOS app in SwiftUI. AI has been most useful for the boilerplate - repetitive view layouts, FileManager calls, figuring out AppKit bridging weirdness. It basically replaced Stack Overflow for me.
Where it breaks down is state management. The suggestions look right but introduce subtle bugs in how data flows between views. I've learned to only use it for isolated, well-scoped tasks. Anything that touches multiple components, I write myself.
I have mostly been using the Claude Sonnet models as they release each new one.
It is great for getting an overview on a pile of code that I'm not familiar with.
It has debugged some simple little problems I've had, e.g., a complex regex isn't behaving, so I'll give it the regex and a sample string and ask, "why isn't this matching?", and it will figure it out.
I've used it only a little for writing new code. In those cases I will write the shell of a subroutine and a comment saying what the subroutine takes in and what it returns, then ask the LLM to fill in the body. Then I review it.
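That stub-first workflow might look like this (a hypothetical subroutine; the body is the kind of thing the LLM fills in and you then review):

```python
def dedupe_preserving_order(items):
    """Return items with duplicates removed, keeping the first
    occurrence of each value and the original order.

    Takes: an iterable of hashable values.
    Returns: a new list.
    """
    # --- body below is what the LLM would be asked to fill in ---
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

The docstring carries the contract, so the review is just checking the body against it.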
It has been useful for translating ancient perl scripts into something more modern, like python.
Experience level: very senior, programming for 25 years, have managed platform teams at Heroku and Segment.
Project type: new startup started Jan ‘26 at https://housecat.com. Pitch is “dev tools for non developers”
Team size: currently 2.
Stack: Go, vanilla HTML/CSS/JS, Postgres, SQLite, GCP and exe.dev.
Claude code and other coding harnesses fully replaced typing code in an IDE over the past year for me.
I’ve tried so many tools. Cursor, Claude and Codex, open source coding agents, Conductor, building my own CLIs and online dev environments. Tool churn is a challenge but it pays dividends to keep trying things as there have been major step functions in productivity and multi tasking. I value the HN community for helping me discover and cut through the space.
Multiple VMs available over SSH with an LLM pre-configured has been the latest level-up.
Coding is still hard work designing tests, steering agents, reviewing code, and splitting up PRs. I still use every bit of my experience every day and feel tired at end of day.
My non-programmer co-founder, more of a product manager and biz ops person, has challenges all the time. He generally can only write functional prototypes. We solve this by embracing the functional prototype and doing a lot of pair programming. It is much more productive than design docs or Figma wireframes.
In general the game changer is how much a couple of people can get done. We’re able to prototype ideas, build the real app, manage SOC2 infra, marketing and go to market better than ever thanks to the “willing interns” we have. I’ve done all this before and the AI helps with so much of the boilerplate and busywork.
I’m looking for beta testers and security researchers for the product, as well as a full time engineer if anyone is interested in seeing what a “greenfield” product, engineering culture and business looks like in 2026. Contact info in my profile.
Interesting premise for your product. Hope you find success!
From a dev perspective, I feel your website gives off more of an "OpenClaw you can trust" vibe than "dev tools for non-developers". Is that right? Or am I misreading the idea?
The OpenClaw stuff is awesome but it’s too raw for a lot of professionals and small teams. We’re trying to bring more guardrails to the concept and more of a Ruby on Rails philosophy to how it works.
I'm enjoying it. At this stage though, I just don't see much value if you don't have any prior knowledge of what you're doing. Of course you can use LLMs to get better at it but we're not yet at the point where I'd trust them to build something complex without supervision... nor is anyone suggesting that, except AI CEOs :)
I do wonder what will happen when real costs are billed. It might end up being a net positive since that will make you think more about what you prompt, and perhaps the results will be much better than lazily prompting and seeing what comes out (which seems to be a very typical case).
At $WORK, my team is relatively small (< 10 people) and a few people really invested in getting the codebase (a large Elixir application with > 3000 modules) in shape for AI-assisted development with a very comprehensive set of skills, and some additional tooling.
It works really well (using Claude Code and Opus 4.6 primarily). Incremental changes tend to be well done and mostly one-shotted provided I use plan mode first, and larger changes are achievable by careful planning with split phases.
We have skills that map to different team roles, and 5 different skills used for code review. This usually gets you 90% there before opening a PR.
Adopting the tool made me more ambitious, in the sense that it lets me try approaches I would normally discard because of gaps in my knowledge and expertise. This doesn't mean blindly offloading work, but rather isolating parts where I can confidently assess risk, and then proceeding with radically different implementations guided by metrics. For example, we needed a way to extract redlines from PDF documents, and in a couple of days went from a prototype with embedded Python to an embedded Rust version with a robust test oracle against hundreds of documents.
I don't have multiple agents running at the same time working on different worktrees, as I find that distracting. When the agent is implementing I usually still think about the problem at hand and consider other angles that end up in subsequent revisions.
Other things I've tried which work well: share an Obsidian note with the agent, and collaboratively iterate on it while working on a bug investigation.
I still write a percentage of code by hand when I need to clearly visualise the implementation in my head (e.g. if I'm working on some algo improvement), or if the agent loses its way halfway through because they're just spitballing ideas without much grounding (rare occurrence).
I find Elixir very well suited for AI-assisted development because it's a relatively small language with strong idioms.
This exactly matches our findings: if we start molding the repo to be "AI native" whatever that means, add the right tooling and still demand all engineers take full responsibility for their output, this system is a true multiplier.
I also have Copilot and Cursor Bugbot reviews, and run it in a Ralph Wiggum loop with Claude Code. A few rounds overnight and the PR is perfect and ready for a final review before merging.
I do run 4 CC sessions in parallel, though, but that's just one day a week. The rest of the week is spent figuring out the next set of features and fixes needed, operational things, meetings, feedback, etc.
Small team, backend. NDAs prevent widespread LLM use, but some of our engineers, junior and senior both, feel pretty confident in using Claude for "isolated" development, like generic packages or libraries that are plausibly unrelated to our work.
It's going very poorly, where the engineers are emboldened by speed and are vacating their normal code-review responsibilities. I would also say they are shirking ethical behavior by domineering other people's time, energy, and open source projects. Moreover, these forays into generic packages are largely vanity projects, an excuse to play with LLMs.
My only solution is to increase my level of code-review, which aggravates everybody involved, including me. It is not a good solution.
I could definitely perceive hardline rules being valuable surrounding LLM use (e.g. "LLM PRs must be less than n logical statements, no exceptions" is just one example rule off the top of my head), especially if the LLM can be made to stridently follow those rules, but the idea of hashing those out sounds unproductive.
Has been a game-changer for me. The following cases are where it shines:
- Figuring out the architecture of a project you just came into
- Tracing the root cause of a bug
- Quickly implementing a solution with known architecture
I figured out that above all, what makes or breaks success is context engineering. Keeping your project and session documentation in order, documenting every learning you've made along the way (with the help of AI), asking AI to compose a plan before implementing it, iterating on a plan before it looks good to you. Sometimes I spend several hours on a plan markdown document, iterating on it with AI, before pressing "Build" button and the AI doing it in 10 minutes.
Another important thing is verification harness. Tell the agent how to compile the code, run the tests - that way it's less likely to go off the rails.
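Such a verification harness can be as simple as a script the agent is told to run after every change; a minimal Python sketch (the step names and commands are hypothetical placeholders for your real build/test commands):

```python
import subprocess
import sys

def run_step(name, cmd):
    """Run one verification step (compile, lint, test...); True on success."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    ok = result.returncode == 0
    print(f"{name}: {'PASS' if ok else 'FAIL'}")
    if not ok:
        # Surface the output so the agent can read the error and iterate
        print(result.stdout)
        print(result.stderr)
    return ok

def verify(steps):
    """Run steps in order, stopping at the first failure."""
    for name, cmd in steps:
        if not run_step(name, cmd):
            return False
    return True

# Hypothetical steps for a Python project; swap in your real commands
STEPS = [
    ("compile", [sys.executable, "-m", "compileall", "-q", "."]),
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
]
```

Pointing the agent at one entry point like this, instead of describing the build process in prose, is what keeps it from going off the rails.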
Overall, as of a couple of months ago, I feel like I got rid of the part of programming that I liked the least - swimming in technicalities irrelevant to the overall project's objectives - while keeping what I liked the most - making the actual architectural and business decisions.
Sometimes it produces useful output. A good base of tests to start with. Or some little tool I'd never take the time to make if I had to do it myself.
On the other hand I tried to get help debugging a test failure and Claude spit out paragraph after paragraph arguing with itself going back and forth. Not only did it not help none of the intermediate explanations were useful either. It ended up being a waste of time. If I didn't know that I could have easily been sent on multiple wild goose chases.
We're using Augment Code heavily on a "full rewrite of a legacy CRM with 30 years of business rules/data" Laravel project with a team size of 4. Augment kind of became impossible to avoid once we realized the new guy is outpacing the rest of us while possessing almost no knowledge of code and working fully in the business requirements domain, extracting requirements from the customer and passing them to AI, which was encoding them in tests and implementing them in code.
I'm using `auggie` which is their CLI-based agentic tool. (They also have a VS Code integration - that became too slow and hung often the more I used it.) I don't use any prompting tricks, I just kind of steer the agent to the desired outcome by chatting to it, and switch models as needed (Sonnet 4.6 for speed and execution, GPT 5.1 for comprehension and planning).
My favorite recent interaction with Augment was to have one session write a small API and its specification within the old codebase, then have another session implement the API client entirely from the specification. As I discovered edge cases I had the first agent document them in the spec and the second agent read the updated spec and adjust the implementation. That worked much, much better than the usual ad hoc back and forth directly between me and one agent and also created a concise specification that can be tracked in the repo as documentation for humans and context for future agentic work.
Far better results compared to GPT 5.4 and Opus 4.6. Not great for execution due to speed but has consistently had better comprehension of the codebase. Maybe it's a case of "holding it wrong" regarding the other models but that's been my experience.
Hating it TBH. I feel like it took away a lot of what I enjoyed about programming but its often so effective and I'm under so much pressure to be fast that I can't not use it.
So far, it's been fantastic. I can do more things for clients, much faster, than I ever dreamed would be possible when I've attempted work like this before.
I think the biggest problem with AI coding is that it simply doesn't fit well into existing enterprise structures. I couldn't imagine being able to do anything productive when I'm stuck having to rely on other teams or request access to stuff from the internet like I did in previous jobs.
I've been working on a client-server Unity-based game for the last couple of years. It's pretty bad at handling that use case. It misses tons of corner cases that span the client-server divide.
When you prompt does the agent have access/visibility to all code bases/repos at once and do you prompt it to update both at the same time? That has worked well for me for client/server stuff.
At the end of the day, I'm being paid to ensure that the code deployed to production meets a particular bar of quality. Regardless of whether I'm reviewing code or writing it, if I let a commit be merged, I have to be convinced that it is a net positive to the codebase.
People having easy access to LLMs makes this job much harder. LLMs can create what looks at the surface like expert-written code, but suffers from below-the-surface issues that will reveal themselves as intermittent issues or subtle bugs after being deployed.
Inexperienced devs create huge commits full of such code, and then expect me to waste an entire day searching for such issues, which is miserable.
If the models don't improve significantly in the future, I expect that most high-stakes software teams will fire all the inexperienced devs and have super-experienced engineers work with the bots directly.
AI is great for getting stuff to work on technologies you're not familiar with. E.g. to write an Android or iOS app or an OpenGL shader, or even a Linux driver. It's also great for sysadmin work such as getting an ethernet connection up, or installing a docker container.
For main coding tasks, it is imho not suitable because you still have to read the code and I hate reading other people's code.
And also, the AI is still slow, so it is hard to stay focused on a task.
It’s working! A ~200k LOC Python/TypeScript codebase built from scratch as I’ve grown out the framework. I probably wrote 500-1000 lines of that, so ~99.5% was written by Claude Code. I commit 10k-30k LOC per week, code-reviewed and of industrial-strength quality (mainly thanks to rigid TDD).
I review every line of code but the TDD enforcement and self-reflection have now put both the process and continual improvement to said process more or less on autopilot.
It’s a software factory - I don’t build software any more, I walk around the machine with a clipboard optimizing and fixing constraints. My job is to input the specs and prompts and give the factory its best chance of producing a high quality result, then QA that for release.
I keep my operational burden minimal by using managed platforms - more info in the framework.
One caveat; I am a solo dev; my cofounder isn’t writing code. So I can’t speak to how it is to be in a team of engineers with this stuff.
Right now I enjoy the labs' CLI harnesses, Claude Code and Codex (especially for review). I do a bunch of niche stuff with Pi and OpenCode. My productivity is up. There are some nuances to working with others using the same AI tools: we all end up trying to boil the ocean at first, creating a ton of verbose docs and massive PRs, but we eventually pull back from throwing up every sort of LLM output we get. Instead, we continuously refine the outputs into something consumable and trusted.
My workday is fairly simple. I spend all day planning and reviewing.
1. For most features, unless it's small things, I will enter plan mode.
2. We will iterate on planning. I built a tool for this, and it seems that this is a fairly desired workflow, given the popularity through organic growth. https://github.com/backnotprop/plannotator
- This is a very simple tool that captures the plan through a hook (ExitPlanMode) and creates a UI for me to actually read the plan and annotate it, with QoL things like viewing plan diffs so I can see what the agent changed.
3. After the plan's approved, we eventually hit review of the implementation. I'll use AI reviewers, but I will also manually review using the same tool so that I can create annotations and iterate through a feedback loop with the agents.
4. Do a lot of this / multitasking with worktrees now.
I've been working on a thing for worktrees to work with docker-compose setups so you can run multiple localhost environments at once https://coasts.dev/. It's free and open source. In my experience it's made worktrees 10x better but would love to hear what other folks are doing about things like port conflicts and db isolation.
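To make the port-conflict and DB-isolation problem concrete, here is a minimal sketch (not the tool above; the `app-` project prefix and `APP_PORT` variable are hypothetical names) of one way to give each worktree its own compose environment. `COMPOSE_PROJECT_NAME` is a real Compose variable that namespaces containers, networks, and named volumes, which is what isolates the databases; a stable hash of the worktree name picks a host port.

```python
import hashlib
import os

def compose_env(worktree_dir, base_port=8000, span=100):
    """Derive an isolated docker-compose environment for one worktree.

    A stable hash of the worktree directory name picks a port in
    [base_port, base_port + span), so each checkout gets its own host
    port, and a unique COMPOSE_PROJECT_NAME gives it its own
    containers and named volumes (and therefore its own database).
    """
    name = os.path.basename(os.path.abspath(worktree_dir))
    offset = int(hashlib.sha256(name.encode()).hexdigest(), 16) % span
    return {
        "COMPOSE_PROJECT_NAME": f"app-{name}",
        "APP_PORT": str(base_port + offset),
    }
```

The compose file would then reference `${APP_PORT}` in its `ports:` mapping. Hash collisions are possible with a small span, so treat this as a starting point rather than a guarantee.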
I am no longer in software as a day job, so I am not sure if my input applies. I traded that world for opening a small brewery back in 2013, so I am a bit outdated on many modern trends, but I still enjoy programming.

In the last few months, using Gemini at first and now moving over to Claude, I have created at least 5 (and growing) small apps that have radically transformed what I am able to do at the business. I improved automation of my bookkeeping (from ~16 hours a month categorizing everything down to about 3), created far better reports on production, sales, and predictions from a system I had already been slowly writing all these years, built a run club rewards tracking app instead of relying on our paper method, improved a previously written full TV menu display system that syncs with our website and on-premises TVs, and now I am working on a full productive maintenance trigger system and a personal phone app to trigger each of these more easily.

It's been a game changer for me. I have so many more ideas planned, and each one frees up more of my wasted time to create more.
I have the freedom to work with AI tools as much as I want, and I kind of lead the team in the direction I see fit.
It’s a lot of fun for exploring ideas. I’ve built things very fast that I would not have done at all otherwise. I have rewritten a huge chunk of semi-outdated docs into something useful with a couple of prompts in a day. Claude does all the annoying "dependency update breaks the build" kinds of things. And the reviews are extremely useful and a perfect complement to human review, as they catch things extremely well that humans are bad at catching.
But in the production codebase, changes must be made with much more consideration. Claude tends to perform well on some tasks, but for others I end up wasting time because I just don’t know up front how the feature must look, so I cannot write a spec at the level of precision that Claude needs. Changing code manually is more efficient for this kind of discovery, for me, than dealing with large chunks of constantly changing code.
And then there’s the fact that Claude produces things that work and do the thing described in the prompt extremely well, but they are always also wrong in some way. When I let AI build a large chunk of code and actually go through it, there’s always a mess somewhere that AI review doesn’t see because it looks completely plausible, but it contains some horrible security issue, or a complete inconsistency with the rest of the codebase, or, you know, that custom YAML parser nobody asked for and that you don’t want your day job to depend on.
Agreed, you often dig into what it built and find something insanely over engineered or something that doesn’t match the “style” of your existing code.
In this case that's actually a security vulnerability. I've also seen a case where it built an API with auth but added a route where anyone could just PUT a new API key into it. Sometimes its own code review catches these, sometimes it does not.
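To illustrate the class of bug being described (this is a hypothetical sketch, not the actual code from that project; every name here is made up): a key-rotation handler that does real work but never checks that the caller is who they claim to be, next to the guarded version.

```python
# Hypothetical illustration: the API's other routes check credentials,
# but the key-rotation handler forgets to.
API_KEYS = {"alice": "key-123"}

def put_api_key_unsafe(user, new_key):
    # BUG: no authentication at all -- anyone who can reach this route
    # can overwrite any user's API key and hijack the account.
    API_KEYS[user] = new_key
    return "ok"

def put_api_key(user, new_key, auth_token, sessions):
    # The missing guard: only accept the request if the presented token
    # maps to the same user whose key is being replaced.
    if sessions.get(auth_token) != user:
        return "forbidden"
    API_KEYS[user] = new_key
    return "ok"
```

The fix is one line, which is exactly why this kind of hole is easy to miss in a large generated diff that otherwise looks plausible.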
my team is anti-AI. my code review requests are ignored, or are treated more strictly than others. it feels coordinated - i will have to push back the launch date of my project as a result.
another teammate added a length check to an input field, and his request was merged near instantly, even though it had zero unit testing. this team is incredibly cooked in the long term, i just need to ensure that i survive the short term somehow.
> this team is incredibly cooked in the long term

They're not, actually.
People like you are sinking effort into workflows while the models are still evolving... they can just wait until the models reach a steady state to figure out the optimal workflow. They will have lost out on far less.
It sounds like you might have wasted your team's time previously, and now they don't trust the code you put up in a PR. Maybe you can do something to improve your relationship with them?
As a side note, I highly doubt they are cooked long term. Using AI is not exactly skilled labor. If they want or need to, I'm sure they could learn the patterns/workflows in an afternoon. As things go on, it will only get easier to use.
I am a developer turned (reluctantly) into management. I still keep my hands in code and work with the team on a handful of projects. We use GitHub Copilot on a daily basis, and it has become a great tool that has improved our speed and quality. I have 20+ years of experience and see it as just another tool in the toolbox. Maybe I’m naive, but I don’t feel threatened by it.
At least at my company the problem is the business hasn’t caught up. We can code faster but our stakeholders can’t decide what they want us to build faster. Or test faster or grasp new modalities llms make possible.
That’s where I want to go next: not just speeding up and increasing code quality but improving business analytics and reducing the amount of meetings I have to be in to get business problems understood and solved.
Couldn’t read all the comments, but my experience has been overwhelmingly positive so far.
I think what helps me be effective is a combination of factors: I work only in a modern, well-documented and well-architected Java codebase with over 80% test coverage.
I only use Claude Code with Opus 4.6 on High Effort.
I always, ALWAYS treat my “new job” as writing a detailed ticket for whatever it is I need to do.
I give the model access to a DB replica of my prod DB that I create manually.
I do NOT waste time with custom agents, Claude.md files or any of that stuff.
When I put ALL of the above together, the results ARE THE PROMISED LAND: I simply haven’t written a single line of code manually in the last 3 months.
I find this pretty interesting. I am curious though: Did you dislike coding? You sound genuinely excited to not be doing it anymore.
For me I have been a coder since a very young age and I am nearing the end of my career now. I still love writing code to problem solve just as much as the first day I learnt to code. The thought of something taking that task away from me doesn't fill me with glee.
A parallel for me: if I enjoyed puzzle pages, and those brought me joy and satisfaction from employing my grey matter to solve them, I just wouldn't find it interesting to have an agent complete the forms for me, with me simply guiding the agent to clues.
Replying once again for future reference to make my position clear: I firmly believe that one MUST experience programming on its own first. No LLMs, no crutches. One MUST feel the abstractions melting away and things clicking in the brain first.
The design becoming obvious. Being able to remove that extra if statement after clarifying requirements with a customer face to face.
A design pattern fitting a scenario like a glove, etc, etc.
You need REAL experience that only comes with time and effort. Years or decades, different businesses, different companies, etc.
But once you have crossed that chasm and passed that rite of passage, using LLMs becomes a true multiplier and, in my experience, quite fun.
Using them blindly or without experience is a very different thing I can imagine.
I like problem solving and building useful things for our customers. Coding for me was always more of a “means to an end” than pure craft on its own. Obviously some standard of good, clean code applies when you’re working on things to be extended or maintained by others, but, truth be told, ego battling in code reviews gets boring very fast. Additionally, no matter how much I like experimenting with things, if I have a hypothesis I can now validate it in 2 days instead of 1 week, which means I can validate twice as many hypotheses.
I am extremely excited about that! Coding in itself as the act of manually typing things? Absolutely not
I work at a large company that is contracted to build warehouses that automate the movement of goods with conveyors, retrieval systems, etc.
This is a key candidate for AI, as we have built hundreds of warehouses in the past. We have a standard product, spanning over a hundred thousand lines of code, to build upon. Still, we rely on copying code from previous projects if features have been implemented before. We have stopped investing in the product in order to migrate everything to microservices, for some reason, so this code copying is increasingly common as projects keep getting more complex.
Teams to implement warehouses are generally around eight developers. We are given a design spec to implement, which usually spans a few hundred pages.
AI has more than doubled the speed at which I can write backend code. We've done the same tasks so many times before with previous warehouses that we have a gold mine of patterns the AI can pick up on if we give it a folder of previous projects to read. I also feel that the code I write is higher quality, though I have to think more about the design, as previously I would realize something wouldn't work while writing the code. With GWT, though, it's hopeless, as there are almost no public GWT projects to train an AI on. It's also very helpful in tracing logs and debugging.
We use Cursor. I was able to use $1,300 worth of Claude Opus 4.6 tokens for a cost of $100 to the company. Sadly, Cursor discontinued its legacy pricing model due to it being unsustainable, so only the non-frontier models are priced low enough to use consistently. I'm not sure what I'm going to do when the new pricing model takes effect tomorrow; I guess I will have to go back to writing code by hand or figure out how to use models like Gemini 3.1. GPT models also write decent code, but they are always so paranoid, and they follow prompts so strictly it's to their own detriment. Gemini just feels unstable and inconsistent, though it does write higher quality code.
I'm not being paid any more for doubling my output, so it's not the end of the world if I have to go back to writing code by hand.
I am forced to use it. They want us to only have code written by Claude. We are forced to use spec-kit for everything, so every PR has hundreds, if not thousands, of lines of markdown committed to the repo per ticket. I basically only review code now. It changes so fast that it is impossible to have a stable mental model of the application. My job is now to go to meetings and go through the motions of reviewing thousands of lines of slop per day while sending thousands of lines of slop to others. Everything I liked about the job has been stolen from me; only the things I disliked or was indifferent to are left.
If this is what the industry is now… this will be my last job in it.
Curse everyone involved with creating this nightmare.
Professionally, sending our code off prem is not an option. Frankly I don’t understand why executives are okay with AI companies training LLMs on their IP. Unless they own a significant stake in the AI company I guess.
Personally, it’s been decent for generating tedious boilerplate. Though I’m not sure if reading the docs and just writing things myself would have been faster when it comes time to debug. I’m pretty fast at code editing with vim at this point. I’m also hesitant to feedback any fixes to the AI companies.
I’ve found “better google” to be a much more comfortable if not faster way to use the tools. Give me the information, I’ll build an understanding and see the big picture much better.
I'm using a code review agent which sometimes catches a critical bug humans miss, so that is very useful.
Using it to get to know a code base is also very useful. Questions like "which functions touch this table?" or "describe the flow of this API endpoint" are usually answered correctly. This is a huge time saver when I need to work on a code base I'm less familiar with.
For coding, agents are fine for simple straightforward tasks, but I find the tools are very myopic: they prefer very local changes (adding new helper functions all over the place, even when such helpers already exist)
For harder problems I find agents get stuck in loops, and coming up with the right prompts and guardrails can be slower than just writing the code.
I also hate how slow and unpredictable the agents can be. At times it feels like gambling.
Will the agents actually fix my tests, or fuck up the code base? Who knows, let's check in 5 minutes.
IMO the worst thing is that juniors can now come up with large change sets, that seem good at a glance but then turn out to be fundamentally flawed, and it takes tons of time to review
Tasks where, in the past, I have thought “if I had a utility to do x it would save me y time”, and I’d either start and give up or spend much longer than y on it, are now super easy. Create a directory, run `claude "create an app to do x"`. So simple.
I'm working on a startup, mostly writing C++, and I'm using AI more and more. In the last month I have one machine running Codex working on task while I work on a different machine.
I have to think like micro-manager, coming up with discrete (and well-defined) tasks for the AI to do, and I periodically review the code to make it cleaner/more efficient.
But I'm confident that it is saving me time. And my love for programming has not diminished. I'm still driving the architecture and writing code, but now I have a helper who makes progress in parallel.
It's going pretty well, though it took at least six months to get there. I'm helped by knowing the domain reasonably well, and working with a principal investigator who knows it well and who uses LLMs with caution. At this stage I use Claude for coding and research that does not involve sensitive matters, and local-only LLMs for coding and research that does. I've gradually developed some regular practices around careful specification, boundaries, testing, and review, and have definitely seen things go south a few times. Used cautiously, though, I can see it accelerating progress in carefully-chosen and -bounded work.
I am having a blast at work. I've been leaning hard into AI (as directed by leadership) while others are falling far, far behind. I am building new production features, often solo or with one or two other engineers, at lightning speed, and being recognized across the org for it. This is an incredible opportunity for many engineers that won't last. I'm trying to make the most of it. It will be sad when software is no longer a useful pastime for humans. I'm thinking another three years and most of us will be unemployed, or our jobs will have been transformed into something that would have been unrecognizable a few short years ago.
I started AI-assisted coding quite a while ago with the "query for code to copy and paste" approach, which was slow. That shifts dramatically when the LLMs are used as agents: AI with access to certain things like your project's source code, the internet, and other technical docs that refine them. You can instruct the agent to change snippets of code by mentioning them, along with the desired action, in the chat; this is how it works in tools like cursor, antigravity, llmanywhere. An instruction could be as limited as CRUD operations (CRUD standing for Create, Read, Update, and Delete): an update instruction looks like "change the code that does this to do that", or a more precise one, "change the timeout of the request to ycombinator.com to 10".

Having a good memory definitely helps here, but forgetting isn't the end of development, nor does it force you to start reading the source code yourself to find where an instruction should target. If you've forgotten the big picture of the project because you came back from a break or something, you can ask for a goal summary of the project's interconnected source code. (I say interconnected because, in my experience with cursor for example, it generates lots of source files, such as test cases, that aren't used in production but are part of the project.)

I only used an AI agent for my last LangGraph solo project, which involved Python and Go, git, and cursor, so take my advice with a grain of salt :)
It's fun, but testing has become more of a PITA. When I write code I test and understand each piece. With AI generated code I need to figure out how it works and why it isn't working.
I enjoy Opus on personal projects. I don’t even bother to check the code.
Go/JavaScript/Typescript/CSS works very well for me.
Swift not so much. I haven’t tried C/C++ yet. Scala was Ok.
Professionally I hardly use the tools for coding, since I’m in an architecture role and mostly write design docs and do reviews. And I write the occasional prototype.
I have started building tools to integrate copilot (Opus) better with $CORP. This way I can ask it questions across confluence and github.
Leveraging Claude for a project feels very addictive to me. I have to make a conscious effort to stop and I end up working on multiple projects at the same time.
Pretty good. We have a huge number of projects, some more modern than others. For the older legacy systems, it's been hugely useful: not perfect, and it needs a bit more babysitting, but a lot easier to deal with than doing it solo. The newer things can mostly be done solely by AI, so more time is spent speccing/designing the system than coding. But every week we are working out better and better ways of working with AI, so it's an evolving process at the moment.
One thing I use Claude for is diagramming system architecture in LaTeX, and it’s great: I just describe what I am visualizing and, kaboom, I get perfect output I can paste into Overleaf.
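For anyone curious what that kind of output looks like, here is a minimal hand-written sketch of a TikZ architecture diagram of the sort such a prompt might produce (the component names are placeholders, not from the commenter's system):

```latex
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{positioning}
\begin{document}
\begin{tikzpicture}[box/.style={draw, rounded corners,
                                minimum width=2.4cm, minimum height=1cm}]
  % Three placeholder components connected left to right
  \node[box] (api) {API gateway};
  \node[box, right=1.5cm of api] (svc) {Service};
  \node[box, right=1.5cm of svc] (db) {Database};
  \draw[->] (api) -- (svc);
  \draw[->] (svc) -- (db);
\end{tikzpicture}
\end{document}
```

Pasting this into Overleaf and compiling yields three boxes joined by arrows; describing the components and their connections in prose is usually enough detail for a model to produce something along these lines.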
I find it useful. It has been a big win from a motivation perspective. When digging into bad API docs or getting started on a complex problem, it's easy to have the AI work through it with me as I describe it. The other positive is front-end design: I've always hated CSS and its derivatives, and AI now makes me decent at it.
The negatives are that AI clearly loves to add code, so I do need to coach it into making nice abstractions and keeping it on track.
It’s like working with the dumbest, most arrogant intern you could imagine. It has perfect recall of the docs but no understanding of them.
An example from last week:
Me: Do this.
AI: OK.
<Brings me code that looks like it accomplishes the task but after looking at it it’s accomplishing it in a monkey’s paw/spiteful genie kind of way.>
Me: Not quite, you didn’t take this into account. But I made the same mistake while learning so I can pull it back on track.
AI: OK
<It’s worse, and why are all the values hardcoded now?>
…
40 minutes go by. The simplest, smallest bit of code is almost right.
Me: Alright, abstract it into a Sass mixin.
AI: OK.
<Has no idea how to do it. It installed Sass, but with no understanding of what it’s working on so the mixin implementation looks almost random. Why is that the argument? What is it even trying to accomplish here?>
At which point I just give up and hand code the thing in 10 minutes.
I'm loving the experience, and I also realize that this part of me, the one that could write code, is obsolete. Completely. Utterly obsolete.
OK, so first I need to admit that I am not the best programmer, but I've been at it for 27 years.
These past months I've been working with two agents, developing two things practically in parallel.
And I've experienced the fastest, most motivating development sessions I've ever had.
Together with these two agents I was able to build two very complex systems that do all sorts of data gathering, then ETL it into a format that can be queried and maintained, and it all ends up in some awesome web UIs.
I used them not only to write code, but to do the design and architecture, and to discuss the front end and the business reqs.
And what I can say is that it felt like a conversation with a crazy-fast person who did everything I needed in seconds.
As a tech guy I know what I want and I know how to describe it. That helped A LOT!
I know when we lost context, and yes, there were stupid consequences that we had to fix.
But my impression is that many of the things criticized here come down to the people using it, more than to the AI and its output.
From my point of view, the output is what I wanted, only 250x faster than I ever expected.
And as for the critiques targeting the AIs: after this, I am sure that they will learn to fill in all those gaps.
We will not be criticizing them by then. By then, my only possible job will be to translate somebody's business reqs for an agent to implement as I speak.
I’m transitioning from AI assisted (human in the loop) to AI driven (human on the loop) development. But my problems are pretty niche, I’m doing analytics right now where AI-driven is much more accessible. I’m in a team of three but so far I’m the only one doing the AI driven stuff. It basically means focusing on your specification since you are handing development off to the AI afterwards (and then a review of functionality/test coverage before deploying).
I've shipped full features and bug fixes without touching an IDE for anything significant.
When I need to type stuff myself it's mostly just minor flavour changes like Claude adding docstrings in a silly way or naming test functions the wrong way - stuff that I fixed in the prompt for the next time.
And yes, I read and understand the code produced before I tag anyone to review the PR. I'm not a monster =)
It allowed me to build my SaaS https://agreezy.app in 2 months (started January and launched early February). There was a lot of back and forth between Claude and Qwen, but it's pretty polished. AI hallucinations are real, so I ended up writing more tests than normal.
If I may "yes, and" this: spec → plan → critique → improve plan → implement plan → code review
It may sound absurd to review an implementation with the same model you used to write it, but it works extremely well. You can optionally crank the "effort" knob (if your model has one) to "max" for the code review.
> You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.
I'm just a sample size of one, but FWIW I didn't find that this noticeably improved my results.
Not having to completely recreate all the LLM context necessary to understand the literal context and the spectrum of possible solutions (which the LLM still "knows" before you clear the session) saves lots of time and tokens.
Interesting, I definitely see better results on a clean session. On a “dirty” session it’s more likely to go with “this is what we implemented, it’s good, we could improve it this way”, whereas on a clean session it’s a lot more likely to find actual issues or things that were overlooked in the implementation session.
My workflow is something very similar. I'd say one difference now is PRs actually take longer to get merged, but it's mainly because we ignore them and move onto something else while waiting for CI and reviews. It's not uncommon for a team member to have multiple PRs open for completely different features.
Context switching is less painful when you have a plan doc and chat history where you can ask why yesterday afternoon you (the human) decided to do this thing that way. Also for debugging it's very useful to be able to jump back in if any issues come up on QA/prod later. And I've actually had a few shower thoughts like that, which have allowed the implementations of some features to end up being much better than how I first envisioned it.
Odd how you add the time for the requirement analysis but none for the coding.
Then you tell us you leave 83% of the analysis —and the coding— to a code chatbot.
Are you actually more productive or are you going to find out down the line the chatbot missed some requirements and made APIs up to fill up a downstream document and now you better support them by yesterday?
In ye olden days, people doing this would scream at the junior developers. Are you going to scream at your screen?
To be honest, I didn't think too hard about it. I just fired it off and submitted, with the time estimates in there kind of randomly.
You are clearly a naysayer. I get it. I was one too for a long time. Then I found a workflow and a model that was clearly delivering results and that's what I use now.
It's only a matter of time before it happens to everyone, even you. Once you have the aha moment where it works for you, you'll stop asking everyone whether they really know if it's better.
The LLM-based workflow above produces good code at a speed at least as fast as my previous workflow and typically many, many, many times faster with the code produced often using designs I would have never thought of before being able to bounce ideas off an LLM first. The biggest difference, besides the time obviously, is that the energy I need to spend is in very different places between the two.
Before it was thinking about what I needed to do and writing the code.
Now it's thinking about what I need to do and reviewing the code.
Well, I'm not considering using any code generation outside of helper scripts because in my case coding is a negligible part of my work. If I didn't have the LLM, I would find and modify the tool it is lifting code from using pre-LLM Google.
I know that asking one of these LLMs to produce a document from my notes resulted in me having to review a professional- and plausible-looking yet subtly wrong document for more hours than it would have taken me to write it from scratch.
It's been great - I work on a lot of projects that are essentially prototypes, to test out different ideas. It's amazing for this - I can create web apps in a day now, which in the past I would not have been able to create at all, as I spent most of my career on the backend.
We use our own scripts around claude code to create and maintain 100s of products. We have products going back 30+ years and clients are definitely happier since AI. We are more responsive to requests for lower fees than before.
Going back and forth with an AI all day is psychologically draining, as is checking its output with a fine-tooth comb.
Einstein said something like: "To punish my disdain for authority, God made me an authority." I feel like, to punish my disdain for dev managers, techbro Jesus has made me a dev manager of AI agents.
I also work at big tech. Claude code is very good and I have not written code by hand in months. These are very messy codebases as well.
I have to very much be in the loop and constantly guiding it with clarifying questions but it has made running multiple projects in parallel much easier and has handled many tedious tasks.
Exceptionally well. I’ve been using it for my side project for the last 7 months and have learned how to use it really well on a rather large codebase now. My side project has about 100k LOC and most of it is AI generated, though I do heavily review and edit.
AI-assisted research is a solid A already, at least if you are doing greenfield work. The horizon is only blocked by tooling that requires a GUI. Even then, that is a small enough obstruction for most researchers.
I'm a manager at a large consumer website. My team and I have built a harness that uses headless Claude's (running Opus) to do ticket work, respond to and fix PR comments, and fix CI test failures. Our only interaction with code is writing specs in Jira tickets (which we primarily do via local Claudes) and adding PR comments to GitHub PRs.
The speed we can move at is astounding. We're going to finish our backlog next quarter. We're conservatively planning on launching 3x as many features next quarter.
Claude is far from perfect: it's made us reassess our coding standards since code is primarily for Claude now, not for humans. So much of what we did was to make code easier for the next dev, and that just doesn't matter anymore.
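A harness like the one described can be surprisingly thin. As a hedged sketch (this assumes Claude Code's non-interactive `claude -p` mode; the ticket fields, prompt wording, and flags here are illustrative, not this team's actual setup):

```python
import subprocess

def build_ticket_command(ticket: dict) -> list:
    """Build a headless Claude Code invocation for one ticket (illustrative)."""
    prompt = (
        "Implement the following spec from {key}:\n"
        "{spec}\n"
        "Run the test suite and open a PR when it passes."
    ).format(**ticket)
    # `claude -p` runs a single non-interactive turn and exits.
    return ["claude", "-p", prompt, "--output-format", "json"]

def run_ticket(ticket: dict) -> str:
    """Run one ticket through a headless agent and return its raw output."""
    result = subprocess.run(build_ticket_command(ticket),
                            capture_output=True, text=True)
    return result.stdout
```

A real harness presumably layers queueing, retries, and PR-comment follow-up on top of single invocations like this.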
When is your website going to be complete? Are you sure those features are what the users need? What happens to the team after everything is done? What happens to the site after the team is gone?
What you said about "we're all cooked" and "AI is useless" is literally me and everyone I know switching between the two on an hourly basis...
I find it the most exciting time for me as a builder, I can just get more things done.
Professionally, I'm dreading our future, but I'm sure it will be better than I fear and worse than I hope.
Toolset-wise, I use the usual: Cursor (super expensive if you go with Opus 4.6 max, but its computer use is game-changing, although it will soon become a commodity) and Claude Code (Pro Max plan), which is my new favorite. I'm trying out Codex, and even Copilot, as it's practically free if you have enterprise GitHub. I'm probably going to move to Claude Code; I'm paying way too much for Cursor, and I don't really need tab completion anymore. Once Claude Code has a decent computer-use environment, I'll probably cancel my Cursor account. Or... I'll just use my own with OpenClaw, but I'm not going to give it any work / personal access, only access to stuff that is publicly available (e.g. run sanity as a regular user). Playing with skills, subagents, agent teams, etc. ... it's all just markdown and JSON files all the way down...
About our professional future:
I'm not going to start learning to be a plumber / electrician / A/C repair tech, and I'm not going to recommend that my children do so either, but I'm also not sure I will push them to learn Computer Science unless they really want to do Computer Science.
What excites me the most right now is my experiments with OpenClaw / NanoClaw, I'm just having a blast.
tl;dr most exciting yet terrifying times of my life.
I've gone back and forth on it a lot myself, but lately I've been more optimistic, for a couple of reasons.
While the final impact LLMs will have is yet to be determined (the hype cycle has to calm down, we need time to see impacts in production software, and there is inevitably going to be some kind of collapse in the market at some point), it's undeniable that they will improve overall productivity (though I think it's going to be far more nuanced than most people think). But with that productivity improvement will come a substantial increase in complexity and demand for work. We see this play out every single time some tool comes along and makes engineers in any field more productive. Those changes will also take time, but I suspect we're going to see a larger number of smaller teams working on more projects.
And ultimately, this change is coming for basically all industries. The only industries that might remain totally unaffected are ones that rely entirely on manual labor, but even then the actual business side of the business will also be impacted. At the end of the day I think it's better to be in a position to understand and (even to a small degree) influence the way things are going, instead of just being along for the ride.
If the only value someone brings is the ability to take a spec from someone else and churn out a module/component/class/whatever, they should be very very worried right now. But that doesn't describe a single software engineer I know.
My current employer is taking a long time to figure out how they think they want people to use it, meanwhile, all my side projects for personal use are going quite strong.
I work for a university. We've got a dedicated ChatGPT instance but haven't been approved to use a harness yet. Some devs did a pilot project and approval/licenses are supposedly coming soon.
It's useful. At my company we have an internal LLM that tends to be used in lieu of searching the web, to avoid unintentionally leaking information about what we are working on to third parties. This includes questions about software development, including generating of code. For various reasons we are not permitted to copy this verbatim, but can use it for guidance - much like, say, inspiration from Stack Overflow answers.
It has reignited my passion for coding by making it so I don't have to use my coding muscle as much during the day to improve our technologically boring product.
I still like using it for quick references, autocomplete, and boilerplate functions. It's funny that text completion with tab is now seen as totally obsolete by some folks.
5 years ago, I set out to build an open-source, interoperable marketplace. It was very ambitious because it required me to build an open source Shopify for not only e-commerce, but restaurants, gyms, hotels, etc that this marketplace could tap into. Without AI, this vision would’ve faltered but with AI, this vision is in reach. I see so many takes about AI killing open-source but I truly think it will liberate us from proprietary SaaS and marketplaces given enough time.
> If you've recently used AI tools for professional coding work, tell us about it.
POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do the final tinkering and polishing for the PR because by that point it is easier for me to fix the code myself than to ask the model to fix it.
The work: fairly straightforward UI + backend work on a website. We have designers producing Figma and we use the Figma MCP to convert that to web pages.
POCC reduces the time taken to complete the work by at least 50%. The last-mile problem exists; it's not a one-shot, story-to-PR prompt. There are a few back-and-forths with the model, some direct IDE edits, offline tests, etc. I can see how subagents/skills/hooks/memory could reduce the manual effort further.
Challenges:
1) AI first documentation: Stories have to be written with greater detail and acceptance criteria.
2) Code reviews: copilot reviews on Github are surprisingly insightful, but waiting on human reviews is still a bottleneck.
3) AI first thinking: Some of the lead devs are still hung up on certain best practices that are not relevant in a world where the machine generates most of the code. There is friction between the code the LLM is good at producing and the standards expected of an experienced developer. This creates busy work at best, frustration at worst.
4) Anti-AI sentiment: There is a vocal minority who oppose AI for reasons ranging from craftsmanship to capitalism to the global environmental crisis. It is a bit political, and the Slack channels are getting interesting.
5) Prompt engineering: I'm in the EU; when the team is multilingual and English is adopted as the language of communication, some members struggle more than others.
6) Losing the will to code: I can't seem to make up my mind whether the tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.
Personally, I love it. I mourn the loss of the 10x engineer, but those 10x guys have already boarded the LLM ship.
I work on an ancient codebase: C# and C++ code spanning 3 major repos and 5 minor ones. I'm a senior engineer and tech lead of my team, but I also do a lot of actual coding and code reviews. It's somewhat critical internal infra.
I'm intimately familiar with most of the code.
I've become somewhat addicted to using coding agents, in the sense I've felt I can finally realize a lot of fantasies about code cleanup and modernization I've had during the decade, and also fulfill user requests, without spending a lot of time writing code and debugging. During the last few months I've been spending my weekends prompting and learning the ropes. I've been using GPT 5.x and GPT 4 before that.
I've tried giving it both big cleanup tasks and big design tasks. It was OK but mentally very exhausting, especially as it tends to stick to my original prompt, which included a lot of known unknowns, even after I told it I'd settled on a design decision; then I have to go over its generated code line by line and verify that earlier decisions I had already rejected aren't slipping back in. In some instances I've had to tell it again and again that the code it's working on is greenfield and no backwards compatibility should be kept. In other instances I had to tell it that it shouldn't touch the public API.
Also, a lot of things which I take for granted aren't done, such as writing detailed comments above each piece of code that is due to a design constraint or an obscure legacy reason. Even though I explicitly prompt it to do so.
Hand-holding it is a chore. It's like coaching a junior dev. This is on top of me having 4 actual real-life junior devs sending me PRs to review each week. It's mentally exhausting. At least I know it won't take offense when I belittle its overly complicated code and bad design decisions (which I NEVER do when reviewing PRs from the actual junior devs, so in this sense I get something to vent my aggression on).
I have tried using it on 3 big tasks in the last 5 months. I shelved the first one (modernizing an ancient codebase written more than 20 years ago), as it still didn't work even after I had spent about a week on it, and I can't spare any more time. The second one (getting another huge C# codebase to stop rebuilding the world on every compilation) seemed promising and in fact did work, but I ended up shelving it after discovering its solution broke auto-complete in Visual Studio. An MS bug, but still.
The 3rd big task is actually a user-facing one, involving a new file format, a managed reader, and a backend writer. I gave it a more-or-less detailed design document. It went pretty OK, especially after I made the jump to GPT 5.2 and now 5.4. Both still tended to hallucinate too much once the code size passed a certain threshold.
I don't use it for bug fixing or small features, since that requires a lot of explaining and isn't worth it. Our system has a ton of legacy requirements and backwards-compatibility guarantees that would take many days to specify properly.
I've become disillusioned last week. It's all for the best. Now that my addiction has lessened maybe I can have my weekends back.
I'd say 90% of our code is AI written today. Everyone in engineering (~30 people) is going full send on AI. We are still hand reviewing PRs and have quite strict standards, so this works to keep most of the AI slop out of the codebase. Early on most of that was humans being lazy and not reviewing their own PRs before opening them, but we've got better at that now. We still have a lot of legacy human slop ("tech debt") we are trying to get rid of, and the efficiency gains from AI let us spend time on that.
Stack is a monolith SaaS dashboard in Vue / Typescript on the frontend, Node.js on the backend, first built in 2019, with something like 5 different frontend state management technologies. Everyone is senior level.
We use Cursor and Opus 4.6 mainly, and are trying to figure out a more agentic process so we can work on multiple tasks in parallel. Right now we are still mainly prompting.
It’s not really that useful for what people tell me it will be useful for, most of the time. For context, I am a senior engineer that works in fintech, mostly doing backend work on APIs and payment rail microservices.
I find the most use from it as a search engine the same way I’d google “x problem stackoverflow”.
When I was first tasked with evaluating it for programming assistance, I thought it was a good “rubber duck” - but my opinion has since changed. I found that if I documented my goals and steps, using it as a rubber duck tended to lead me away from my goals rather than refine them.
Outside of my role they can be a bit more useful and generally impressive when it comes to prompting small proof of concept applications or tools.
My general take on the current state of LLMs for programming in my role is that they are like having a junior engineer that does not learn and has a severe memory disorder.
Here's my anecdote: I use ChatGPT, Gemini (web chat UI), and Claude. Claude is a bit more convenient in that it has access to my code bases, but this comes at the cost that I have to be careful I'm steering it correctly, while with the chat bots I can feed it only the correct context.
They simplify discrete tasks. Feature additions, bug fixes, augmenting functionality.
They are incapable of creating good quality (easily expandable etc) architecture or overall design, but that's OK. I write the structs, module layout etc, and let it work on one thing at a time. In the past few days, I've had it:
- Add a ribbon/cartoon mesh creator
- Fix a logical vs physical pixel error, on devices where the two differ, for positioning text and setting window size
- Fix a bug with selecting things with the mouse under specific conditions
- Fix the BLE advertisement payload format when integrating with a service
- Input tax documents for stock sales from the PDF my broker gives me into the CSV format the tax software uses
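The logical-vs-physical pixel item is a good illustration of why these discrete tasks suit an LLM: the fix is small but easy to get backwards. A minimal sketch of the conversion involved (function names and the scale-factor plumbing are my own, not the commenter's code):

```python
def logical_to_physical(x, y, scale):
    """Convert logical (DPI-independent) coordinates to physical pixels.

    On a 2x HiDPI display scale == 2.0, so logical (100, 50) lands at
    physical (200, 100); mixing the two spaces when positioning text or
    sizing windows is exactly the bug class described above.
    """
    return round(x * scale), round(y * scale)

def physical_to_logical(px, py, scale):
    """Inverse mapping, e.g. for interpreting raw device coordinates."""
    return px / scale, py / scale
```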
Overall, great tool! But I think a lot of people are lying about its capabilities.
Somewhat against the common sentiment, I find it's very helpful on a large legacy project. At work, our main product is a very old, very large code base. This means it's difficult to build up a good understanding of it -- documentation is often out of date, or makes assumptions about prior knowledge. Tracking down the team or teams that can help requires being very skilled at navigating a large corporate hierarchy. But at the end of the day, the answers for how the code works are mostly in the code itself, and this is where AI assistance has really been shining for me. It can explore the code base and find and explain patterns and available methods far faster than I can.
My prompts tend to be in the pattern of "I am looking to implement <X>. <Detailed description of what I expect X to do.> Review the code base to find similar examples of how this is currently done, and propose a plan for how to implement this."
These days I'm on Claude Code, and I do that first part in Plan mode, though even a few months ago on earlier, not-as-performant models and tools, I was still finding value with this approach. It's just getting better, as the company is investing in shared skills/tools/plugins/whatever the current terminology is that is specific to various use cases within the code base.
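That prompt pattern is mechanical enough to capture as a tiny template helper; a sketch (the wording follows the comment above, the helper itself is hypothetical):

```python
# Plan-first prompt template, following the pattern quoted above.
PLAN_PROMPT = (
    "I am looking to implement {feature}. {description} "
    "Review the code base to find similar examples of how this is "
    "currently done, and propose a plan for how to implement this."
)

def make_plan_prompt(feature, description):
    """Fill in the plan-first template for one feature request."""
    return PLAN_PROMPT.format(feature=feature, description=description)
```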
I haven't been writing so much code directly, but I do still very much feel that this is my code. My sessions are very interactive -- I ask the agent to explain decisions, question its plans, review the produced code, and often revise it. I find it frees me up to spend more time thinking through and applying higher-level architecture instead of spending frustrating hours hunting down basic "how does this work" information.
I think it might have been an article by Simon Willison that made the case for there being a way to use AI tooling to make you smarter, or to make you dumber. Point and shoot and blindly accept output makes you dumber -- it places more distance between you and your code base. Using AI tools to automate away a lot of the toil give you energy and time to dive deeper into your code base and develop a stronger mental model of how it works -- it makes you smarter. I keep in mind that at the end of the day, it's my name on the PR, regardless of how much Claude directly created or edited the files.
Measured 12x increase in issues fixed/implemented. Solo founder business, so these are real numbers (over 2 months), not corporate fakery. And no, I am not interested in convincing you, I hope all my competitors believe that AI is junk :-)
I use it all the time now, switching between claude code, codex, and cursor. I prefer CC and codex for now but everyone is copying everyone else's homework.
I do a lot of green field research adjacent work, or work directly with messy code from our researchers. It's been excellent at building small tools from scratch, and for essentially brute forcing undocumented code. I can give it a prompt like "Here is this code we got from research, the docs are 3 months out of date and don't work, keep trying things until you manage to get $THING running".
Even for more production and engineering related tasks I'm finding it speeds up velocity. But my engineering is still closer to greenfield than a lot of people here.
I do however feel less connected to the code, even when reviewing thoroughly, I feel like I internalize things at a high level, rather than knowing every implementation detail off the dome.
The other downside is that I get bigger and more frequent code review requests from colleagues. No one is just handing me straight-up slop (yet...)
You guys are definitely missing out. I have the perfect army of mid-level engineers. Using Codex lately, my own CPU and RAM are the only things holding me back from spinning up more and more agents.
On my side I have used Claude code, tbh for solo projects it's good enough if you already know what you need to do.
Answering your questions:
On my job we've been spoon-fed GH Copilot everywhere we can use it. It's been configured to review PRs, make corrections, etc. It's good enough, but from time to time it raises false positives on issues, so you still need to keep an eye on the generated BS.
I've seen coworkers show me amazing stuff done with agentic coding, and I've seen coworkers open up slop PRs with a bunch of garbage generated code, which is kind of annoying, but I'll let it slide...
Stack - .NET, Angular, SQL Server and ofc hosted in Azure.
The team is composed of about 100 engineers (devs, QA, devops, etc.), and from what I can see there are no juniors, which is sad to see if you ask me.
FAANG colleague writes this week -- "I am currently being eaten alive by AI stuff for my non-(foss-project) work. I spend most of my day slogging through AI generated comments and code trying to figure out what is good, not good, or needs my help to become good. Or I'm trying to figure out how to prompt the tools to do what I want them to do"
This fellow is one of the few mature software engineers I have ever met who is rigorously and consistently productive in a very challenging mature code base, year in and year out. Or WAS... yes, this is from coughGooglecough in California.
I haven't actively looked into it, but on a couple of occasions after Google began inserting Gemini results at the top of the list, I decided to try some of the generated code samples when the search didn't turn up anything useful. The results were a mixed bag: the libraries I'd been searching for examples from were not very broadly used, and their interfaces were volatile enough that in some cases the model returned results for obsolete versions. Not a huge deal, since the canonical docs had some recommendations. In at least a couple of cases, though, the results included references to functions that had never been in the library at all, even though they sounded not only plausible but would have been useful if they did in fact exist.
In the end, I am generally using the search engine to find examples because I am too lazy to look at the source for the library I'm using, but if the choice is between an LLM that fabricates stuff some percentage of the time and just reading the fucking code like I've been doing for decades, I'd rather just take my chances with the search engine. If I'm unable to understand the code I'm reading enough to make it work, it's a good signal that maybe I shouldn't be using it at all since ultimately I'm going to be on the hook to straighten things out if stuff goes sideways.
Ultimately that's what this is all about- writing code is a big part of my career but the thing that has kept me employed is being able to figure out what to do when some code that I assembled (through some combination of experimentation, documentation, or duplication) is not behaving the way I had hoped. If I don't understand my own code chances are I'll have zero intuition about why it's not working correctly, and so the idea of introducing a bunch of random shit thrown together by some service which may or may not be able to explain it to me would be a disservice to my employers who trust me on the basis of my history of being careful.
I feel bad saying this because so many folks have not had the best of luck, but it's changed the game for me.
I'm building out large multi-repo features in a 60 repo microservice system for my day job. The AI is very good at exploring all the repos and creating plans that cut across them to build the new feature or service. I've built out legacy features and also completely new web systems, and also done refactoring. Most things I make involve 6-8 repos. Everything goes through code review and QA. Code being created is not slop. High quality code and passes reviews as such. Any pushback I get goes back in to the docs and next time round those mistakes aren't made.
I did a demo of how I work with AI for the dev team at Math Academy, who were complete skeptics before the call; 2 hours later they were converts.
I'm repeating this for the third time, but: a non-technical client of mine has whipped up an impressive SaaS prototype with tons of features. They still need help with the cleanup, it's all slop, but I was doing many small coding requests for that client. Those gigs will simply disappear.
I just got started using Claude very recently. I had not been keeping up with how much better it has gotten. Now it's obvious that no one will write code by hand. I genuinely fear for my ability to make a living as soon as 2 years from now, if not sooner. I figure the only way is to enter the red-queen race and ship some good products. This is the positive I see: if I put 30 h/week into something, I have the productivity of 3 people. If it's a weekend project at 10 h/week, I now have what used to be a full week of productivity. The economics of developing products solo have vastly changed for the better.
The commits are real. I'm not doing "vibe coding" or even agentic coding. I'm doing turn-by-turn where I micromanage the LLM, give specific implementation instructions, and then read and run the output before committing the code.
I'm more than happy with 2x issues closed. For my client work it means my wildly optimistic programmer estimates are almost accurate now.
I did have a frustrating period where a client was generating specs using ChatGPT. I was simply honest: "I have no idea what this nonsense means, let's meet to discuss the new requirements." That worked.
It's a game changer for reading large codebases and debugging.
Error messages were the "slop" of the pre-LLM era. This is where an LLM shines, filling in the gaps where software engineering was neglected.
As for writing code, I don't let it generate anything that I couldn't have written myself, or anything that I can't keep in my brain at once. Otherwise I get really nervous about committing.
The job of a software engineer does and always has relied upon taking responsibility for the quality of one's work. Whether it's auto-complete or a fancier auto-complete, the responsibility should rest on your shoulders.
Context: I work in robotics. We use mostly c++ and python. The entire team is about 200 though the subset I regularly interact with is maybe 50.
I basically don't use AI for coding at all. When I have tried it, it's just half working garbage and trying to describe what I want in natural language is just miserable. It feels like trying to communicate via smoke signals.
I'll be a classical engineer until they fire me and then go do something else. So far, that's working. We've had multiple rounds of large layoffs in the last year and somehow I'm still here.
I work at a unicorn in the EU. Claude Code has been rolled out to all of engineering with strict cost-control policies; even with these in place we burn through tens of thousands of euros per month, which I think could easily translate into 15-20 hires. Are we more productive than we would be by adding that headcount? That's a good question that I cannot answer.
Some senior people who were in the AI pilot, have been using this for a while, and are very into it claim that it can open PRs autonomously with minimal input or supervision (given a ton of MD files and skills, in repos with clear architecture standards). I haven't been able to replicate this yet.
I'm objectively happy to have access to this tool, it feels like a cheat code sometimes. I can research things in the codebase so fast, or update tests and glue code so quickly that my life is objectively better. If the change is small or a simple bugfix it can truly do it autonomously quicker than me. It does make me lazier though, sometimes it's just easier to fire up claude than to focus and do it by myself.
I'm careful not to overuse it, mostly so I don't hit the monthly cap and can "keep it" in case something urgent or complex comes my way. I also still like to do things by hand, just because I still want to learn and maintain my skills. I feel that I'm not learning anything by using Claude; that's a real thing.
In the end I feel it's a powerful tool that is here to stay, and I would be upset if I no longer had access to it; it's very good. I recently subscribed and use it in my free time just because it's a very fun technology to play with. But it's a tool. I'm paid because I take responsibility for my work being delivered on time, working, tested, with code on par with the org's quality standards. Whether I do it by hand or with Claude is irrelevant. If I can do it faster, it will likely mean I receive more work to do. Somebody still has to operate Claude, and it's not going to be non-technical people, for sure.
I genuinely think that if anyone still believes today that this technology is only hype or a slop machine, they are in denial or haven't tried to use a recent frontier model with the correct setup (mostly: giving the agent a way to autonomously validate its changes).
I think they need the enterprise plan to access the advanced security and data-handling guarantees. They also set up pretty strict controls at the org level on what tools the agents can use, which we cannot override; I'm not sure that's an option with the subscription plans.
Not that guy, but here token billing was chosen to get the Enterprise monitoring shit. I think the C-suite is expected to report productivity increases and needs all the data Anthropic can scrape to justify how much money is being set on fire right now.
AI is provided to my project, but to my knowledge nobody is using it. I have trouble seeing what advantages AI would provide for the work we do.
I have been doing this work long enough to know how to increase human productivity. It’s not bullshit like frameworks or AI. The secret is smaller code and faster executing applications and the kind of people who prefer simple versus easy.
I’d be more curious to hear about the processes people have put in place for AI code reviews
On the one hand, past some threshold of criticality/complexity, you can’t push AI unreviewed, on the other, you can’t relegate your senior best engineers to do nothing but review code
It doesn’t just not scale, it makes their lives miserable
So then, what’s the best approach?
I think over time that threshold I mentioned will get higher and higher, but at the moment the ratio of code that needs to be reviewed to reviewers is a little high
It is a very mixed bag. I have enjoyed using opus 4.5 and 4.6 to add functionality to existing medium complexity codebases. It’s great for green field scripts and small POCs. I absolutely cannot stand reviewing the mostly insane PRs that other people generate with it.
Very much mixed. I've used Claude to generate small changes to an existing repo, asking it to create functions or React templates in the style of the rest of the file it's working in, and that's worked great. I still do a lot of the fine-tuning there, but if the codebase isn't one I'm overly familiar with, this is a good way to ensure my work doesn't conflict with the core team's.
I have also done the agentic thing and built a full CLI tool via back-and-forth engagement with Claude and that worked great - I didn't write a single line of code. Because the CLI tool was calling an API, I could ask Claude to run the requests it was generating and adjust based on the result - errors, bad requests etc, and it would fairly rapidly fix and coalesce on a working solution.
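The run-the-request, feed-back-the-error loop described here generalizes nicely. A sketch with the API call and the model round trip stubbed out as callables (all names are mine, not the commenter's tool):

```python
from typing import Callable, Optional, Tuple

def fix_until_ok(request: str,
                 run: Callable[[str], Tuple[bool, str]],
                 revise: Callable[[str, str], str],
                 max_rounds: int = 5) -> Optional[str]:
    """Run a generated request; on failure, hand the error back and retry.

    `run` executes the request (e.g. hits the API) and returns (ok, error);
    `revise` stands in for the model round trip that rewrites the request
    given the error text. Returns the working request, or None if it never
    coalesces within max_rounds.
    """
    for _ in range(max_rounds):
        ok, error = run(request)
        if ok:
            return request
        request = revise(request, error)
    return None
```

This is the same juggling act described above, just made explicit: the agent converges only because every failure is fed back as context.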
After I was done though, I reckon that if instead of this I had just done the work myself I would have had a much smaller, more reliable project. Less error handling, no unit tests, no documentation sure, but it would have worked and worked better - I wouldn't need to iterate off the API responses because I would have started with a better contract-based approach. But all of that would have been hard, would have required more 'slow thinking'. So... I didn't really draw a clean conclusion from the effort.
Continuing to experiment, not giving up on anything yet.
There's still some work to do on the rendering side of model objects. Developing the syntax highlighting rules for 40 languages and file formats in about 10 minutes was amazing to see.
Edit, great example. What is your long term maintenance strategy, do you keep the original prompts around so you can refine them later or do you dig into the source?
It's git for ETL. I haven't looked at the code, but I've been using it pretty effectively for the last week or two. I wouldn't feel comfortable recommending it to anybody else, but it was basically one-shotted. I've been dogfooding it on a number of projects, had the LLM iterate on it a bit, and I'm generally very happy with the ergonomics.
I don't have the prompt, but I used Codex. I probably wrote a medium-sized paragraph explaining the architecture. It scaffolded out the app, and I think I prompted it twice more with some very small bugfixes. That got me to an MVP which I used to build LaTeX pipelines. Since then, I've added a few features as I've dogfooded it.
It's a bit challenging / frustrating to get LLMs to build out a framework/library and the app that you're using the framework in at the same time. If it hits a bug in the framework, sometimes it will rewrite the app to match the bug rather than fixing the bug. It's kind of a context balancing act, and you have to have a pretty good idea of how you're looking to improve things as you dogfood. It can be done, it takes some juggling, though.
I think LLMs are good at golang, and also good at that "lightweight utility function" class of software. If you keep things skeletal, I think you can avoid a lot of the slop feeling when you get stuck in a "MOVE THE BUTTON LEFT" loop.
I also think that dogfooding is another big key. I coded up a calculator app for a dentist office which 2-3 people use about 25 times a day. Not a lot of moving parts, it's literally just a calculator. It could basically be an excel spreadsheet, except it's a lot better UX to have an app. It wouldn't have been software I'd have written myself, really, but in about 3 total hours of vibecoding, I've had two revisions.
If you can get something to a minimal functional state without a lot of effort, and you can keep your dev/release loop extremely tight, and you use it every day, then over time you can iterate into something that's useful and good.
Overall, I'm definitely faster with LLMs. I don't know if I'm that much faster. I was probably most fluent building web apps in Django, and I was pretty dang fast with that. LLMs are more about things like "How do you build tests to prevent function drift" and "How can I scaffold a feedback loop so that the LLM can debug itself".
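One concrete shape for the "tests to prevent function drift" idea is a golden test: pin the outputs you care about so a silent behavioral change from an LLM edit fails immediately. A sketch (the `slugify` example and names are invented):

```python
# Golden outputs pinned in version control. If an LLM "refactor" quietly
# changes behavior, the comparison below fails and the drift is caught.
GOLDEN = {
    "Hello, World!": "hello-world",
    "  a  b ": "a-b",
}

def slugify(text):
    """The function under guard; any rewrite must preserve GOLDEN."""
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return "-".join(words)

def drift_failures():
    """Return a description of every case where behavior has drifted."""
    return [
        f"slugify({given!r}) = {slugify(given)!r}, expected {expected!r}"
        for given, expected in GOLDEN.items()
        if slugify(given) != expected
    ]
```

Wiring `drift_failures()` into CI closes the feedback loop: the LLM can be told to keep editing until the list comes back empty.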
I think your prompts are 'the source' in a traditional sense, and the result of those prompts is almost like 'object code'. It would be great to have a higher-level view of computer source code like the one you are sketching, and then to distribute the prompt and the AI (toolchain...) used to create the code, with the code itself as just one of many representations. This would also solve some of the copyright issues, as well as possibly some of the longer-term maintainability challenges: if you need to make changes to the running system in a while, the tool that got you there may no longer be suitable, unless there is a way to ingest all of the code it produced previously and then to suggest surgical strikes instead of wholesale updates.
Thank you for taking the time to write this all out, it is most enlightening. It's a fine line between 'nay sayer' and 'fanboi' and I think you've found the right balance.
On documentation, I agree with you, and have gone down the same road. I actually built out a little chat app which acts as a wrapper around the codex app which does exactly this. Unfortunately, the UI sucks pretty bad, and I never find myself using it.
I actually asked codex if it could find the chat where I created this in my logs. It turns out, I used the web interface and asked it to make a spec. Here's the link to the chat. Sorry the way I described wasn't really what happened at all! lol. https://chatgpt.com/share/69b77eae-8314-8005-99f0-db0f7d11b7...
As it happens, I actually speak-to-texted my whole prompt. And then gippity glazed me saying "This is a very good idea". And then it wrote a very, very detailed spec. As an aside, I kind of have a conspiracy theory that they deploy "okay" and "very very good" models. And they give you the good model based on if they think it will help sway public opinion. So it wrote a pretty slick piece of software and now here I am promoting the LLM. Oof da!
I didn't really mention - spec first programming is a great thing to do with LLMs. But you can go way too far with it, also. If you let the LLM run wild with the spec it will totally lose track of your project goals. The spec it created here ended up being, I think, a very good spec.
I think "code readability" is really not a solved problem, either pre or post LLM. I'm a big fan of "Code as Data" static analysis tools. I actually think that the ideal situation is less of "here is the prompt history" and something closer to Don Knuth's Literate Programming. I don't actually want to read somebody fighting context drift for an hour. I want polished text which explains in detail both what the code does and why it is structured that way. I don't know how to make the LLMs do literate programming, but now that I think about it, I've never actually tried! Hmmm....
I've done some work on compression a really long time ago, but I am very far from an expert in the field; in fact I'm not an expert in any field ;) The best I ever did was a way to compress video better than what was available at the time, but wavelets overtook that and I have not kept current.
I'm curious about two things:
- is it really that much better (if so, that would by itself be a publishable result), where better is:
  - not worse for other cases
  - always better for the cases documented
I think that's a fair challenge.
- is it correct?
And as a sidetrack to the latter: can it be understood to the point that you can prove it is correct? Unfortunately I don't have experience with your toolchain but that's a nice learning opportunity.
Why is this the attitude when it comes to AI? Can you imagine someone saying “please provide your code” when they claim that Rust sped up their work repo or typescript reduced errors in production?
Eh, sorry, I may have been too quick to judge, but in the past when I have shared examples of AI-generated code to skeptics, the conversation rapidly devolves into personal attacks on my ability as an engineer, etc.
I think the challenge is to not be over-exuberant nor to be overly skeptical. I see AI as just another tool in the toolbox, the fact that lots of people produce crap is no different from before: lots of people produced crappy code well before AI.
But there are definitely exceptions and I think those are underexposed, we don't need 500 ways to solve toy problems we need a low number of ways to solve real ones.
Some of the replies to my comment are exactly that, they show in a much more concrete way than the next pelican-on-a-bicycle what the state of the art is really capable of and how to achieve real world results. Those posts are worth gold compared to some of the junk that gets high visibility, so my idea was to use the opportunity to highlight those instead.
FWIW, I did a full modernization and redesign of a site (~50k loc) over a week with Claude. I was able to ensure quality by writing a strong e2e test suite ahead of time (which I also drove with AI), then ensuring Claude ran the suite every time it made changes. I got a bunch of really negative comments about it on HN (alluded to in my previous comment: everything from telling me the site looked embarrassing and didn't deserve to be on HN, to saying the 600ms load time was too slow, etc, etc), so I mostly withdrew from posting more about it. Still, I think the strategy of a robust e2e suite is a really good idea that can really drive AI productivity.
Yes, that e2e suite is a must for long term support and it probably would be a good idea to always create something like that up front before you even start work on the actual application.
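As a rough illustration of the "e2e suite up front" idea, here's a minimal self-contained sketch in Python, stdlib only - a toy stand-in for a real Playwright/Capybara suite: boot the actual server, hit it over HTTP, assert on the response, and tell the agent to run this after every change:

```python
# Toy e2e test: start the real server in-process, exercise it over HTTP,
# and assert on what a user-facing client would actually see.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class App(BaseHTTPRequestHandler):
    """Stand-in for the application under test."""
    def do_GET(self):
        ok = self.path == "/health"
        body = b"ok" if ok else b"not found"
        self.send_response(200 if ok else 404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep test output quiet
        pass

def e2e_health_check() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), App)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/health"
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    finally:
        server.shutdown()

assert e2e_health_check() == b"ok"
```

The value is less in any single assertion and more in making "run the suite" a non-negotiable step in the agent's loop.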
I think that it pays off to revisit the history of the compiler. Initially compilers were positioned as a way for managers to side step the programmers, because the programmers have too much power and are hard to manage.
Writing assembly language by hand is tedious and it requires a certain mindset and the people that did this (at that time programming was still seen as an 'inferior' kind of job) were doing the best they could with very limited tools.
Enter the compiler, now everything would change. Until the mid 1980s many programmers could, when given enough time, take the output of a compiler, scan it for low hanging fruit and produce hybrids where 'inner loops' were taken and hand optimized until they made optimal use of the machine. This gave you 98% of the performance of a completely hand crafted solution, isolated the 'nasty bits' to a small section of the code and was much more manageable over the longer term.
Then, ca. 1995 or so the gap between the best compilers and the best humans started to widen, and the only areas where the humans still held the edge was in the most intricate close-to-the-metal software in for instance computer games and some extremely performant math code (FFTs for instance).
A multitude of different hardware architectures, processor variations and other dimensions made consistently maintaining an edge harder and today all but a handful of people program in high level languages, even on embedded platforms where space and cycles are still at a premium.
Enter LLMs
The whole thing seems to repeat: there are some programmers that are - quite possibly rightly so - holding on to the past. I'm probably guilty of that myself to some extent; I like programming, and the idea that some two-bit chunk of silicon is going to show me how it is done offends me. At the same time I'm aware of the past and have already gone through the assembly-to-high-level track, and I see this as just more of the same.
Another, similar effect was seen around the introduction of the GUI.
Initially the 'low hanging fruit' of programming will fall to any new technology we introduce, boilerplate, CRUD and so on. And over time I would expect these tools to improve to the point where all aspects of computer programming are touched by them and where they either meet or exceed the output of the best of the humans. I believe we are not there yet but the pace is very high and it could easily be that within a short few years we will be in an entirely different relationship with computers than up to today.
Finally, I think we really need to see some kind of frank discussion about compensation of the code ingested by the model providers, there is something very basic that is wrong about taking the work of hundreds of thousands of programmers and then running it through a copyright laundromat at anything other than a 'cost+' model. The valuations of these companies are ridiculous and are a direct reflection of how much code they took from others.
It solves no actual problem I have, yet it introduces many new ones. It's a trap. So I don't use it and have a strong policy against using it or allowing others to use it on things I work on. BATNA is key.
More work, shorter deadlines, smaller headcount, higher expectations in terms of adaptability/transferability of people between projects (who needs knowledge transfer, ask the AI!).
But in the end, the thing that pisses me off was a manager who used it to write tickets. If the product owner doesn't give a shit about the product enough to think and write about what they want, you'll never be successful as a developer.
I use Cursor and have been pretty happy with the Plan -> Revise -> Build -> Correct flow. I don't write everything with it, but the planning step does help me clarify my thoughts at times.
One of the things that has helped the most is all the documentation I wrote inside the repository before I started using AI. It was intended for consumption by other engineers, but I think Cursor has consumed it more than any human. I've even managed to make improvements not by having AI update it, but asking AI "What unanswered questions do you have based on reading the documentation?" It has helped me fill in gaps and add clarity.
Another thing I've gotten a ton of value with is having it author diagrams. I've had it create diagrams with both the mermaid syntax and AWSDAC (Diagram-as-Code). I've always found crafting diagrams a painstaking process. I have it make a first pass by analyzing my code + configuration, then make corrections and adjustments by explaining the changes I want.
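A tiny sketch of the diagrams-as-code idea: instead of hand-drawing, keep the structure as data and generate the Mermaid text from it (the service names below are invented for illustration; a real pass would derive the edges from code and configuration, as described above):

```python
# Generate a Mermaid flowchart from a plain dependency map, so the diagram
# can be regenerated after each change instead of redrawn by hand.
def to_mermaid(edges: dict[str, list[str]]) -> str:
    lines = ["graph TD"]
    for src, targets in edges.items():
        for dst in targets:
            lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

# Hypothetical service dependency map
deps = {"web": ["api"], "api": ["db", "cache"]}
print(to_mermaid(deps))
```

The output pastes straight into any Mermaid renderer, and "make corrections" becomes editing the data rather than the drawing.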
In my own PRs, I have been in the habit of posting my Cursor Plan document and Transcript so that others can learn from it. I've also encouraged other team members to do the same.
I feel bad for any teams that are being mandated to use a certain amount of AI. It seems to me that the only way to make it work is by having teams experiment with it and figure out how to best use it given their product and the team's capacity. AI is like a pair of Wile-E-Coyote rocket skates. It'll get you somewhere fast, but unless you've cleared the road of debris and pointed in exactly the right direction, you're going to careen off a cliff or into a wall.
I use it as much as I can, not because it makes me more productive, but because it's better for my career and I don't care about my idea of productivity. Whatever the company wants is probably correct in the short term.
I use Claude code for my research projects now, it’s incredible tbh. I’m not writing production code for millions of users I need to do data science stuff and write lots of code to do that and AI lets me focus on the parts of my research that I want to do and it makes me a lot more productive.
Even at its best it’s wildly inconsistent from session to session. It does things differently every time. Sometimes I'm impressed with how it works; then the next day, doing the exact same thing, it flips out and goes nuts trying to do it in a totally different, unworkable way.
You can capture some of these issues in AGENTS.md files or the like, but there’s an endless future supply of them. And it’s even inconsistent about how it “remembers” things. Sometimes it puts in the project local config, sometimes in my personal overall memory files, sometimes instead of using its internal systems, it asks permission to search my home directory for its memory files.
The best way to use it is for throwaway scripts or examples of how to do something. Or new, small projects where you can get away with never reading the code. For anything larger or more important, its inconsistencies make it a net time loser, imo. Sure, let it write an annoying utility function for you, but don’t just let it loose on your code.
When you do use it for new projects, make it plan out its steps in advance. Provide it with a spec full of explicit usage examples of the functionality you want. It’s very literal, so expect it to overindex on your example cases and treat those as most important. Give it a list of specific libraries or tools you want it to use. Tell it to take your spec and plan out its steps in a separate file. Then tell it to implement those steps. That usually works to allow it to build something medium-complex in an hour or two.
When your context is filling up in a session in a particular project, tell it to review its CLAUDE.md file and make sure it matches the current state of the project. This will help the next session start smoothly.
One of the saddest things I’ve found is when a whole team of colleagues gets obsessed with making Claude figure something out. Once it’s in a bad loop, you need to start over, the context is probably poisoned.
I work in the research space so it's mostly prototype code. I have unlimited access to codex 5.x-xhigh. I rarely directly alter the code codex generates at this point. My productivity has significantly increased.
It is great for solving "puzzle" problems and removing roadblocks. In the past, whenever I got stuck, I often got frustrated and gave up. In so many cases AI hints at the correct solution and helps me continue. It's like a knowledgeable colleague that you can ask for help when you get stuck.
Another thing is auditing and code polishing. I asked Claude to polish a working but still rough browser plugin, consisting of two simple Javascript files. It took ten iterations and a full day of highly intensive work to get the quality I wanted. I would say the result is good, but I could not do this process very often without going insane. And I do not want to do this with a more complex project, yet.
So, yes, I am using it. For me it's a tool, knowledge resource, puzzle solver, code reviewer and source for inspiration. It's not a robot to write my code.
AI has helped me break through procrastination by taking care of tedious tasks that beforehand had a low ROI (boilerplate, CI/CD configs, test coverage, shell scripting/automation).
- create unit tests and benchmark tests that required lots of boilerplate, fixtures
- add CI / CD to a few projects that I didn't have motivation to
- freshen up old projects to modern standards (testing, CI / CD, update deps, migrations/deprecations)
- add monitoring / alerting to 2 projects that I had been neglecting. One was a custom DNS config uptime monitor.
- automated backup tools along with scripts for verifying recovery procedure.
- moderate migrations for deprecated APIs and refactors within cli and REST API services
- auditing GCP project resources for billing and security breaches
- frontend, backend and offline tiers for cloud storage management app
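For instance, the backup-verification item above might boil down to something like this hedged sketch: copy the file, then compare checksums, so "the backup ran" also means "the backup matches the source" (paths and file names here are placeholders):

```python
# Backup with verification: a backup that fails a checksum comparison is
# treated as no backup at all.
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def backup_and_verify(src: Path, dest_dir: Path) -> Path:
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 preserves metadata alongside contents
    if sha256(src) != sha256(dest):
        raise RuntimeError(f"backup of {src} failed checksum verification")
    return dest

# Self-contained demo using a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.db"
    src.write_bytes(b"important rows")
    dest_dir = Path(tmp) / "backups"
    dest_dir.mkdir()
    copy = backup_and_verify(src, dest_dir)
    assert copy.read_bytes() == b"important rows"
```

A recovery-procedure script would go one step further and actually restore from the copy into a scratch location before declaring success.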
AI has definitely shifted my work in a positive way, but I am still playing the same game.
I run a small lab that does large data analytics and web products for a couple large clients. I have 5 developers who I manage directly, I write a lot of code myself and I interact directly with my clients. I have been a web developer for long enough to have written code in coldfusion, php, asp, asp.net, rails, node and javascript through microsoft frontpage exports, to jquery, to backbone, angular and react, and in a lot of different frameworks. I feel this breadth of watching the internet develop in stages has given me a decent if imperfect understanding of many of the tradeoffs that can be made in developing for the web.
My work lately is on an analytics / cms / data management / gis platform that is used by a couple of our clients and that we had been developing for a couple of years before any ai was used on it at all. It's a react front end built on react-router-7 that can be SPA or SSR, and a node api server.
I had tried AI coding a couple times over the past few years both for small toy projects and on my work and it felt to me less productive than writing code by hand until this January when I tried Claude Code with Opus 4.5. Since then I have written very few features by hand although I am often actively writing parts of them, or debugging by hand.
I am maybe in a slightly unique place in that part of my job is coming up with tasks for other developers and making sure their code integrates back. I've been doing this for 10-plus years, and personally my success rate with getting someone to write a new feature that will get used is maybe a bit over 50%, and that may be generous. Figuring out what to do next in a project that will create value for users is the hard part of my job, whether I am delegating to developers or to an AI, and that hasn't changed.
That being said, I can move through things significantly faster and more consistently using AI, and get them out to clients for testing to see if they are going to work. It's also been great for tasks which I know my developers will groan at if I assign them. In the last couple months I've been able to:
- create a new version of our server that is free from years of cruft of the monorepo api we use across all our projects.
- implement sqlite compatibility for the server (in addition to original postgres support)
- Implement local first sync from scratch for the project
- Test out a large number of optimization strategies, not all of which worked out but which would have taken me so much longer and been so much more onerous the cost benefit ratio of engaging them would have been not worth it
- Tons of small features I would have assigned to someone else but are now less effort to just have the AI implement.
I think the biggest plus, though, is the amount of documentation that has accrued in our repo since we started using these tools. I find AI is pretty great at summarizing different sections of the code, and with a little bit of conversation I can get it more or less exactly how I want it. This has been hugely useful to me on a number of occasions, and it's something I would have always liked to be doing, but on a small team that is always under pressure to create results for our clients, it didn't cross the immediate threshold of the cost-benefit ratio.
In my own use of AI, I keep the bottleneck at my own understanding of the code; it's important to me that I maintain a thorough understanding of the codebase. I could possibly go faster by giving it a longer leash, but that trade-off doesn't seem wise to me at this point: first because I'm already moving so much faster than I was very recently, and second because it doesn't seem very far from the next bottleneck, which is deciding what is the next useful thing to implement. For the most part, I find the AI has me moving in the right direction almost all the time, but I think this is partly because I am already practiced in communicating to programmers what to implement next and I have a deep understanding of the code base, and also because I spend more than half of my AI time adding context, plans and documentation to the repo.
I have encouraged my team to use these tools, but I am not forcing it down anyone's throat, although it's interesting to hand out tasks that I am confident I could finish much quicker, and much more to my personal taste, myself. The reactions from my team are pretty mixed: one of the strongest contributors doesn't find a lot of gains from it, one has found similar productivity gains to my own, and others are very against it and hate it.
I think one of the things it will change for me is that I can no longer just create the stories for everyone. Learning how to choose what to work on is going to be the most important skill, in my opinion, so over the next couple of months I am going to shift so that everyone on my team has direct client interactions, and I am going to try to move away from writing stories toward meetings where I help them decide on their own what to work on. Part of the reason I can afford to do this is that I can now get as much or more work done than I could with my whole team at this time last year.
That's a big difference in one way, and I am optimistic that the platform I am working on will be a lot better and able to compete with large legacy platforms that it wouldn't have been able to compete with in the past, but still it just tightens the loop of trying new things and getting feedback and the hardest part of the business is still communication with clients and building relationships that create value.
I've been using opencode and oh-my-opencode with Claude's models (via github Copilot). The last two or three months feel like they have been the most productive of my 28-year career. It's very good indeed with Rails code, I suspect it has something to do with the intentional expressiveness of Ruby plus perhaps some above-average content that it would be trained on for this language and framework. Or maybe that's just my bias.
It takes a bit of hand holding and multiple loops to get things right sometimes, but even with that, it's pretty damn good. I don't usually walk away from it, I actively monitor what it's doing, peek in on the sub-agents, and interject when it goes down a wrong path or writes messy code. But more often than not, it goes like this:
- Point at a GH issue or briefly describe the task
- Either ask it to come up with a plan, or just go straight to implementation
- When done, run *multiple* code review loops with several dedicated code review agents - one for idiomatic Rails code, one for maintainability, one for security, and others as needed
These review loops are essential, they help clean up the code into something coherent most times. It really mirrors how I tend to approach tasks myself: Write something quickly that works, make it robust by adding tests, and then make it maintainable by refactoring. Just way faster.
I've been using this approach on a side project, and even though it's only nights and weekends, it's probably the most robust, well-tested and polished solo project I've ever built. All those little nice-to-have and good-to-great things that normally fall by the wayside if you only have nights and weekends - all included now.
And the funny thing is - I feel coding with AI like this gets me in the zone more than hand-coding. I suspect it's the absence of all those pesky rabbit holes that any non-trivial code base and tool chain tends to throw up, which can easily distract us from thinking about the problem domain into solving problems of our tools. Claude deals with all that almost as a side effect. So while it does its thing, I read through its self-talk while thinking along about the task at hand, intervening if I disagree, but I stay at the higher level of abstraction, more or less. Only when the task is basically done do I dive a level deeper into code organisation, maintainability, security, edge cases, etc.
Needless to say that very good test coverage is essential to this approach.
Now, I'm very ambivalent about the AI bubble - I believe very firmly that it is one - but for coding specifically, it's a paradigm shift, and I hope it's here to stay.
I'm an ex-FAANG engineer working for a smaller (but still big enough) company.
At work we use one of the less popular solutions, available both as a plugin for vscode and as a claude code-like terminal tool. The code I work on is mostly Golang and there's some older C++ using a lot of custom libraries. For Golang, the AI is doing pretty good, especially on simple tasks like implementing some REST API, so I would estimate the upper boundary of the productivity gain to be maybe 3x for the trivial code.
Since I'm still responsible for the result, I cannot just YOLO and commit the code, so whenever I get to work on simple things, I'm effectively becoming a code reviewer for the majority of time. That is what probably prevents me from going above 3x productivity; after each code review session I still need a break so I go get coffee or something, so it's still much faster than writing all the code manually, but the mental load is also higher which requires more breaks.
One nontrivial consequence is that the expectations are adapting to the new performance, so it's not like we are getting more free time because we are producing the code faster. Not at all.
For the C++ codebase though, in the rare cases when I need to change something there, it's pretty much business as usual; I won't trust the code it generates, and would rather write what I need manually.
Now, for personal projects, it's a completely different story. For the past few months or so, I haven't written any code for my personal projects manually, except for maybe a few trivial changes. I don't review the generated code either, just making sure that it works as I expect. Since I'm probably too lazy to configure the proper multi-agent workflow, what I found works great for me is: first ask Claude for the plan, then copy-paste the plan to Codex, get its feedback back to Claude, repeat until they agree; this process also helps me stay in the loop. Then, when Claude implements the plan and makes a commit, I copy-paste the commit sha to Codex and ask it to review, and it very often finds real issues that I probably would've missed.
It's hard to estimate the productivity gain of this new process mostly because the majority of the projects I worked on these past few months I would've never started without Claude. But for those I would've started, I think I'm somewhere near 4-5x compared to manually writing the code.
One important point here is that, both at work and at home, it's never a "single prompt" result. I think about the high level design and have an understanding of how things will work before I start talking to the agent. I don't think the current state of technology allows developing things in one shot, and I'm not sure this will change soon.
My overall attitude towards AI code generation is quite positive so far: I think, for me, the joy of having something working so soon, and the fact that it follows my design, outweighs the fact that I did not actually write the code.
One very real consequence of that is I'm missing my manual code writing. I started going through the older Advent of code years where I still have some unsolved days, and even solving some Leetcode problems (only interesting ones!) just for the feeling of writing the code as we all did before.
I'm not explicitly authorised to speak about this stuff by my employer but I think it's valuable to share some observations that go beyond "It's good for me" so here's a relatively unfiltered take of what I've seen so far.
Internally, we have a closed beta for what is basically a hosted Claude Code harness. It's ideal for scheduled jobs or async jobs that benefit from large amounts of context.
At a glance, it seems similar to Uber's Minion concept, although we weren't aware of that until recently. I think a lot of people have converged on the same thing.
Having scheduled roundups of things (what did I post in Slack? what did I PR in Github etc) is a nice quality of life improvement. I also have some daily tasks like "Find a subtle cloud spend that would otherwise go unnoticed", "Investigate an unresolved hotfix from one repo and provide the backstory" and "Find a CI pipeline that has been failing 10 times in a row and suggest a fix"
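The "failing 10 times in a row" check, for example, reduces to a small piece of pure logic once you've fetched recent run statuses from your CI API; the pipeline names and statuses below are invented for illustration, and statuses are assumed newest-first, as an API like GitHub Actions returns them:

```python
# Flag pipelines whose latest n runs have all failed, given per-pipeline
# run histories (newest first). The CI fetch itself is out of scope here.
def failing_streak(runs: list[str], n: int = 10) -> bool:
    return len(runs) >= n and all(r == "failure" for r in runs[:n])

def flag_pipelines(history: dict[str, list[str]], n: int = 10) -> list[str]:
    return [name for name, runs in history.items() if failing_streak(runs, n)]

history = {
    "deploy": ["failure"] * 12,                         # solid failure streak
    "lint": ["failure", "success"] + ["failure"] * 10,  # streak broken at run 2
}
print(flag_pipelines(history))  # only "deploy" has 10 straight failures
```

An agent harness would then take each flagged pipeline's logs as context and draft the suggested fix.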
I work in the platform space so your mileage may vary of course. More interesting to me are the second order effects beyond my own experience:
- Hints of engineering-adjacent roles (i.e. technical support) who are now empowered to try to generate large PRs implementing unscoped/ill-defined new internal services, because they don't have the background to know what is "good" or "bad". These sorts of people have always existed - you get folks on the edge of technical-adjacent roles who aspire to become fully fledged developers without an internal support mechanism - but now the barrier is a little lower.
- PR review fatigue: As a Platform Engineer, I already get tagged on acres of PRs but the velocity of PRs has increased so my inbox is still flooded with merged PRs, not that it was ever a good signal anyway.
- First hints of technical folk who progressed off the tools being encouraged to fix those long-standing issues that are simple in their minds, even though reality has shifted around a lot since. Generally LLMs are pretty good at surfacing this once they check how things are in reality, but LLMs don't "know" what your mental model is when you frame a question.
- Coworkers defaulting to asking LLMs about niche queries instead of asking others. There are a few queries I've seen where the answer from an LLM is fine but it lacks the historical part that makes many things make sense. As an example off the top of my head, websites often have subdomains not for any good present reason but just because back in the day, you could only have like 6 XHR connections to a domain or whatever it was. LLMs probably aren't going to surface that sort of context which takes a topic from "Was this person just a complexity lover" to "Ah, they were working around the constraints at the time".
- Obviously security is a forever battle. I think we're more security minded than most but the reality is that I don't think any of this can be 100% secure as long as it has internet access in any form, even "read only".
- A temptation to churn out side quests. When I first got started, I would tend to do work after hours but I've definitely trailed off and am back to normal now. Personally I like shipping stuff compared to programming for the sake of it but even then, I think eventually you just normalise and the new "speed" starts to feel slow again
- Privileged users generating and self-merging PRs. We have one project where most everyone has force merge and because it's internal only, we've been doing that paired with automated PR reviews. It works fairly well because we discuss most changes in person before actioning them but there are now a couple historical users who have that same permission contributing from other timezones. Waking up to a changed mental model that hasn't been discussed definitely won't scale and we're going to need to lock this down.
- Signal degradation for PRs: We have a few PRs I've seen where they provide this whole post-hoc rationalisation of what the PR does and what the problem is. You go to the source input and it's someone writing something like "X isn't working? Can you fix it?". It's really hard to infer intent and capability from PR as a result. Often the changes are even quite good but that's not a reflection of the author. To be fair, the alternative might have been that internal user just giving up and never communicating that there was an issue so I can't say this is strictly a negative.
All of the above are things that are actively discussed internally, even if they're not immediately obvious, so I think we're quite healthy in that sense. This stuff is bound to happen regardless; I'm sure most orgs will probably just paper over it or simply have no mechanism to identify it. I can only imagine what fresh hells exist in Silicon Valley, where I don't think most people are equipped to be good stewards or even consider basic ethics.
Overall, I'm not really negative or positive. There is definitely value to be found but I think there will probably be a reckoning where LLMs have temporarily given a hall pass to go faster than the support structures can keep up with. That probably looks like going from starting with a prompt for some work to moving tasks back into ticket trackers, doing pre-work to figure out the scope of the problem etc. Again, entirely different constraints and concerns with Platform BAU than product work.
Actually, I should probably rephrase that a little: I'm mostly positive on pure inference while mostly negative on training costs and other societal impacts. I don't believe we'll get to everyone running Gas Town/The Wasteland, nor do I think we should aspire to. I like iterating with an agent back and forth locally, and I think just heavily automating stuff with no oversight is bound to fail, in the same way that large corporations get bloated and collapse under their own weight.
- Continue.dev (kind of broken regardless of models)
- Aider (unpleasant for me to use, too much busywork)
- GitHub Copilot (tbh nice plugins and generous quotas + the only working autocomplete that's actually good I've tried)
- JetBrains AI and Junie (since I already pay for their IDEs, that came bundled), also nice but quotas are quite limiting
- local models with Ollama or llama.cpp - cool conceptually but always really limited
- OpenRouter for cloud models - ended up being kind of expensive and I didn't need those various models that much in the end
- Cerebras Code - really generous token limits and amazing speed, but recently more downtime and not as stable, and I realized I need SOTA models
- OpenCode - honestly pretty good
- Codex - also pretty good
Right now:
- Anthropic Max (100 USD a month) subscription has pretty much replaced everything else
- Claude Code, in both the CLI and GUI versions, has replaced everything else - good enough, and it also doesn't have *as many* file path issues as OpenCode (e.g. on Windows)
- still using Docker containers for builds, but also running it directly on my system because I'm lazy and stupid, no claws of any sort though
Overall thoughts on development:
- even the good models will create untold amounts of slop, unless controlled
- that's why I'm creating a tool called ProjectLint for my own needs, where you write rules in ECMAScript (Go + goja) for what a project requires: stack-agnostic rules about code structure, architecture, utilities that must or must not be used, file lengths and where files go. In practice that tbh ends up being a shitload of regexes instead of ASTs, but that's good enough - the output is consistent, with a suggestion of what to do for each error; LLMs love that shit
- other than that, Opus 4.6 for everything currently, really nice tool use, good web search for referencing stuff (no documentation MCPs yet to keep it light), digging into node_modules or other source code to realize what's up, often MULTIPLE parallel code review agents, since just one often isn't enough
- also you really, really need tests and the ability to stand up a local environment - I already hated projects that don't have these, and now I hate them with an even more burning passion
- I've done more in a few weeks than people do in a month - not 10x, but definitely an improvement for any work that has friction in it (I probably have unmedicated ADHD tbh). Though the context switching will absolutely burn people out, and the ability to write code will atrophy when you're having more work thrown at you and outsourcing more and more of the development to these tools; plus if they hike the platform prices, that's gonna be painful too
- In plain words, for a while it's gonna be great, but long term we're cooked. It's also interesting to see that if you try to use these tools without a modicum of actual engineering in how you approach them, you will often still get shit results long term, even with good models
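For illustration, the stack-agnostic, regex-based rule idea described above could be sketched roughly like this in Python (the actual ProjectLint is described as Go + goja with ECMAScript rules; the rule IDs, messages, and thresholds here are invented):

```python
import re
from pathlib import Path

# Hypothetical rule set in the spirit described above: plain regexes plus a
# fixed, consistently formatted suggestion per violation, so the output is
# predictable enough for an LLM to act on.
RULES = [
    {
        "id": "no-print-debug",
        "pattern": re.compile(r"^\s*print\("),
        "message": "Remove debug print; use the project logger instead.",
    },
    {
        "id": "max-file-length",
        "max_lines": 400,
        "message": "File exceeds 400 lines; split it into smaller modules.",
    },
]

def lint_file(path: Path) -> list[str]:
    """Return one 'path:line [rule-id] suggestion' string per violation."""
    errors = []
    lines = path.read_text().splitlines()
    for rule in RULES:
        if "pattern" in rule:
            # Per-line regex rules.
            for lineno, line in enumerate(lines, start=1):
                if rule["pattern"].search(line):
                    errors.append(f"{path}:{lineno} [{rule['id']}] {rule['message']}")
        elif len(lines) > rule["max_lines"]:
            # Whole-file structural rules (length, placement, etc.).
            errors.append(f"{path}:1 [{rule['id']}] {rule['message']}")
    return errors
```

The point is less the regexes themselves than the uniform error format: every violation comes with a concrete "what to do" string that an agent can loop on.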
Replying before reading anyone else's responses because I want to provide an honest response. I absolutely love it. I've spent my career as a generalist focusing on architecture and plumbing problems, mostly on Linux and embedded. Coding was something I did to get things done, now I can get things done in new languages incredibly fast, debug annoying problems rapidly, and work on new architectures very rapidly. It does a lot of the annoying research work: interpreting novel build chain failures, tracking down version-related API changes, gathering evidence of popular reports on the large plurality of Apple kernel lockdown changes that perennially break embedded work, etc. I'm working in hardware. Electronics. Physics. Mechanical. Supply chain. Software. EVERYTHING. It's a goddamn superpower. I can't get enough of it. It's like every teacher you ever wanted always available with infinite patience. I've stopped typing a lot of prompts now and started using local voice transcription. It's fantastic.
Honestly, the question may have been a bit more on the programming (generating lines) side, but I've always described programming as a lot like cleaning. You enter the room, figure out the nature of the mess (the interesting part) and come up with your strategy for solving it, then spend ages cleaning, sweeping or mopping up which is largely boring. Now you don't have to bother. Thanks, LLMs.
>"How is AI-assisted coding going for you professionally?"
For me personally - beautifully. Saves me a ton of time. Keep in mind however that I am an old fart who was originally a scientist in physics, started programming by entering machine code, and designed electronics to facilitate research; after moving to Canada I switched to programming completely. So I understand how everything works from the very bottom up and can tell the good stuff from the bullshit in what I get from AI.
I however have no idea how youngsters will train their brains when they have no solid foundations. I think there is a danger of collapse, with people having answers to all the questions but zero ability to validate them, and the AI itself degenerating into noise as it trains more and more on its own results.
Or the AI will eventually have intelligence, motivation and agency.
For me the AI is just okay. Invaluable for personal projects but a little bit take or leave at work. It just makes too many little mistakes, i still have to babysit it, it's too much effort.
Sadly though my manager uses Claude for EVERYTHING and is completely careless with it. Hallucinations in performance reviews, hallucinations in documentation, trash tier PRs. He's so gung-ho that some of my peers are now also submitting Claude written PRs that they haven't even bothered to check whether they build correctly.
So the social aspect is very bad. I'm stuck correcting other people's slop and reading nonsense PRs a few times a week.
The biggest win for me has been cross-stack context switching. I maintain services in TypeScript, Python, and some Go, and the cost of switching between them used to be brutal - remembering idioms, library APIs, error handling patterns. Now I describe what I need and get idiomatic code in whichever language I'm in. That alone probably saves me 30-40 minutes on a typical day.
Where it consistently fails: anything involving the interaction between systems. If a bug spans a queue producer and its consumer, or the fix requires understanding how a frontend state change propagates through API calls to a cache invalidation - the model gives you a confident answer that addresses one layer and quietly ignores the rest. You end up debugging its fix instead of the original issue.
My stack: Claude Code (Opus) for investigation and bug triage in a ~60k LOC codebase, Cursor for greenfield work. Dropped autocomplete entirely after a month - it interrupted my thinking more than it helped.
I use Claude Code (CLI) daily for infrastructure work — Docker configs, shell scripts, API development, config management.
What works: I stay in the driver's seat. I own the architecture, make the decisions, validate everything. But I don't need a team to execute — Claude does the implementation. I went from being a solo dev limited by time to running a complex project (multi-agent system, Docker, Synology integration, PHP API) that would normally need 2-3 people.
The key is a good CLAUDE.md file with strict rules, and pushing Claude to think ahead and propose multiple options instead of just doing the first thing that comes to mind. Claude is also surprisingly powerful for audits — security audits, config audits, log analysis.
What doesn't work: it confidently generates plausible-looking code that's subtly wrong. Never trust it on things you can't verify. It also over-engineers everything if you don't rein it in.
The biggest shift: I went from "write code" to "review and direct code." Not sure it's making me a better engineer, but it's making me a more effective one. It extends me.
Though, post retirement, I will support my wife's therapy practice, and I have a goal of silly businesses that are just fun to do (until they aren't), like my potato/tomato hybrid (actually just a graft) so you can make fries and ketchup from the same plant!
You got any land down there? I would like to be close to you and, post retirement, eat said french fries daily.
>so you can make fries and ketchup from the same plant!
We should be friends. I like your ideas.
For what it's worth, I prefer the name pomato to totato.
I'll keep that in mind when marketing. I was going to go with French Fry Tree.
Name to consider: twoatos (pot- and tom-)
That sounds lovely. I think too many people get attached to the structure of life as they've lived it for the last n years and resist natural phase transitions for far too long. Good luck with retirement and your dream of being the botanical equivalent of the mean kid from Toy Story:p
I noticed that what previously took 30 minutes now takes a week. For example, we had a performance issue with a DB; previously I'd just create a GSI (global secondary index). Now there is a 37-page document with explanation, mitigation, planning, steps, reviews, risks, deployment plan, obstacles, and a bunch of comments, but sure, it looks cool and very professional.
I'm now out of the workforce and can't even imagine the complexity of the systems as management and everyone else communicate plans and executions through Claude. It must already be the case that some code bases are massive behemoths few devs understand. Is Claude good enough to help devs maintain and stay on top of such a codebase?
The code is fine - strong reviews help, and since we're slower due to all the slop, communication also helps.
I quit my last job because of this. I'm pretty sure my manager was using free ChatGPT with no regard for context length, too, because not only was it verbose, it was also close to gibberish. Being asked to urgently review it and estimate deadlines got old real fast.
If you shove clearly AI generated content at me, I will use an AI to summarize it.
Or I'll walk up to your desk and ask you to explain it.
Jump straight to the second option. You have to presume that the content they sent you has no relation whatsoever to their actual understanding of the matter.
Be prepared for "I asked Claude and it said: ...". At some point you will just ask Claude via a microphone.
We all use Claude at my work and I have a very strict rule for my boss and my team: we don’t say “I asked Claude”. We use it a lot, but I expect my team to own it.
I actually think there’s almost an acceptable workflow here of using LLMs as part of the medium of communication. I’m pretty much fine with someone sending me 500 lines of slop with the stated expectation that I’ll dump it into an LLM on my end and interact with it.
It’s the asymmetric expectations—that one person can spew slop but the other must go full-effort—that for me personally feels disrespectful.
I also don't mind that. Summarized information exchange feels very efficient. But for sure, a societal expectation seems to be emerging around these tools right now: expect me to put as much effort into consuming data as you did producing it. If you shat out a bunch of data from an LLM, I'm going to use an LLM to consume that data as well. And it's not reasonable for you to expect me to manually parse that data, just as I wouldn't expect you to do the same.
However, since people are not going to readily reveal that they used an LLM to produce said output, the most logical approach is to just always use an LLM to consume inputs, because there's no easy, 100%-reliable way to tell anymore whether something was created by an LLM or a human.
Concept -> LLM fluff -> LLM summary -> Recipient
This kinda risks the broken-telephone problem, like when you translate from one language to another and then again to another: context and nuance are always lost.
Just give me the bullet points, it's more efficient anyway. No need to add tons of adjectives and purple prose around it to fluff it up.
Some day someone brilliant will discover the idea of "sharing prompts" to get around this issue. So, instead of sending the clean and summarized LLM output, you'll just send your prompt, and then the recipient can read that, and in response, share their prompt back to the original sender.
A true prisoner's dilemma!
I think we'll eventually move away from using these verbose documents, presentations, etc for communication. Just do your work, thinking, solving problems, etc while verbally dumping it all out into LLM sessions as you go. When someone needs to be updated on a particular task or project, there will be a way to give them granular access to those sessions as a sort of partial "brain dump" of yours. They can ask the LLM questions directly, get bullet points, whatever form they prefer the information in.
That way, thinking is communication! That's kind of why I loved math so much - it felt like I could solve a problem and succinctly communicate with the reader at the same time.
That sounds intriguing. LLM as moderator or coordinator or similar.
If you write 3 bullet points and produce 500-pages of slop why would my AI summarise it back to the original 3 bullet points and not something else entirely?
It won't, and that's the joke. They will write three bullet points, but their AI will only focus on the first two and hallucinate two more to fill out the document. Your AI will ignore them completely and go off on some unrelated tangent based on one of the earlier hallucinations. Anthropic collects a fee from both of you and is the only real winner here.
is this better than normal communication in any way, or just not much worse?
> It’s the asymmetric expectations—that one person can spew slop but the other must go full-effort—that for me personally feels disrespectful.
This has always been the case. Have some junior shit out a few thousand lines of code, leave, and leave the senior cleanup crew to figure out what the fuck just happened...
"send me your prompts instead"
There's a discussion going on that if you use an LLM to generate code, should the prompts (and related stuff) be a part of the pull request.
If you shove content at me that I even suspect was AI generated I will summarily hit the delete button and probably ban you from sending me any form of communication ever again.
It's a breach of trust. I don't care if you're my friend, my boss, a stranger, or my dog - it crosses a line.
I value my time and my attention. I will willingly spend it on humans, but I most certainly won't spend it on your slop when you didn't even feel me worth making a human effort.
I've found in my (admittedly limited) use of LLMs that they're great for writing code if I don't foresee a need to review it myself, but if I'm going to be editing the code myself later, I need to be the one writing it. Also, LLMs are bad at design.
Master Foo and the Programming Prodigy: https://catb.org/~esr/writings/unix-koans/prodigy.html
What code do you write that you don't need to maintain/read again later?
For me it's throwaway scripts and tools. Or tools in general. But only simple tools that it can somewhat one-shot. If I ever need to tweak it, I one-shot another tool. If it works, it's fine. No need to know how it works.
If I'm feeling brave, I let it write functions with very clear and well defined input/output, like a well established algorithm. I know it can one-shot those, or they can be easily tested.
But when doing something that I know will be further developed, maintained, I mainly end up writing it by hand. I used to have the LLM write that kind of code as well, but I found it to be slower in the long run.
Definitely a lot of one-shot scripts for a given environment... I've started using a run/ directory for shell scripts that will do things like spin up a set of containers defined in a compose file.. build and test certain sub-projects, initialize a database, etc.
For the most part, many of them work the first time and just continue to do so to aid a project. I've done similar in terms of scaffolding a test/demo environment around a component that I'm directly focused on... sometimes similar for documentation site(s) for gh pages, etc.
Some things have gone surprisingly well.
> Also LLMs are bad at design.
I've found that SoTA LLMs sometimes implement / design differently (in the sense that "why didn't I think of that"), and that's always refreshing to see. I may run the same prompt through Gemini, Sonnet, and Codex just to see if they'd come up with some technique I didn't even know to consider.
> don't foresee a need to review it myself either
On the flip side, SoTA LLMs are crazy good at code review and bug fixes. I always use "find and fix business logic errors, edge cases, and api / language misuse" prompt after every substantial commit.
Obviously you should also use Claude to consume those 50 pages. It sounds cynical, but it's not. It's practical.
What I've learned in 2 years of heavy LLM use (ChatGPT, Gemini, and Claude) is that the significance lies in expressing and then refining goals and plans. The details are noise. The clear goals matter, and the plans are derived from those.
I regularly interrupt my tools to say, "Please document what you just said in ...". And I manage the document organization.
At any point I can start fresh with any AI tool and say, "read x, y, and z documents, and then let's discuss our plans". Although I find that with Gemini, despite saying, "let's discuss", it wants to go build stuff. The stop button is there for a reason.
I use an agents.md file to guide Claude, and I include a prominent line that reads UPDATE THIS FILE WITH NEW LEARNINGS. This is a bit noisy -- I have to edit what is added -- but works well and it serves as ongoing instruction. And as you have pointed out, the document serves as a great base if/when I have to switch tools.
One group of people pretends to have written something and another group of people pretends to have read something. Much productivity is gained.
Zizek had a great point about this.
At least both get paid in not-pretend money.
Similarly, managers at my workplace occasionally use LLMs to generate jira tickets (with nonsense implementation details), which has led junior engineers astray, leaving senior engineers to deal with the fallout.
Getting similar vibes from freelance clients sending me overly-articulated specs for projects, making it sound like they want sophisticated implementations. Then I ask about it and they actually want like a 30 row table written in a csv. Huge whiplash.
I instituted a simple “share the inputs” along with the outputs rule which prevents people doing exactly this. Your only value contribution is the input and filtering the output but for people with equal filtering skill, there’s no value in the output
The first point is so true. How do people expect me to work with their 20 page "deep research" document that's built by a crappy prompt and they didn't even bother to proofread.
The best thing to do is to schedule meetings with those people to go over the docs with them. Now you force them to eat their own shit and waste their own time the more output they create.
Love the intent, but isn't that wishful if you don't have any leverage? e.g., the higher up will trade you for someone who doesn't cause friction or you waste too much of your own time?
I had Claude review one. It was... not complimentary. Seemed to help a bit.
I've had this experience too. In the case of vibe code, there is at least some incentive from self-preservation that prevents things from getting too out of hand, because engineers know they will be on the hook if they allow Claude to break things. But the penalties for sloppy prose are much lower, so people put out slop tickets/designs/documentation, etc. more freely.
It makes my work suck, sadly. Team dynamics also contribute to that, admittedly.
Last year I was working on implementing a pretty big feature in our codebase; it required a lot of focus to get the business logic right, and at the same time you had to be very creative to make it feasible to run without hogging too many resources.
When I was nearly done and working on catching bugs, team members grew tired of waiting and started taking my code from x weeks ago (I have no idea why), feeding it to Claude or whatever, and then coming back with a solution. So instead of finishing my code, I had to go through their versions of my code.
Each one of the proposals had one or more business requirements wrong and several huge bugs. Not one was any closer to a solution than mine was.
I would have appreciated any contribution to my code, but the assumption that it would be so easy to just take my code and finish it by asking Claude was rather insulting.
I completely understand.
We're in a phase where founders are obsessed with productivity, so everything seems to work just fine and as intended, with little slop.
They're racing to be as productive as possible so we can get who knows where.
There are times when I honestly don't even know why we're automating certain tasks anymore.
In the past, we had the option of saying we didn't know something, especially when it was an area we didn't want to know about. Today, we no longer have that option, because knowledge is just a prompt away. So you end up doing front-end work for a backend application you just built, even though your role was supposed to be completely different.
This feels similar to the slow encroachment of devops onto everything. We're making so much shit nowadays that there is nobody left but developers to shepherd things into production, with all the extra responsibility and none of the extra pay commensurate with being a sysadmin too.
> Today, we no longer have that option, because knowledge is just a prompt away
Something resembling knowledge anyway. A sort of shambling mound wearing knowledge like a skinsuit
While I agree, I can't deny that AI is doing the job most of the time. But the hunt for the supreme productivity feels disgusting sometimes.
There’s a lot more going on there than AI …
Not really, this is exactly what I expect due to baseless lies from the AI companies and a disdain for employee payroll by the C-suite.
they fantasize about unpaid interns writing specs and nobody ever needed to look at the code in a few years
This seems to be a team problem more than anything? Why are your coworkers taking on your responsibilities? Where's your manager on this?
Could be an emergent team problem that wouldn’t have had cause to exist before AI.
If someone does that simply say “no. use the latest code”
That works when it's humans you can talk to. The same problem happens with AI agents though and "no, use the latest code" doesn't really help when multiple agents have each compacted their own version of what "latest" means.
I'm running Codex on a Raspberry Pi, and Claude Code CLI, Gemini CLI, and Claude in Chrome all on a Mac, all touching the same project across both machines. The drift is constant. One agent commits, the others don't know about it, and now you've got diverged realities. I'm not a coder so I can't just eyeball a diff and know which version is right.
Ended up building a mechanical state file that sits outside all the context windows. Every commit, every test run, every failed patch writes to it. When a new session starts, the agent reads that file first instead of trusting its own memory. Boring ops stuff, really, but it's the only thing that has actually stopped the "which version is real" problem.
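A minimal sketch of that kind of state file, assuming an append-only JSON-lines log (the file name and event fields here are invented for illustration, not the commenter's actual format):

```python
import json
import time
from pathlib import Path

# Hypothetical shared location; every agent on every machine appends here.
STATE_FILE = Path("agent_state.jsonl")

def record_event(kind: str, detail: str) -> None:
    """Append one event (commit, test run, failed patch) as a JSON line.

    Append-only writes mean no agent ever rewrites history it didn't see.
    """
    event = {"ts": time.time(), "kind": kind, "detail": detail}
    with STATE_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def load_recent(last_n: int = 50) -> list[dict]:
    """What a fresh agent session reads first, instead of trusting its memory."""
    if not STATE_FILE.exists():
        return []
    lines = STATE_FILE.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-last_n:]]
```

Each agent would call record_event after every commit or test run and load_recent at session start, so the newest entries, not any one agent's compacted context, settle what "latest" means.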
Not really AI problem, more like garbage coworkers.
It has made my job an awful slog, and my personal projects move faster.
At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up. It is painful and time consuming, the code base is a mess. In one case I had to merge a feature from one team into the main code base, but the feature was AI coded so it did not obey the API design of the main project. It also included a ton of stuff you don’t need in the first pass - a ton of error checking and hand-rolled parsing, etc, that I had to spend over a week unrolling so that I could trim it down and redesign it to work in the main codebase. It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly. AI tools are not good at this kind of design deconflicting task, so while it’s easy to get the initial concept out the gate almost instantly, you can’t just magically fit it into the bigger codebase without facing the technical debt you’ve generated.
In my personal projects, I get to experience a bit of the fun I think others are having. You can very quickly build out new features, explore new ideas, etc. You have to be thoughtful about the design because the codebase can get messy and hard to build on. Often I design the APIs and then have Claude critique them and implement them.
I think the future is bleak for people in my spot professionally – not junior, but also not leading the team. I think the middle will be hollowed out and replaced with principals who set direction, coordinate, and execute. A privileged few will be hired and developed to become leaders eventually (or strike gold with their own projects), but everyone in between is in trouble.
If you don't take a stand and refuse to clean up their mess, aren't you part of the problem? No self-respecting proponent of AI-enabled development would suggest that the engineers generating the code aren't still personally responsible for its quality.
Ultimately that's only an option if you can sustain the impact to your career (not getting promoted, or getting fired). My org (publicly traded, household name, <5k employees) is all-in on AI with the goal of having 100% of our code AI generated within the next year. We have all the same successes and failures as everyone else, there's nothing special about our case, but our technical leadership is fundamentally convinced that this is both viable and necessary, and will not be told otherwise.
People who disagree at all levels of seniority have been made to leave the organization.
Practically speaking, there's no sexy pitch you can make about doing quality grunt work. I've made that mistake virtually every time I've joined a company: I make performance improvements, I stabilize CI, I improve code readability, remove compiler warnings, you name it: but if you're not shipping features, if you're not driving the income needle, you have a much more difficult time framing your value to a non-engineering audience, who ultimately sign the paychecks.
Obviously this varies wildly by organization, but it's been true everywhere I've worked to varying degrees. Some companies (and bosses) are more self-aware than others, which can help for framing the conversation (and retaining one's sanity), but at the end of the day if I'm making a stand about how bad AI quality is, but my AI-using coworker has shipped six medium sized features, I'm not winning that argument.
It doesn't help that I think non-engineers view code quality as a technical boogeyman and an internal issue to their engineering divisions. Our technical leadership's attitude towards our incidents has been "just write better code," which... Well. I don't need to explain the ridiculousness of that statement in this forum, but it undermines most people's criticism of AI. Sure, it writes crap code and misses business requirements; but in the eyes of my product team? That's just dealing with engineers in general. It's not like they can tell the difference.
Hi thanks for this brilliant feature. It will really improve the product. However it needs a little bit more work before we can merge it into our main product.
1) The new feature does not follow the existing API guidelines found here: see examples an and b.
2) The new feature does not use our existing input validation and security checking code, see example.
Once the following points have been addressed we will be happy to integrate it.
All the best.
The ball is now in their court and the feature should come back better
This is a politics problem. Engineers were sending each other crap long before AI.
..so they copy/paste your message into Claude and send you back a +2000, -1500 version 3 minutes later. And now you get to go hunting for issues again.
There is an alternative way to make the necessary point here: let it go through with comments to the effect that you cannot attest to the quality or efficacy of the code, and let the organization suffer the consequences of this foray into LLM usage. If they can't use these tools responsibly and are unwilling to listen to the people who can, then they deserve to hit the inevitable quality wall, where endless passes through the AI still can't deliver working software and their token budget goes through the ceiling trying to make it work.
I think you're falling victim to the just-world fallacy.
I am absolutely certain the world isn't just. I'm also absolutely certain the world can't become just unless you let people suffer the consequences of their decisions. It's the only way the world can work.
> ... I make performance improvements, I stabilize CI, I improve code readability, remove compiler warnings, you name it ...
These are exactly the kind of tasks that I ask an AI tool to perform.
Claude, Codex, et al. are terrible at innovation. What they are good at is regurgitating patterns they've seen before, which often means refactoring something into a more stable/common format. You can paste compiler warnings and errors into an agentic tool's input box and have it fix them for you, with a good chance of success.
I feel for your position within your org, but these tools are definitely shaking things up. Some tasks will be given over entirely to agentic tools.
> These are exactly the kind of tasks that I ask an AI tool to perform.
Very reasonable nowadays, but those were things I was doing back in 2018 as a junior engineer.
> Some tasks will be given over entirely to agentic tools.
Absolutely, and I've found tremendous value in using agents to clean up old tech debt with one-line prompts. They run off, make the changes, modify tests, then put up a PR. It's brilliant and has fully reshaped my approach... but in a lot of ways expectations on my efficiency are much worse now, because leadership thinks I can rewrite our tech stack in another language over a weekend. It almost doesn't matter that I can pass all this tidying off onto an LLM, because I'm expected to have 3x the output that I did a year ago.
> My org [...] is all-in on AI with the goal of having 100% of our code AI generated within the next year.
> People who disagree at all levels of seniority have been made to leave the organization.
So either they're right (100% AI-generated code soon) and you'll be out of a job or they'll be wrong, but by then the smart people will have been gone for a while. Do you see a third future where next year you'll still have a job and the company will still have a future?
"100% AI-generated code soon" doesn't mean no humans, just that the code itself is generated by AI. Generating code is a relatively small part of software engineering. And if AI can do the whole job, then white collar work will largely be gone.
Unfortunately, not many companies seem to require engineers to cycle between "feature" and "maintainability" work. As a result, those chasing low-hanging fruit who know how to virtue-signal build their careers on "features," while engineers passionate about correct solutions are left to pay for it while also being labelled "inefficient" by management. It's all a clown show, especially now with vibe-coding - no wonder big companies have had multiple incidents since vibing started taking off.
Culture and accountability problems aren't limited to software.
It's best to sniff out values mismatches ASAP and then decide whether you can tolerate some discomfort to achieve your personal goals.
Shipping “quality only” work for a long time can be stressful for your colleagues and the product teams.
You’re much better off mixing both (quality work and product features).
> Shipping “quality only” work for a long time can be stressful for your colleagues and the product teams.
I buried the lede a bit, but my frustration has been feeling like _nobody_ on my team prioritizes quality and instead optimizes for feature velocity, which then leaves some poor sod (me) to pick up the pieces to keep everything ticking over... but then I'm not shipping features.
At the end of the day if my value system is a mismatch from my employer's that's going to be a problem for me, it just baffles me that I keep ending up in what feels like an unsustainable situation that nobody else blinks at.
"aren't you part of the problem?"
Yes? In the same way any victim of shoddy practices is "part of the problem"?
Employees, especially ones as well leveraged and overpaid as software engineers, are not victims. They can leave. They _should_ leave. Great engineers are still able to get better-paying jobs all the time.
> Great engineers are still able to get better-paying jobs all the time
I know a lot of people who tried playing this game frequently during COVID, then found themselves stuck in a bad place when the 0% money ran out and companies weren’t eager to hire someone whose resume had a dozen jobs in the past 6 years.
You obviously haven't gone job hunting in 2026
I hope you get the privilege soon
Employees are not victims. Sounds like a universal principle.
Came here to say this. The right solution to this is still the same as it always was - teach the juniors what good code looks like, and how to write it. Over time, they will learn to clean up the LLM’s messes on their own, improving both jobs.
> and refuse to clean their mess
You can and should speak up when tasks are poorly defined, underestimated, or miscommunicated.
Try to flat out “refuse” assigned work and you’ll be swept away in the next round of layoffs, replaced by someone who knows how to communicate and behave diplomatically.
ramraj07 went on to clarify that they were advocating for putting the onus for cleanup back on mess generators.
They clearly were not advocating for flat out refusing.
Just reply with this to every AI programming task: https://simonwillison.net/2025/Dec/18/code-proven-to-work/
It's just plain unprofessional to just YOLO shit with AI and force actual humans to read the code even if the "author" hasn't read it.
Also API design etc. should be automatically checked by tooling and CI builds, and thus PR merges, should be denied until the checks pass.
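As a minimal sketch of what such an automated check might look like (the `@validated` decorator and the `handle_*` naming rule here are hypothetical stand-ins for whatever your guidelines actually mandate), a CI step could scan changed files for convention violations and fail the build on any hit:

```python
import re

def check_api_conventions(source: str) -> list[str]:
    """Flag handler functions missing the (hypothetical) shared
    @validated decorator that the style guide requires."""
    violations = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        m = re.match(r"\s*def (handle_\w+)\(", line)
        if m:
            # The decorator must sit on the line immediately above.
            prev = lines[i - 1].strip() if i > 0 else ""
            if prev != "@validated":
                violations.append(m.group(1))
    return violations

good = "@validated\ndef handle_login(req):\n    pass\n"
bad = "def handle_upload(req):\n    pass\n"
print(check_api_conventions(good + bad))  # ['handle_upload']
```

A real setup would of course lean on existing linters with custom rules rather than hand-rolled regexes, but the point stands: once the rule is in CI, the "ball in their court" reply writes itself.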
> did not obey the API design of the main project
If they're handing you broken code call them out on it. Say this doesn't do what it says it does, did you want me to create a story for redoing all this work?
That is definitely one tell: the hand-rolled input parsing or error handling that people would never have done at their own discretion. The bigger issue is that we already do the error checking and parsing at the points of abstraction where it makes the most sense. So the AI's version is bespoke, and redundant.
That is on the people using the AI and not cleaning up/thinking about it at all.
> At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up.
This has to be the most thankless job for the near future. It's hard and you get about as much credit as the worker who cleans up the job site after the contractors are done, even though you're actually fixing structural defects.
And god forbid you introduce a regression bug cleaning up some horrible redundant spaghetti code.
Near future being the key term here imo. The entire task I mentioned was not an engineering problem, but a communication issue. The two project owners could have just talked to each other about the design, then coded it correctly in the first pass, obviating the need for the code janitor. Once orgs adapt to this new workflow, they’ll replace the code janitors with much cheaper Claude credits.
Lol you may be on to something there.. 'a code janitor'.
We’ve had this too, and we made a change to our code review guidelines to allow rejection if code is clearly just AI slop. We’ve let something like four contractors go so far over it. Sure, they get work done fast, but when it comes to making it production-ready they’re completely incapable. Last time we just merged it anyway to hit a budget; it set everyone back, and we’re still cleaning up the mess.
> It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly.
What the hell are you playing hero for? Delegate the choice to your manager: ruin the codebase or allocate two weeks for clean-up - their choice. If the magical AI team claims they can do the integration faster, let them.
IME one thing that makes this choice a very difficult one is oncall responsibilities. The thing that incentivizes code owners to keep their house in order is that their oncall experience will be a lot better. And you're the only one who is incentivized to think this way. Management certainly doesn't care. So by delegating the choice to management you're signing up for a whole bunch of extra work in the form of sleepless oncall shifts.
If someone is making the kind of mistakes that cause oncall issues to increase, put that person on call. It doesn't matter if they can't do anything, call them every time they cause someone else to be paged.
IME too many don't care about on call unless they are personally affected.
> If someone is making the kind of mistakes that cause oncall issues to increase
the problem is that identifying the root cause can take a lot of time, and often the "mistakes" aren't clearly sourced down to an individual.
So someone oncall just takes the hit (ala, waking up at 3am and having to do work). That someone may or may not be the original progenitor of said mistake(s).
Framed less blamefully, that's basically the central thesis of "devops". That is the notion that owning your code in production is a good idea because then you're directly incentivized to make it good. It shouldn't be a punishment, just standard practice that if you write code you're responsible for it in production.
I think you need coding style guide files in each repo, including preferred patterns & code examples. Then you will see less and less of that.
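As a hedged sketch of what such a file could contain (the file name, rule wording, and code snippets below are all invented for illustration, not from the source):

```markdown
# CONVENTIONS.md — read by humans and agents alike

## Error handling
- Use the shared `AppError` type; never hand-roll input parsing.
  - Preferred: `raise AppError("bad id", code=400)`
  - Avoid: returning ad-hoc `{"error": ...}` dicts

## API surface
- New endpoints follow the patterns in docs/api-guidelines.md.
- Validation goes through the existing middleware, not inline checks.
```

Agentic tools generally pick up repo-local instruction files like this at the start of a session, so preferred patterns stated once tend to stop recurring as review comments.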
I've heard of human engineers who are like that. "10x", but it doesn't actually work with the environment it needs to work in. But they sure got it to "feature complete" fast. The problem is, that's a long way from "actually done".
I don't use it.
I know my mind fairly well, and I know my style of laziness will result in atrophying skills. Better not to risk it.
One of my co-workers already admitted as much to me around six months ago, and that he was trying not to use AI for any code generation anymore, but it was really difficult to stop because it was so easy to reach for. Sounded kind of like a drug addiction to me. And I had the impression he only felt comfortable admitting it to me because I don't make it a secret that I don't use it.
Another co-worker did stop using it to generate code because (if I'm remembering right) he can tell what it generates is messy for long-term maintenance, even if it does work and even though he's new to React. He still uses it often for asking questions.
A third (this one a junior) seemed to get dumber over the past year, opening merge request that didn't solve the problem. In a couple of these cases my manager mentioned either seeing him use AI while they were pairing (and it looked good enough so the problems just slipped by) or saw hints in the merge request with how AI names or structures the code.
I've been using ChatGPT to teach myself all sorts of interesting fields of mathematics that I've wanted to learn but never had the time previously. I use the Pro version to pull up as many actual literature references as I can.
I don't use it at all to program despite that being my day job for exactly the reason you mentioned. I know I'll totally forget how to program. During a tight crunch period, I might use it as a quick API reference, but certainly not to generate any code. (Absolutely not saying it's not useful for this purpose—I just know myself well enough to know how this is going to go haha)
How do you get chatgpt to teach you well? I feel like no matter how dense and detailed i ask it to be or how much i ask it to elaborate and contextualize topics with their adjacent topics to give me a full holistic understanding, it just sucks at it and is always short of helping me truly understand and intuit the subject matter.
This is an interesting use case, and I want to learn more about your workflow. Do you also use Lean etc. for math proofs?
The atrophy of manually writing code is certainly real. I'd compare it to using a paper map and a compass to navigate, versus say Google Maps. I don't particularly care to lose the skill, even though being good at and enjoying the programming part of making software was my main source of income for more than a decade. I just can't escape being significantly faster with Claude Code.
> he can tell what it generates is messy for long-term maintenance, even if it does work and even though he's new to React.
When one can generate code in such a short amount of time, logically it is not hard to maintain. You could just re-generate it if you didn't like it. I don't believe this style of argument where it's easy to generate with AI but then you cannot maintain it after. It does not hold up logically, and I have yet to see such a codebase where AI was able to generate it, but now cannot maintain it. What I have seen this year is feature-complete language and framework rewrites done by AI with these new tools. For me the unmaintainable code claim is difficult to believe.
have you tried using AI generated code in a non hobby project? one that has to go to production?
it just hallucinates packages, reimplements functions that already exist, and invents random new APIs.
How is that not unmaintainable?
We use it daily in our org. What you’re talking about is not happening. That being said, we have fairly decent mono repo structure, bunch of guides/skills to ensure it doesn’t do it that often. Also the whole plan + implement phases.
If it was July 2025, I would have agreed with you. But not anymore.
Yes, all the time. Yes, those go to production. AI has improved significantly the past 2 years, I highly recommend you give it another try.
I don't see the behaviour you describe; maybe your impression comes from online articles, or from using a local llama model or ChatGPT from 2 years ago. Claude regularly finds and resolves duplicated code, in fact. Let me give you a counter-example: for adding dependencies we run an internal whitelist for AI agents; new dependencies go through this system, since we had similar concerns. In the half year or so that we have run the service, I have never seen any agent used in our organisation or at a client hallucinate a dependency.
So where does your responsibility for this code end? Do you just push to the repo, merge, and that's it, or do you also deploy, monitor, and maintain the production systems? Who handles outages on Saturday night, you or someone else?
FWIW I mainly use Opus 4.6 on the $100/mo Max plan, and rarely run into these issues. They certainly occur with lower-tier models, with increased frequency the cheaper the model is - as for someone using it for a significant portion of their professional and personal work, I don’t really understand why this continues to be a widespread issue. Thoroughly vetting Plan Mode output also seems like an easy resolution to this issue, which most devs should be doing anyways IMO (e.g. `npm install random-auth-package`).
We use it for 100s of projects and what you say hasn't happened for a while.
I used to experience those issues a lot. I haven't in a while. Between having good documentation in my projects, well-defined skills for normal things, simple to use testing tools, and giving it clear requirements things go pretty smoothly.
I'd say it still really depends on what you're doing. Are you working in a poorly documented language that few people use solving problems few people have solved? Are you adding yet another normal-ish kind of feature in a super common language and libraries? One will have a lot more pain than the other, especially if you're not supplying your own docs and testing tools.
There's also just a difference of what to include in the context. I had three different projects which were tightly coupled. AI agents had a hard time keeping things straight as APIs changed between them, constantly misnaming them and getting parameters wrong and what not. Combining them and having one agent work all three repos with a shared set of documentation made it no longer make mistakes when it needed to make changes across multiple projects.
LLMs rarely if ever proactively identify cleanup refactors that reduce the complexity of a codebase. They do, however, still happily duplicate logic or large blocks of markup, defer imports rather than fixing dependency cycles, introduce new abstractions for minimal logic, and freely accumulate a plethora of little papercuts and speed bumps.
These same LLMs will then get lost in the intricacies of the maze they created on subsequent tasks, until they are unable to make forward progress without introducing regressions.
You can at this point ask the LLM to rewrite the rat’s nest, and it will likely produce new code that is slightly less horrible but introduces its own crop of new bugs.
All of this is avoidable, if you take the wheel and steer the thing a little. But all the evidence I’ve seen is that it’s not ready for full automation, unless your user base has a high tolerance for bugs.
I understand Anthropic builds Claude Code without looking at the code. And I encounter new bugs, some of them quite obvious and bad, every single day. A Claude process starts at 200MB of RAM and grows from there, for a CLI tool that is just a bundle of file tools glued to a wrapper around an API!
I think they have a rats nest over there, but they’re the only game in town so I have to live with this nonsense.
I’m the same way. But I took a bite and now I’m hooked.
I started using it for things I hate, ended up using it everywhere. I move 5x faster. I follow along most of the time. Twice a week I realize I’ve lost the thread. Once a month it sets me back a week or more.
I repeatedly tried to use LLMs for code but god they suck. I've tried most tools and models and for me it's still way faster to write things by hand.
I'm a magical tool: it's almost as if I knew what I wanted to do! I don't have to spend time explaining and correcting.
Also, a good part of the value of me writing code is that I know the code well and can fix things quickly. In addition, I've come to realize that while I'm coding, I'm mostly thinking about the project's code architecture and technical future. It's not something I'll ever want to delegate I think.
I use AI to discuss and possibly generate ideas and tests, but I make sure I understand everything and type it in except for trivial stuff. The main value of an engineer is understanding things. AI can help me understand things better and faster. If I just setup plans for AI and vibe, human capital is neglected and declines. I don't think there's much of a future if you don't know what you're doing, but there is always a future for people with deep understanding of problems and systems.
I think you are right: deep understanding of systems and domains will not become obsolete. I foresee some types of developers moving into a more holistic systems design and implementation role if coding itself becomes quite routinely automated.
I'm an engineer at Amazon - we use Kiro (our own harness) with Opus 4.6 underneath.
Most of my gripes are with the harness, CC is way better.
In terms of productivity I'm def 2-4X more productive at work, >10x more productive on my side business. I used to work overtime to deliver my features. Now I work 9-5 and am job hunting on the side while delivering relatively more features.
I think a lot of people are missing that AI is not just good for writing code. It's good for data analysis and all sorts of other tasks like debugging and deploying. I regularly use it to manage deployment loops (ex. make a code change and then deploy the changes to gamma and verify they work by making a sample request and verifying output from cloudwatch logs etc). I have built features in 2 weeks that would take me a month just because I'd have to learn some nitty technical details that I'd never use again in my life.
For data analysis I have an internal glue catalog, I can just tell it to query data and write a script that analyzes X for me.
AI and agents particularly have been a huge boon for me. I'm really scared about automation but also it doesn't make sense to me that SWE would be automated first before other careers since SWE itself is necessary to automate others. I think there are some fundamental limitations on LLMs (without understanding the details too much), but whatever level of intelligence we've currently unlocked is fundamentally going to change the world and is already changing how SWE looks.
I saw somewhere that you guys had All Hands where juniors were prohibited from pushing AI-assisted code due to some reliability thing going on? Was that just a hoax?
https://www.aboutamazon.com/news/company-news/amazon-outage-...
About All Hands :
> Much of the coverage of the service incidents has focused on a weekly Amazon Stores operations meeting and a planned discussion of recent outages. Reviewing operational incidents is a routine part of these meetings, during which teams discuss root causes with the goal of continuing to improve reliability for customers.
This is something that's a part of every FAANG afaik. I know for a fact that there's no prohibition on pushing AI-assisted code. How would that even technically work? It'd basically mean banning Kiro/CC from the company.
> Only one of the incidents involved AI-assisted tooling, which related to an engineer following inaccurate advice that an AI tool inferred from an outdated internal wiki, and none involved AI-written code.
so this doesn't seem as much of an "AI caused an outage" story as it was portrayed.
“outdated internal wiki” has to be responsible for so many AMZN outages…
Shows how although AI is great, good ol' issues that we had in human-coding times are still persistent and problematic even during the AI-age.
Not a hoax; I saw it in the news. I'm not at Amazon, but I can confirm massive productivity gains. The issue is reviewing code: with output resembling a firehose of PRs, we need to be more careful and mindful with them. Don't vibe code a massive PR, slap it on your coworkers, and expect a review. The same PR etiquette exists today as it did years ago.
> make a code change and then deploy the changes to gamma and verify they work by making a sample request and verifying output from cloudwatch logs etc
This has been a godsend over the past week while deploying a couple services. One is a bridge between Linear and our Coder.com installation so folks can assign the work to an agent. Claude Code can do most of the work while I sleep since it has access to kubectl, Linear MCP, and Coder MCP. I no longer have to manually build, deploy, test, repeat. It just does it all for me!
How do you deal with the risk of the LLM generating malicious code and then running it? I suspect it's a bit more difficult to set it up tailored to your needs in a big corp.
> I have built features in 2 weeks that would take me a month just because I'd have to learn some nitty technical details that I'd never use again in my life.
In the bucket of "really great things I love about AI", that would definitely be at the top. So often in my software engineering career I'd have to spend tons of time learning and understanding some new technology, some new language, some esoteric library, some cobbled-together build harness, etc., and I always found it pretty discouraging when I knew that I'd never have reason to use that tech outside the particular codebase I was working on at the time. And far from being rare, I found that, working in a fairly large company, that was a pretty frequent occurrence. E.g. I'd look at a design doc or feature request and think to myself "oh, that's pretty easy and straightforward", only to go into the codebase and see that the original developer/team decided on some extremely niche transaction handling library or whatever (or worse, homegrown with no tests...), and trying to figure out that esoteric tech turned into 85% of the actual work. AI doesn't reduce that to 0, but I've found it has been a huge boon for understanding new tech, and especially for getting my dev environment and build set up well, much faster than I could do manually.
Of course, AI makes it a lot easier to generate exponentially more poorly architected slop, so not sure if in a year or two from now I'll just be ever more dependent on AI explaining to me the mountains of AI slop created in the first place.
It’s too bad, really. While it’s easy to get discouraged about such things, over the course of my career all that learning of “pointless” tech has made me a much better programmer, designer, architect, and troubleshooter. The only way you build intuition about systems is learning them deeply.
Mind my asking why job hunting and what you wish you could do in your day job that you're not?
"I'm an engineer at Amazon"
Sanctioned comment?
> 10x more productive on my side business
Pretty sure the answer is here :)
Quite. On the face of it: possible career faux pas.
I own (with two other folk) my own little company and hire other people. I actively encourage my troops to have a bash, but I suspect that a firm like AMZ would have differing views about what used to be called moonlighting. Mind you, we only turn over a bit over £1M, and that is loose change down the back of a sofa for AMZ ...
I work at a FAANG.
Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find, such as "where is this team's code that does X?"
I have yet to successfully prompt it and get a working commit.
Further, I will add that I also don't know any ICs personally who have successfully used it. There are endless posts of people talking about how they're now 10x more productive and how everyone needs to do x, y, and z now; I just don't know any of these people.
Non-professionally, it's amazing how well it does on a small greenfield task, and I have seen that 10x improvement in velocity. But, at work, close to 0 so far.
Of the posts I've seen at work, they typically tend to be teams doing something new / greenfield-ish or a refactor. So I'm not surprised by their results.
This is wild. I’m on the other end.
I’ve probably prompted 10,000 lines of working code in the last two months. I started with terraform which I know backwards and forwards. Works perfectly 95% of the time and I know where it will go wrong so I watch for that. (Working both green field, in other existing repos and with other collaborators)
Moved on to a big data processing project, works great, needed a senior engineer to diagnose one small index problem which he identified in 30s. (But I’d bonked on for a week because in some cases I just don’t know what I don’t know)
Meanwhile a colleague wanted a sample of the data. Vibe coded that (extract from zip without decompressing). He wanted it randomized. One shot. Done. Then he wanted it randomized across 5 categories. Then he wanted 10x the sample size. Data request completed before the conversation was over. I would have worked on that for three hours before and bonked if I hit the limit of my technical knowledge.
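For flavor, that kind of zip-sampling task might be sketched in Python like this (the commenter's actual approach is unknown; here the top-level folder name stands in for the category, and only the sampled members ever get decompressed):

```python
import io
import random
import zipfile
from collections import defaultdict

def stratified_sample(zip_bytes: bytes, per_category: int, seed: int = 0) -> dict:
    """Pick up to N random entries per category from a zip archive,
    decompressing only the chosen members."""
    rng = random.Random(seed)
    sample = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        by_cat = defaultdict(list)
        for info in zf.infolist():  # reads only the central directory
            if info.is_dir():
                continue
            category = info.filename.split("/", 1)[0]
            by_cat[category].append(info)
        for cat, members in by_cat.items():
            chosen = rng.sample(members, min(per_category, len(members)))
            # Decompression happens here, and only for the sampled members.
            sample[cat] = {m.filename: zf.read(m) for m in chosen}
    return sample

# Build a small in-memory archive to demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for cat in ("a", "b"):
        for i in range(10):
            zf.writestr(f"{cat}/item{i}.txt", f"{cat}{i}")
sampled = stratified_sample(buf.getvalue(), per_category=3)
print({k: len(v) for k, v in sampled.items()})  # {'a': 3, 'b': 3}
```

Listing entries via the central directory is what makes "without decompressing" cheap: the archive's index is read up front, and only the chosen files are ever inflated.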
Built a monitoring stack. Configured servers, used it to troubleshoot dozens of problems.
For stuff I can’t do, now I can do. For stuff I could do with difficulty now I can do with ease. For stuff I could do easily now I can do fast and easy.
Your vastly different experience is baffling and alien to me. (So thank you for opening my eyes)
I don’t find it baffling at all and both your experiences perfectly match mine.
Asking AI to solve a problem for you is hugely non-linear. Sometimes I win the AI lottery and its output is a reasonable representation of what I want. But mostly I lose the AI lottery and I get something that is hopeless. Now I have a conundrum.
Do I continue to futz with the prompt and hope if I wiggle the input then maybe I get a better output, or have I hit a limit and AI will never solve this problem? And because of the non linear nature I just never know. So these days I basically throw one dart. If it hits, great. If I miss I give up and do it the old fashioned way.
My work is in c++ primarily on what is basically fancy algorithms on graphs. If it matters.
Also at FAANG. I think I am using the tools more than my peers based on my conversations. The first few times I tried our AI tooling, it was extremely hit and miss. But right around December the tooling improved a lot, and is a lot more effective. I am able to make prototypes very quickly. They are seldom check-in ready, but I can validate assumptions and ideas. I also had a very positive experience where the LLM pointed out a key flaw in an API I had been designing, and I was able to adjust it before going further into the process.
Once the plan is set, using the agentic coder to create smaller CLs has been the best avenue for me. You don't want to generate code faster than you and your reviewers can comprehend it. It'll feel slow, but check ins actually move faster.
I will say it's not all magic and success. I have had the AI lead me down some dark corners, assuring me one design would work when actually it is a bit outdated or not quite the right fit for the system we are building for because of reasons. So, I wouldn't really say that it's a 10x multiplier or anything, but I'm definitely getting things done faster than I could on my own. Expertise on the part of the user is still crucial.
One classic issue I used to run into, is doing a small refactor and then having to manually fix a bunch of tests. It is so much simpler to ask the LLM to move X from A to B and fix any test failures. Then I circle back in a few minutes to review what was done and fix any issues.
The other thing is, it has visibility into the wider code base, including some of our infrastructure that we're dependent on. There have been a couple of times in the past quarter where our build was busted by an external team, and I was able to ask the LLM, given the timeframe and a description of the issue, to pinpoint the exact external failure that caused it. I don't really know how long it would have taken to resolve the issue otherwise, since the issues were missed by their testing. That said, I gotta wonder if those breakages were introduced by LLM use.
My job hasn't been this fun in a long, long time and I am a little uneasy about what these tools are going to mean for my personal job security, but I don't know how we can put the genie back into the bottle at this point.
I can second this. I’ve never had a problem writing short scripts and glue code in stuff I’ve mastered. In places I actually need help, I’m finding it slows me down.
Wow, that's such a drastic different experience than mine. May I ask what toolset are you using? Are you limited to using your home grown "AcmeCode" or have full access to Claude Code / Cursor with the latest and greatest models, 1M context size, full repo access?
I see it generating between 50% and 90% accuracy in both small and large tasks, as in: the PRs it generates range from 50% usable code that a human can tweak, to a 90% solution (with the occasional 100% - wow, it actually did it, no comments, let's merge).
I also found it to be a skill set: some engineers seem to find it easier to articulate what they want, and some find it easier to think while writing code.
I used to think that the people who keep saying (in March 2026) that AI does not generate good code are just not smart and ask stupid prompts.
I think I've amended that thought. They are not necessarily lacking in intelligence. I hypothesize that LLMs pick up on optimism and pessimism, among other sentiments, in the incoming prompt: someone prompting with no hope that the result will be useful ends up with useless garbage output, and vice versa.
Exactly. You have to manifest at a high vibrational frequency.
Thanks for the laugh.
That sounds a lot more like confirmation bias than any real effect on the AI's output.
Gung-ho AI advocates overlook problems and seem to focus more on the potential they see for the future, giving everything a nice rose tint.
Pessimists will focus on the problems they encounter and likely not put in as much effort to get the results they want, so they likely see worse results than they might have otherwise achieved and worse than what the optimist saw.
That's a valid-sounding argument. However, many people with no strong view either way are producing functional, good code with AI daily, and the original context of this thread is someone who has never been able to produce anything committable. Many, many real-world experiences show something excellent and ready to go from a simple one-shot.
This is kinda like that thing about how psychic mediums supposedly can't medium if there's a skeptic in the room. Goes to show that AI really is a modern-day ouija board.
The accurate inferences that can be drawn from subtle linguistic attributes should freak you out more than they do.
Switching one good synonym can send the model off an entirely different direction in response, or so I’ve observed.
Don’t know why you’re getting downvoted, this is a fascinating hypothesis and honestly super believable. It makes way more sense than the intuitive belief that there’s actually something under the human skin suit understanding any of this code.
It's probably more to do with the intelligence required to know when a specific type of code will yield poor future coding integrations and large scale implementation.
It's pretty clear that people think greenfield projects can constantly be slopified and that AI will always be able to dig them another logical connection, so it doesn't matter which abstraction the AI chose this time; it can always be better.
This is akin to people who think we can just keep using oil to fuel technological growth because it'll somehow improve the ability of technology to solve climate problems.
It's akin to the techno capitalist cult of "effective altruism" that assumes there's no way you could f'up the world that you can't fix with "good deeds"
There's a lot of hidden context in evaluating the output of LLMs, and if you're just looking at today's successes, you'll come away with a much different view than if you're looking at next year's.
Optimism, in this case, is just the belief that the AI will keep getting more powerful, so it'll always clean up today's mess.
I call this techno magic, indistinguishable from religious 'optimism'
This checks out, logically speaking.
The FAANG codebases are very large, date back years, and might not necessarily use open-source frameworks but rather in-house libraries and frameworks, none of which are available to Anthropic or OpenAI, so these models have zero visibility into them.
Combined with the fact that these are not reasoning or thinking machines but probabilistic (image/text) generators, they can't generate what they haven't seen.
That's why coding agents usually scan various files to figure out how to work in a particular codebase. I work with a very large and old project, and Codex manages to work with our frameworks most of the time.
No, it doesn't check out. I think it's becoming abundantly clear that LLMs learn in real time as they speak to you. There's a lot of denial, with people claiming they don't learn and that their knowledge is fixed at the training data, and this is not even remotely true.
LLMs learn dynamically through their context window, and this learning happens at a rate much faster than humans, often with capabilities greater than humans and often much worse.
For a codebase as complex and closed-source as Google's, the problems an LLM faces are largely the same as a human's: how much can it fit into the context window?
It checks out if you take into account that most developers are actually rather mediocre, outside of places that spend an insane amount of time and money to get good devs (including but not limited to FAANG).
You're observing this "paradox", because what you call learning here is not learning in the ML sense; it's deriving better conclusions from more data. It's true for many ML methods, but it doesn't mean any actual learning happens.
Huh? I have over a hundred services/repos checked out locally, ranging from 10+ years old to new. I have no problem leveraging AI to work in this large distributed codebase.
Even internal stuff is usable by the model because it’s a pattern matching machine and there should be documentation available, or it can just study the code like a human.
Yeah that's still very far away from FAANG repos
Same here. My take is that the codebase is too large and complex for it to find the right patterns.
It does work sometimes. The smaller the task, the better.
Isn’t that fixed by having it create a plan, then you review it and say “x should do y instead”, it updates the plan, iterate then “build the plan”?
Can you elaborate on the shortcomings you find in professional setting that aren't coming up on personal projects? With it handling greenfield tasks are you perhaps referring to the usual sort of boilerplate code/file structure setup that is step 0 with using a lot of libraries?
>I am yet to successfully prompt it and get a working commit.
May I ask what you're working on?
Not a FAANG engineer but also working at a pretty large company and I want to say you're spot on 1000%. It's insane how many "commenters" come out of the woodwork to tell you you're doing x or y wrong. They may not even frame it that way, but use a veneer of questions "what is your process like? Have you tried this product, etc." as a subtle way of completely dismissing your shared experience.
Experience depends on which FAANG it is. Amazon for example doesn't allow Claude Code or Codex so you are stuck with whatever internal tool they have
Meta, despite competing with these, is open to let their devs use better off the shelf tools.
I work at aws and generally use Claude Opus 4.6 1M with Kiro (aws’s public competitor to Claude Code). My experience is positive. Kiro writes most of my code. My complaints:
1. Degraded quality over longer context window usage. I have to think about managing context and agents instead of focusing solely on the task.
2. It’s slow (when it’s “thinking”). Especially when it’s tasked with something simple (e.g., I could ask Claude Opus to commit code and submit for review but it’s just faster if I run the commands myself and I don’t want to have to think about conditionally switching to Haiku / faster models mid task execution).
3. It often requires a lot of upfront planning and feedback loop set up to the extent that sometimes I wonder if it would’ve been faster if I did it myself.
A smarter model would be great but there are bigger productivity gains to be had with a good set up, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good set up. Something with the speed of Haiku with the reasoning of Opus without the overhead of having to think about the management of agents or context would be sweet.
The context degradation problem gets much worse when you have multiple agents or models touching the same project. One agent compacts, loses what it knew, and now the human is the only source of truth for what actually happened vs what was reported done. If that human isn't a coder, they can't verify by reading the source either.
I've been working on this and landed on a pattern I call a "mechanical ledger", basically a structured state file that sits outside any context window and gets updated as a side effect of work, not as a step anyone remembers to do. Every commit writes to it, every failed patch writes to it, every test run writes to it. When a session starts (or an agent compacts), it reads the ledger and rebuilds context from ground truth instead of from memory.
It's not a novel idea really; it's basically what ops teams do with runbooks and state files, applied to the AI agent handoff problem. The interesting bit is making the updates mechanical so no agent can forget to do them.
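A minimal sketch of what a "mechanical ledger" like this could look like, assuming an append-only JSONL file (the filename, event names, and hook points here are all hypothetical illustrations, not the commenter's actual implementation):

```python
import json
import time
from pathlib import Path

LEDGER = Path("ledger.jsonl")  # hypothetical location for the state file

def record(event: str, **details) -> None:
    """Append one fact to the ledger as a side effect of work.

    Intended to be called from hooks (git post-commit, a test-runner
    wrapper, a patch-apply script), never as a step an agent must remember.
    """
    entry = {"ts": time.time(), "event": event, **details}
    with LEDGER.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def rebuild_context(limit: int = 50) -> list[dict]:
    """On session start (or after a compaction), reload ground truth."""
    if not LEDGER.exists():
        return []
    lines = LEDGER.read_text().splitlines()
    return [json.loads(line) for line in lines[-limit:]]

# Hypothetical hook calls: these would normally fire automatically.
record("commit", sha="abc123", message="fix flaky retry")
record("test_run", passed=41, failed=2)

# A fresh session rebuilds its picture from the file, not from memory:
for entry in rebuild_context():
    print(entry["event"])
```

The key design choice is that `record()` is only ever invoked from tooling, so the ledger stays accurate even when an agent's context window compacts away its own memory of what it did.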
you're describing https://github.com/steveyegge/beads
Context degradation is a real problem.
> A smarter model would be great but there are bigger productivity gains to be had with a good set up, a faster model, and abstracting away the need to think about agents or context usage. I’m still figuring out a good set up. Something with the speed of Haiku with the reasoning of Opus without the overhead of having to think about the management of agents or context would be sweet.
I was thinking about this recently. This kind of setup is the Holy Grail everyone is searching for: make the damn tool produce the right output more of the time. And yet, despite testing the methods provided by the people who claim they get excellent results, I still reach the point where it goes off the rails. Nevertheless, since practically everybody is working on resolving this particular issue, and huge amounts of money have been poured into getting it right, I hope in the next year or so we will finally have something we can reliably use.
Meta is doing something healthy - signalling that it is behind with its LLM efforts. Nothing wrong with that.
Could you say more on how the tasks where it works vs. doesn't work differ? Just the fact that it's both small and greenfield in the one case and presumably neither in the other?
Can you provide an example of how you actually prompt AI models? I get the feeling the difference among everyone's experiences has to do with prompting and expectation.
Biggest difference I've noticed is giving the model constraints upfront rather than letting it freestyle. Something like "use only the standard library, no new files, keep it under 50 lines" produces dramatically better results than open-ended "build me X." It's less about clever prompting and more about narrowing the solution space so it can't wander off.
I find that the default Claude Code harness deals with the ambiguity best right now with the questionnaire system. So you can pose the core of the problem first and then specify only those implementation details that matter.
I wasn't implying that clever prompting needed to be used. I'm just trying to confirm that the person I was replying to isn't just saying what essentially amounts to "build me X".
When I write my prompts, I literally write an essay. I lay constraints, design choices, examples, etc. If I already have a ticket that lays out the introduction, design considerations, acceptance criteria and other important information, then I'll include that as well. I then take the prompt I've written and I request for the model to improve the prompt. I'll also try to include the most important bits at the end since right now models seem to focus more on things referenced at the end of a prompt rather than at the beginning.
Once I do get output, I then review each piece of generated code as if I'm doing an in-depth code review.
No one is saying "build X" and getting good results unless they didn't have any expectations to begin with. What you describe is precisely right. Using the agents requires a short leash, clear requirements, and good context management (docs, skills, rules).
Some people (like me) still think that’s a fantastic tool, some people either don’t know how to do this or think the fact you have to do this means the tools are bunk. Shrug.
Around a year ago I started a new position at a very large tech company that I won't name, working on a pre-existing web project there. The code base isn't terrible - though not very good either, by-and-large - but it's absolutely massive, often over-engineered, pretty unorthodox, and definitely has some questionable design decisions; even after more than a year of working with it I still feel like a beginner much of the time.
This year I grudgingly bit the bullet and began using AI tools, and to my dismay they've been a pretty big boon for me, in this case. Not just for code generation - they're really good at probing the monolith and answering questions I have about how it works. Before, I'd spend days poring over code before starting work to figure out the right way to build something or where to break in, pinging people over in India or eastern Europe with questions and hoping they'd reply overnight. AI's totally replaced that, and it works shockingly well.
When I do fall back on it for code generation, it's mostly just to mitigate the tedium of writing boilerplate. The code it produces tends to be pretty poor - both in terms of style and robustness - and I'll usually need to take at least a couple of passes over it to get it up to snuff. I do find this faster than writing everything out by hand in the end, but not by a lot.
For my personal projects I don't find it adds much, but I do enjoy rubber ducking with ChatGPT.
Using these tools for understanding seems to be one of the best use cases - lots of pros, less dangerous cons (worst case scenario is a misleading understanding, but that can be minimized by directly double checking the claims being made).
In fact it looks like an arising theme is that whenever we use these tools it's valuable to maintain a human understanding of what's actually going on.
As a veteran freelance developer - aside from some occasional big wins, I'd say it's been net neutral or even net negative to my productivity. When I review AI-generated code carefully (and if I'm delivering it to clients I feel that's my responsibility) I always find unnecessary complexity, conceptual errors, performance issues, looming maintainability problems, etc. If I were to let it run free, these would just compound.
A couple "win" examples: add in-text links to every term in this paragraph that appears elsewhere on the page, plus corresponding anchors in the relevant page parts. Or, replace any static text on this page with any corresponding dynamic elements from this reference URL.
Lose examples: constant edit-format glitches (not matching the searched text; even the venerable Opus 4.6 constantly screws this up), unnecessary intermediate variables, ridiculously over-cautious exception handling, failing to see opportunities to isolate repeated code into a function, or to use an existing function that exactly implements said N lines of code, etc.
It can only result in more work if you freelance, because if you disclose that you used LLMs, then you did it faster than usual and presumably at lower quality, so you have to deliver more to retain the same income. Except now you're paying all the providers for all the models because you start hitting usage limits, and Claude sucks on the weekends, and your drive is full of 'artifacts', which incurs mental overhead that is exacerbated by your crippling ADHD.
And then all of a sudden you're just arguing with the terminal all day - the specs are written by GPT, delivered in the email written by GPT. Sometimes they don't even take the time to slice their prompt off the edges of the paste, but the only thing I can think of is "I need to make the most of 0.5x off-peak Claude rates".
Fuck.
I got lots of pretty TUIs though, so that's neat.
Have you perceived a market shift for freelancers given the rise of AI coding?
It seems to me that sadly, paying for getting a few isolated tasks done is becoming a thing of the past.
No slowdown that I've seen - my style of freelancing is pretty long-term though, clients I've known and worked with for many years.
I found it’s great for: 1. Exploring new codebases or understanding PRs. 2. Prototyping new ideas. 3. Second opinions on problems and troubleshooting. 4. Rubber ducking. 5. Parallelizing rote/boilerplate work while my deep focus is elsewhere. 6. First drafts of documentation and reviews.
What I don’t understand is how some people can parallelize 5-10 engineering changes at once, and expect to support and maintain that code in the future.
The difficulty has always been in support and maintenance, not building something new, and that requires a deeper understanding.
The majority of code I've written since November 2025 has been created using agents, as opposed to me typing code into a text editor. More than half of that has been done from my iPhone via Claude Code for web (bad name, great software.)
I'm enjoying myself so much. Projects I've been thinking about for years are now a couple of hours of hacking around. I'm readjusting my mental model of what's possible as a single developer. And I'm finally learning Go!
The biggest challenge right now is keeping up with the review workload. For low stakes projects (small single-purpose HTML+JS tools for example) I'm comfortable not reviewing the code, but if it's software I plan to have other people use I'm not willing to take that risk. I have a stack of neat prototypes and maybe-production-quality features that I can't ship yet because I've not done that review work.
I mainly work as an individual or with one other person - I'm not working as part of a larger team.
Are you saying you're learning go because you've freed up time elsewhere or is AI helping?
I'm having Claude Code write me full apps in Go and learning the language by osmosis.
Vibe-learning
How often do you find issues during review? What kinds of issues?
Usually it's specification mistakes - I spot cases I hadn't thought to cover, or the software not behaving as usefully as if I had made a different design decision.
Occasionally I'll catch things it didn't implement at all, or find things like missing permission checks.
Surprised to see HN being bearish on this.
I have 10 years of experience. I am a reasonable engineer. I can tell you that about half of the hype on twitter is real. It is a real blessing for small teams.
We have 100k DAU for a consumer CRUD app. We built and maintain everything in-house with 3 engineers. This would have taken at least 10 engineers 3-4 years back.
We don't have a bug list. We are not "vibe coding": 2 of us understand almost all of the codebase. We have processes to make sure the core integrity of the codebase doesn't go for a toss.
No one has touched the editor in months.
Even the product folks can raise a PR for small config changes from slack.
Velocity is through the roof and code quality is as good if not better than when we write by hand.
We refactor A LOT more than before because we can afford to.
I love it.
HN is in denial, which is understandable
AI is already better at understanding code than 99.99% of humans; the more I use it, the more I believe this is true. It can draw connections between dots far quicker and more accurately than a human ever could.
At the very least, AI is going to be a must, even as just a co-supervisor on your project.
What's in doubt right now is whether AI can manage a codebase fully autonomously without bringing it down, which I doubt it can at the moment. Be it 4.6 or 5.4, they almost always add code instead of removing it; the sheer complexity will explode at a certain point.
But that is my assessment of the models TODAY; who knows where they will be in 6 months. AI is entering the recursive self-improvement phase, that roadmap is lying in front of our eyes, and what it can and would unlock is truly, truly unpredictable.
I am both intrigued and scared.
The RAG models are very competent at programming. I am worried about my job as a SWE in the near future, but didn't the MIT paper from about a week ago pretty much confirm that width-scaling the model is about to stop (or has already stopped) giving any measurable increase in quality because the training data no longer overfills the model?
Any authentic pre-LLM training data is assumed to have been used in training already, and synthetic or generated data gives worse-performing models, so the path of increasing the training data seems to be a dead end as well?
What is the next vector of training? Maybe data curation? Remove the low quality entries and accept a smaller, but more accurate data set?
I think the AI companies are starting to sweat a little, considering the promises they have made, their inability to deliver and turn a profit in their current state, and the slowing improvements.
Interesting times! We are either all out of jobs or a massive market crash is imminent. Awesome...
Link? Genuinely curious to check it out.
100k DAU - you’ll lose 98% of them within 6-9 months once a 1-2 person team clones it and sells it for 10% of what you are charging.
That was always a possibility even before AI; what's hard to clone is how they got those users.
Possibility, yes - reality, often no, due to the cost that would have to be incurred to make it happen. The "how they got those users" is the easy part if your offering is the same(ish) at a fraction of the cost.
not if LTV is already only a little higher than CAC and all the marketing channels are already saturated
Most developers are still in denial. Many are afraid of job loss, or the corporations are forcing AI without clear scopes and proper implementation, which results in a mess. Small teams on small-to-medium products are productive as hell with AI.
Net negative. I do find it genuinely useful for code review, and "better search engine" or snippets, and sometimes for rubber ducking, but for agent mode and actual longer coding tasks I always end up rewriting the code it makes. Whatever it produces always looks like one of those students who constantly slightly misunderstands and only cares about minor test objectives, never seeing the big picture. And I waste so much time on the hope that this time it will make me more productive if only I can nudge it in the right direction, maybe I'm not holding it right, using the right tools/processes/skills etc. It feels like javascript frameworks all over again.
Same. I vacillate between thinking our profession will soon be over to thinking we’re perfectly safe. Sometimes, it’s brilliant. It is very good at exploring and explaining a codebase, finding bugs, and doing surgical fixes. It’s sometimes good at programming larger tasks, but only if you really don’t care about code quality.
The one thing I’m not sure about is: does code quality and consistency actually matter? If your architecture is sufficiently modular, you can quickly and inexpensively regenerate any modules whose low quality proves to be problematic.
So, maybe we really are fucked. I don’t know.
I've had quite a bit of luck with using AI-assisted tooling for some specific workflows, and very little luck with others. To the extent that there's a trend[^1], it seems to be that tasks where I would spend a lot of time to produce a very small amount of output which is easy to evaluate objectively[^2] are sped up considerably, tasks where I would produce a large amount of output quickly (e.g. boilerplate) are sped up slightly, and most other tasks are unaffected or even slowed down (if I try to use AI tooling for them and decide it's not good enough yet).
As always, my views are my own and do not necessarily reflect the views of my employer.
[^1]: There's less of a trend than I'd expect. There are some quite difficult-to-me tasks that AI nails (e.g. type system puzzles) and some trivial-to-me tasks that AI struggles with (e.g. "draw correct conclusions when an image is uploaded of an ever-so-slightly nonstandard data visualization like a stacked bar chart").
[^2]: My favorite example of this is creating a failing test with a local reproduction of a reported bug on production - sure I _could_ write this myself, but usually these tests are a little bit finicky to write, but once written are either obviously testing the right thing or obviously testing the wrong thing, and the code quality doesn't really matter, so there's not much benefit in having human-written code while there's a substantial benefit in having any tests like this vs not having them.
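The bug-reproduction pattern from [^2] could be sketched as follows - the bug, the function, and the numbers here are all invented for illustration; what matters is that the test either obviously exercises the reported behavior or obviously doesn't, and code quality is beside the point:

```python
# Hypothetical regression test for a reported production bug:
# "cart totals are wrong when an order contains a zero-quantity line item".

def cart_total(items):
    """Stand-in for the real implementation under test.

    Each item is a (quantity, unit_price) pair; a zero-quantity line
    should contribute nothing to the total.
    """
    return sum(qty * price for qty, price in items)

def test_zero_quantity_line_item_is_ignored():
    # Reproduction of the reported scenario: two normal lines plus one
    # zero-quantity line. On the buggy build this assertion fails;
    # once the fix lands it passes and stays as a regression guard.
    items = [(2, 5.00), (0, 99.00), (1, 3.50)]
    assert cart_total(items) == 13.50
```

The finicky part the comment alludes to is usually the setup (getting production-like data into the fixture), not the assertion itself, which is why delegating the first draft can pay off.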
It’s completely inconsistent for me, and any time I start to think it is amazing, I quickly am proven wrong. It definitely has done some useful things for me, but as it stands any sort of “one shot” or vibecoding where I expect the ai to complete a whole task autonomously is still a long ways off.
Copilot completions are amazingly useful. chatting with the chatbot is a super useful debugging tool. Giving it a function or database query and asking the ai to optimize it works great. But true vibe coding is still, imho, more of a party trick than an actual productivity multiplier. It can do things that look useful, and it can do things that solve immediate self-contained problems. but it can’t create launchable products that serve the needs of multiple users.
I work at a very prominent AI company. We have access to every tool under the sun. There are various levels of success for all levels — managers, PMs, engineers.
We have cursor with essentially unlimited Opus 4.6 and it’s fundamentally changed my workflow as a senior engineer. I find I spend much more time designing and testing my software and development time is almost entirely prompting and reviewing AI changes.
I’m afraid my coding skills are atrophying - in fact I know they are - but I’m not sure coding was the part of my job I truly enjoyed. I enjoy thinking higher-level: architecture, connecting components, focusing on the user experience. But I think using these AI tools is a form of golden handcuffs. If I went to work at a startup without the money I pay for these models, I think for the first time in my career I would be less able to successfully code a feature than I was last year.
So professionally there are pros and cons. My design and architecture skills have greatly improved as I am spending more time doing this.
Personally it’s so much fun. I’ve made several side projects I would have never done otherwise. Working with Claude code on greenfield projects is a blast.
I think people get a bit paranoid about coding skills atrophying. I had a period where I stopped programming for multiple years and it really only took a month to get back into the swing of things when I returned, and most of that was just re-jogging my memory on the syntax and standard library classes (C++ at the time).
A month is quite a long time compared to "I can just do this at-will from neutral at any time".
...particularly in situations where you might have to navigate a change in jobs and get back to the point where you can reasonably prove that you can program at a professional level (will be interesting to see how/if the interviewing process changes over time due to LLMs).
I also worry, but am also shocked how far a single $20 sub gets me on side projects. I pay for 3 (CC, Codex, Gemini) but almost never go beyond CC, even when I'm merging several PRs a day.
It is phenomenal.
I have a lot of experience, low and high level. These AI tools allow me to "discuss" possibilities, research approaches, and test theories orders of magnitude faster than I could in the past.
I would roughly estimate that my ability to produce useful products is at least 20x. A good bit of that 'x' is because of the elimination of mental barriers. There have always been good ideas I had which I knew could work, but I also knew that to prove that they could work would take a lot of focus and research (leveling up on specific things). And that takes human energy - while I'm busy also trying to do good things in my day job.
Now I have immensely powerful minions and research assistants. I can test any theory I have in an hour or less.
While these minions are being subsidized in the wonderful VC way, I can get a lot done. If the real costs start to bleed through, I'll have to scale back my explorations. (Because at a point, I'll have to justify testing my theories against spending $200-300.)
To your questions, I'm usually a solo builder anyway. I've built serious things for serious companies, but almost always solo. So that's quite a burden. And now I'm weary of all that corporate stuff, so I build for myself. And what a joy it is, having these powertools.
If I were in a company right now, I could absolutely replace a team of 5 people with me + AI... assuming the CTO wasn't the (usual) limiting factor.
It's really interesting how delusional people here can get when their livelihood depends on it. It's a game changer, guys. I've been working professionally for 12 years - big companies, small companies, freelance, startup CTO nowadays. It's a multiplier. It gives me superpowers. If you don't feel the superpowers, you're either missing out or in denial. Embrace agentic coding.
I'm at a public, well-known tech company.
We got broad and wide access to AI tools maybe a month ago now - AI tools meaning Claude Code, Codex, Cursor, and a set of other random AI tools.
I use them very often. They've taken a lot of the fun and relaxing parts of my job away and have overall increased my stress. I am on the product side of the business, and it feels necessary for me to have 10 new ideas, since now the people with the most ideas will be rewarded, which I am not as good at. I've tried having the agents identify opportunities for infra improvements and had no luck there. I haven't tried it for product suggestions, but I think it would be poor at that too.
I get sent huge PRs and huge docs now that I wasn't sent before, with pressure to accept them as is.
I write code much faster but commit it at the same pace due to reviews taking so long. I still generate single-task PRs to keep them reviewable and do my own thorough review beforehand. I always have an idea in my head about how it should work before getting started, and I push the agent to use my approach. The AI tools are good at catching small bugs, like mutating things across threads. I like to use it to generate implementation plans (that only I and the bots read; I still hand-write docs that are broadly shared and referenced).
Overall, AI has me nervous - primarily because it does the parts that I like very well, and has me spending a higher portion of my job on the things I don't like or find more tiresome.
I built an ERP system called PAX ERP mostly solo for small manufacturers in the USA.
Stack is React, Express, PostgreSQL, all on AWS with a React build pipeline through GitHub Actions. It handles inventory, work orders (MRP), purchasing, GAAP accounting, email campaigns, CRM, shipping (FDX or UPS), etc.
AI has been useful (I use Claude Code, mainly the Haiku model), but only if I'm steering it carefully and reviewing everything. It is obviously not great at system design, so I still need to know exactly what I'm trying to do. If I don't, it'll often make things overly complicated or focus on edge cases that don't really exist.
It helps a lot with: writing/refactoring SQL, making new API routes based on my CRUD template, creating new frontend components from specs and existing templates, debugging and explaining unexpected behavior, and integrating third-party APIs (FedEx for shipping, Resend for emails). It understands their documentation easily and helps connect the pieces.
In practice, it feels like a fancy copy/paste (new routes, components) or a helpful assistant. With careful guidance and review, it's a real efficiency boost, especially solo.
I'm somewhat optimistic... I think a lot of companies and managers are in a wait and see mode. The AI tooling can be genuinely helpful, but IMO definitely needs manual review and testing for function and security.
Personally, mostly just using it for personal projects that I've been sitting on for years... it's been pretty motivating and I'm actually making progress, though I'm split across half a dozen things so it's slower going. I'm also far more interactive than vibe coding one and done efforts. I'm finicky when it comes to UX for user facing apps, and DevEx for developer APIs on libraries I work on.
It's nice, far from perfect... I still liken it to managing foreign developer teams... the more you specify ahead of time, the better the results. The difference is a turn around in minutes instead of the next business day. Iteration is very fast by comparison. That said, sometimes the agent will make toddler decisions like rewriting all the broken tests and updating the docs to match the behavior instead of correcting the behavior to match the api and tests.
I foresee that the AI blindness at the CEO/CFO level, and the general societal hype (from the technical and non-technical press and media) that software engineering is over, will result in a severe talent shortage in 5-7 years, with bidding wars for talent driving salaries 3x upwards or more.
Then we'll be back to the 2019/2020 cycle, and round and round the merry-go-round we go.
Before the advent of widespread LLM usage, and more particularly, before LLM-using coworkers began submitting large PRs against codebases I am the primary maintainer of, my velocity had never been greater.
I do not like the current culture around LLMs. I do not use LLMs, I shall continue to resist any peer pressure to use them, much as I have resisted IDEs in favour of CLI tooling, vim, tmux and the like. I feel my output is incredibly devalued compared to the before times.
On the one hand, my passion for personal projects has never been greater. It helps me feel as though I am bettering myself, pushing the boundaries of my capabilities without resorting to LLM usage. However, I no longer release my code openly.
On the other hand, on top of building resentment towards being treated even more interchangeably than before, I resent the asymmetry of my LLM-using peers submitting large pull requests I am obliged to review, on codebases I have never touched before, applications I have never had the occasion to use; my team is managed in such a way that everybody is more or less working on independent projects, due to pressures to deliver at a much faster pace.
I really wish the fervour would die down for the sake of my own sanity.
This feels like a pretty negative take on what seems like impactful technology that is not going away, and will (and already has had) big impacts on how people work and build. Do you completely reject the idea of using them, ever? Do they have absolutely 0 utility for you?
Currently in my third year working full time and sort of realizing two things. 1. AI (specifically Claude Code and Codex) is really good and can do quite a bit of the work that, when I started, I had to do myself. 2. AI can't do all of the work; the last 10% (or 5, or whatever percent it can't do) is something I need to do, and the only way I can do it properly is if I have a good mental model from the other 90%, which doesn't happen if I use Claude Code.
So far this year I've realized I'm better off not using it except for simple questions I would otherwise google. Not necessarily because it's bad, but because it makes me worse.
Edit: this also sort of applies to my side projects. I'm realizing the value of a side project wasn't the end result (since those are mostly my own personal apps) but the learnings gleaned from it, which I don't get if I use Claude Code.
I use it purely as a search engine. It is pretty good at dehtml/dejs/decss, i.e. distraction-free access to articles/docs. I had dabbled with Claude purely for writing test cases, which I'm very lazy at.
But man, people really oversell LLMs' coding capabilities. They're good at pattern matching and replicating what they've seen elsewhere. That's about it.
I had asked a simple question to opus 4.5.
> Python redis asyncio sample code.
The 3 attempts at generation all failed at the import statement. I checked whether such an import structure had ever existed in older versions. It never had. But it looked convincing enough!
Is it because there isn't much redis asyncio code available for Claude to train on? (asyncio only dates back to 2014.)
i'm a senior engineer at a mid-size, publicly traded company.
my team has largely avoided AI; our sister team has been quite gung-ho on it. i recently handed off a project to them that i'd scoped at about one sprint of work. they returned with a project design that involved four microservices, five new database tables, and an entirely new orchestration and observability layer. it took almost a week of back-and-forth to pare things down.
since then, they've spent several sprints delivering PRs that i now have to review. there's lots of things that don't work, don't make sense, or reinvent things we already have from scratch. almost half the code is dedicated to creating 'reusable' and 'modular' classes (read: boilerplate) for a project that was distinctly scoped as a one-off. as a result, this takes hours, and it's cut into my own sprint velocity. i'm doing all the hard work but receiving none of the credit.
management just told me that every engineer is now required to use AI. i'm tired.
It has definitely made me more productive. That said, that productivity isn't coming from using it to write business logic (I prefer to have an in-depth understanding of the logical parts of the codebases that I'm working on. I've also seen cases in my work codebases where code was obviously AI generated before and ends up with gaping security or compliance issues that no one seemed to see at the time).
The productivity comes from three main areas for me:
- Having the AI coding assistant write unit tests for my changes. This used to be by far my least favorite part of writing software: instead of solving problems, it was the monotonous process of gathering mock data to exercise specific pathways, trying to make sure I was covering all the cases, and then debugging the tests. Now I just have to review the tests to make sure they cover all the cases I can think of and don't make any overtly wrong assumptions.
- Research. It has been extraordinarily helpful in giving me insight into how to design some larger systems when I have extremely specific requirements but don't necessarily have the complete experience to architect them myself - I know enough to understand if the system is going to correctly accomplish the requirements, but not to have necessarily come up with architecture as a whole.
- Quick test scripts. It has been extremely useful for generating quick SQL data for testing things, along with quick one-off scripts to test things like external provider APIs
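The quick-SQL-test-data use case above can be as small as a script like this; the `orders` table and its columns are invented for illustration:

```python
# A minimal sketch of a throwaway test-data script: emit INSERT
# statements for a hypothetical orders table, seeded so the data
# is reproducible between runs.
import random
from datetime import date, timedelta

STATUSES = ["pending", "shipped", "cancelled"]

def insert_statements(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        day = date(2024, 1, 1) + timedelta(days=rng.randrange(90))
        status = rng.choice(STATUSES)
        total = rng.randrange(100, 10000) / 100
        rows.append(
            f"INSERT INTO orders (id, status, total, created_at) "
            f"VALUES ({i}, '{status}', {total:.2f}, '{day.isoformat()}');"
        )
    return rows

for stmt in insert_statements(3):
    print(stmt)
```

Piping the output into `psql` or `sqlite3` is usually all the "framework" such a script needs.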
> Research. It has been extraordinarily helpful in giving me insight into how to design some larger systems when I have extremely specific requirements but don't necessarily have the complete experience to architect them myself - I know enough to understand if the system is going to correctly accomplish the requirements, but not to have necessarily come up with architecture as a whole.
I agree, this is where coding agents really shine for me. Even if they get the details wrong, they often pinpoint where things happen and how quite well.
They're also great for rapid debugging, or assisted bug fixing. Often, I will manually debug a problem, then tell the AI, "This exception occurs in place Y because thing X is happening, here's a stack trace, propose a fix", and then it will do the work of figuring out where to put the fix for me. I already usually know WHAT to do, it's just a matter of WHERE in context. Saves a lot of time.
Likewise, if I have something where I want thing X to do Y, and X already does Z, then I'll say, "Implement a Y that works like Z but for A B C", and it'll usually get it really close on the first try.
I've only recently begun using copilot auto-complete in Visual Studio using Claude (doing C# development/maintenance of three SaaS products). I've been a coder since 1999.
The suggestions are correct about 40% of the time, so I'm actually surprised when they're right, rather than becoming reliant on them. It saves me maybe 10 minutes a day.
The only part of AI auto-complete I found I really like is when I have a function call that takes like a dozen arguments, and the auto-complete can just shove it all together for me. Such a nice little improvement.
My least favourite part of the auto complete is how wordy the comments it wants to create are. I never use the comments it suggests.
I have been begging Claude not to write comments at all since day 1 (it's in the docs, Claude.md, i say the words every session, etc) and it just insists anyway. Then it started deleting comments i wrote!
Fucking robot lol
Do you mean suggesting arguments to provide based on name/type context?
Yeah, it usually gets the required args right based on various pieces of context. It varies a lot between extensions, though. If the extension can't pull context from the entire project (or at least parts of it), it becomes almost useless.
IntelliJ platform (JetBrains IDEs) has this functionality out of the box without "AI" using regular code intelligence. If all your parameters are strings it may not work well I guess but if you're using types it works quite well IME.
Can't use JetBrains products at work. I also unfortunately do most of my coding at work in Python, which I think can confound things since not everything is typed
... you can't use JetBrains? What logic created a scenario where you can't use arguably the best range of cross platform IDEs, but you can somehow use spicy autocomplete to imitate some of their functionality, poorly?
I work in an extremely security minded industry. There are strict guidelines about what we can and can't use. JetBrains isn't excluded for technical reasons, but geopolitical ones.
The AI models we use are all internally hosted, and any software we use has to go through an extensive security review.
Context: micro (5 person) software company with a mature SaaS product codebase.
We use a mix of agentic and conversational tools, just pick your own and go with it.
For Unity development (our main codebase and source of value) I give current-gen tools a C- for effectiveness. For solving confined, well-modularisable problems (e.g. refactor this texture loader; implement support for this material extension) it’s good. For most real day-to-day problems it’s hopelessly confused by the large codebase full of state, external dependency on chunks of Unity, implicit hardware-dependent behaviours, etc. It has no idea how to work meaningfully with Unity’s scene graph or component model. I tried using MCP to empower it here: on a trivial test project it was fine. In a real project it got completely lost and broke everything after eating 30k tokens and 40 minutes of my time, mostly because it couldn’t understand the various (documented) patterns that straddled code files and scene structure.
For web and API development I give it an A, with just a little room for improvement. In this domain it’s really effective all the way down the logical stack, from architectural and deployment decisions to implementation details and debugging, including digging really deep into package version incompatibilities and figuring out problems in seconds that would take me hours. My one criticism would be the now-familiar “junior developer” effect, where it’ll often run ahead with an over-engineered lump of machinery without spotting a simpler, more coherent pattern. As long as you keep an eye on it, it’s fine.
So in summary: if what you’re doing is all in text, nothing in binary, doesn’t involve geometric or numerical reasoning, and has billions of lines of stack overflow solutions: you’ll be golden. Otherwise it’s still very hit and miss.
I have good success using Copilot to analyze problems for me, and I have used it in some narrow professional projects to do implementation. It's still a bit scary how off track the models can go without vigilance.
I have a lot of worry that I will end up having to eventually trudge through AI generated nightmares since the major projects at work are implemented in Java and Typescript.
I have very little confidence in the models' abilities to generate good code in these or most languages without a lot of oversight, and even less confidence in many people I see who are happy to hand over all control to them.
In my personal projects, however, I have been able to get what feels like a huge amount of work done very quickly. I just treat the model as an abstracted keyboard: telling it what to write, or more importantly, what to rewrite and build out for me, while I revise the design plans or test things myself. It feels like a proper force multiplier.
The main benefit is actually parallelizing the process of creating the code, NOT coming up with any ideas about how the code should be made or really any ideas at all. I instruct them like a real micro-manager giving very specific and narrow tasks all the time.
TBH it kinda makes sense why personal projects are where productivity jumps are much larger.
Working on projects within a firm is... messy.
I’ve been an overt AI hater but have found very recently that, though I still hate a great many things about AI, it has become useful for coding.
In 10 minutes Gemini correctly diagnosed and then fixed a bug in a fairly subtle body of code that I was expecting to have to spend a couple of hours working on.
I spent much of the past week using Gemini to build a prototype of a clean new (green-field) system involving RPCs, static analysis, and sandboxing. I give it very specific instructions, usually after rounds of critical design discussions, and it generates structurally correct code that passes essentially valid tests. Error handling is a notable weakness. I review the code by hand after each step and often make changes, and I expect to go over the whole thing very carefully at the end, but it has saved me many hours this week.
Perhaps more valuable than the code has been the critical design conversation, in which it is mostly fluent at the level of an experienced engineer and has been able to explain, defend, and justify design choices quite coherently. This saved time I would otherwise have spent debating with coworkers. But it’s not always right and it is easily led astray (and will lead astray), so you need a clear idea in mind, a firm hand, and good judgment.
> This saved time I would otherwise have spent debating with coworkers. But it’s not always right and it is easily led astray (and will lead astray), so you need a clear idea in mind, a firm hand, and good judgment.
The “will lead astray” part is concerning. If you already have a clear idea in mind, you probably don’t need to have the debate with coworkers.
If you are having a debate with coworkers or AI, you would rather that they be knowledgeable enough to not lead you astray.
In cases where I don’t have a clear understanding of some area, yet I don’t have someone knowledgeable to talk to, I have found myself having to discuss the same point with multiple LLMs from multiple angles to tease out the probable right way.
In summary: obviate experts, receive correct guidance, save time. Pick any two.
Most of the things I've used LLMs for is scripting code for integrations between systems, or scripts that extract and transform data from APIs.
For this specific use case, LLMs and their integrations with tools like VSCode have been excellent. A simple instruction file dictating what libraries to use, and lines about where to look for up-to-date API docs, increases the chances of one-shots significantly.
My favorite part has been that I'm able to use libraries I wouldn't have used previously like openpyxl. A use case like "get data from an API, transform it, and output it to an excel file with these columns" is super fast, and outputs data to a stakeholder/non-techy format.
It made me chuckle when Claude etc. released Excel integrations, since working with Excel files was already in a great place for people who'd used Excel/CSV libraries.
The number 1 suggestion I'd have for people eager to work with text is to use the models to learn about old unix tools like grep/sed etc. With these powerful classic tools, plus modern ones, plus code, you can build quite complex integration code for many uses. Don't sleep on the classic unix CLI commands, and don't download stuff from GitHub to achieve things that were already solved 40 years ago :)
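As a small taste of what those classic tools can do when chained, here's a sketch; the log format is made up for illustration:

```shell
# Count error codes in a log using only classic unix tools:
# filter with grep, extract the code with sed, then tally with
# sort | uniq -c and rank by frequency with sort -rn.
printf 'ERR 404 /a\nERR 500 /b\nERR 404 /c\nOK 200 /d\n' \
  | grep '^ERR' \
  | sed 's/^ERR \([0-9]*\) .*/\1/' \
  | sort | uniq -c | sort -rn
```

Each stage is independently testable at the terminal, which is exactly why these pipelines compose so well.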
I just started a new job. A mix of JS, WASM (C++) and Python. It's been a blessing to help understand and explain an unfamiliar codebase to me. Sometimes the analysis isn't deep enough but I've been able to create enough guiding documents to get it right most of the way through, and for the rest I can continue and dive deeper on my own with further prompting.
I've started using it to write some code, which I then use further prompting to review before my own final review. I feel a lot more productive, I can focus on high level ideas and not think about tiny implementation details.
Having instrumentation code magically created in minutes and being able to validate assumptions before/after making changes by doing manual testing and feeding AI logs has been a great use for me - this kind of stuff is boring and would kill my motivation and productivity in the past. AI helps here so I can move on to the fun stuff, helps me stay engaged and interested.
It's great for writing unit tests and doing log analysis. The usual AI pitfalls apply like going into loops that lead nowhere and hallucinating things, but I've gotten better at spotting it and steering it away. I try not to take what it gives me at face value and use follow up prompts to challenge assumptions or verify things.
So overall, it's been an immense help for me. I've got some interesting projects coming up that are more greenfield work, we'll see if this holds up compared to an existing codebase.
I’m not a professional developer, but I can find my way around several languages and deployment systems. I used Claude to migrate a medium-sized Laravel 5 app to Laravel 11 in about 2-3 days. I would not have dared to touch it otherwise.
In my day job I’m currently a PM/operations director at a small company. We don’t have programmers. I have used AI to build about 12 internal tools in the past year. They’re not very big, but provide huge productivity gains. And although I do not fully understand the codebase, I know what is where. Three of these tools I’m now recreating based on our usage and learnings.
I have learned a ton about all kinds of development concepts in a ridiculously short timeframe.
Am using Claude to attempt refactoring and to find bugs. Sometimes it's fantastic, finding issues instantly that would otherwise take a lot of trawling or insider knowledge. Other times it gets obsessed with irrelevant things, or makes suggestions that don't work in practice for some obscure, non-obvious reason.

The generated code sometimes has excellent ideas I wouldn't have thought of. Other times it leaves places for bugs to lurk, e.g. "if a directory isn't there, make it". Er, no thanks, I want you to blow up if the dir isn't there, because if it isn't, something else major went wrong.

The trick is knowing when it's going to be good and when it's hopeless and will take you down a rabbit hole. Perhaps that is a meta-skill on the part of the human developer. But I'm not optimistic about things improving; it's the nature of how it is. The AI doesn't know the previous devs on the team personally, their programming tastes, the discussions they had at planning, etc. It's got no context.
I am working on a sub 100KLOC Rust application and can't productively use the agentic workflows to improve that application.
On the other hand, I have tried them a number of times in greenfield situations with Python and the web stack and experienced the simultaneous joy and existential dread of others. They can really stand new projects up quick.
As a founder, this leaves me with what I describe as the "generation ship" problem. Is it possible that the architecture we have chosen for my project is so far out of the training data that it would be faster to ditch the project and reimplement it from scratch in a Claude-yolo style? So far, I'm convinced not because the code I've seen in somewhat novel circumstances is fairly mid, but it's hard to shake the thought.
I do find chatting with the models incredibly helpful in all contexts. They are also excellent at configuring services.
If what you are doing is novel then I don't think yolo'ing it will help either. Agents don't do novel. I've even noticed this in meeting summaries produced by AI: A prioritisation meeting? AI's summary is concise, accurate, useful. A software algorithm design meeting, trying to solve a domain-specific issue? AI did not understand a word of what we discussed and the summary is completely garbled rubbish.
If all you're doing is something that already exists, but you decided to architect it in a novel way (for no tangible benefit), then I'd say starting from scratch and making it look more like existing stuff is going to help AI be more productive for you. Otherwise you're on your own, unless you can give the AI a really good description of what you are doing, how things are tied together, etc. And even then it will probably end up going down the wrong path more often than not.
I’m a UX designer not a coder, but this is so bizarre to me because shouldn’t every project be doing something novel? Otherwise why does it exist? If this industry is so full of people independently writing the same stuff that AI can replicate it…then it was a vast misallocation of resources to begin with.
Sometimes the novelty lives in a different part of the problem. (e.g. a service that does basic bog standard web forms, but for some novel purpose)
I'm surprised there aren't more attempts to stabilize around a base model, like in Stable Diffusion, then augment those models with LoRAs for various frameworks and other routine patterns. There's so much going into trying to build these omnimodels, when the technology is there to mold the models into more useful paradigms around frameworks and coding patterns.
Especially, now that we do have models that can search through code bases.
I work in an R&D team as research scientist/engineer.
Cursor and Claude Code have undoubtedly accelerated certain aspects of my technical execution. In particular, root causing difficult bugs in a complicated codebase has been accelerated through the ability to generate throwaway targeted logging code and just generally having an assistant that can help me navigate and understand complex code.
However, overall I would say that AI coding tools have made my job harder in two other ways:
1. There’s an increased volume of code that requires more thorough review and/or testing or is just generally not in keeping with the overall repo design.
2. The cost is lowered for prototyping ideas so the competitive aspect of deciding what to build or which experiment to run has ramped up. I basically need to think faster and with more clarity to perform the same as I did before because the friction of implementation time has been drastically reduced.
I got insanely more productive with Claude Code since Opus 4.5. Perhaps it helps that I work in AI research and keep all my projects in small prototype repos. I imagine all models are more polished for the AI research workflow, because that's what frontier labs do; but yeah, I don't write code anymore. I don't even read most of it. I just ask Claude questions about the implementation, and sometimes ask it to show me the important bits verbatim. Obviously it makes mistakes sometimes, but so do I and everyone I have ever worked with. What scares me is that it makes fewer mistakes overall than I do. Plan mode helps tremendously; I skip it only for small things. Insisting on a strict verification suite is also important (kind of like an autoresearch project).
I am working on getting my sailing captain's license (I started sailing when I was 8), and moving my life there. I hate how things work nowadays. I feel like I am a police officer vs my friends'/coworkers' AI code. And I don't want to do that.
I'm always skeptical of new tech. I don't like how AI companies have reserved all memory circuits for X years out; that is definitely going to cause problems in society when regular health-care-sector businesses can't scale or repair their infra. And the environmental impact is a discussion I'm not qualified to get into.
All I can say for sure is that it is absolutely useful, it has improved my quality of life without a doubt. I stick to the principle that it's here to improve my work life balance, not increase output for our owners.
And that it has done, so far. I can accomplish things that would have taken me weeks of stressful and hyperfocused work in just hours.
I use it very carefully, and sparingly, as a helpful tool in my toolbox. I do not let it run every command and look into every system, just focused efforts to generate large amounts of boilerplate code that would require me to have a lot of docs open if I were to do it myself.
I definitely don't let it read or write my e-mails, or write any text. Because I always loved writing, and will never stop loving it.
It's here to stay, because I'm not alone in feeling this way about it. So the staunch AI-deniers are just wasting their time. Just like any other tech, it's going to be used against humans, against the already oppressed.
I definitely recognize that the tech has made some people lose their minds. Managers and product owners are now vibe coding thinking they can replace all their developers. But their code base will rot faster than they think.
A guy on the team passes the issues he gets directly to Copilot, and holy crap, it shows. He hasn't admitted to doing it, but the full code rewrites whenever he's asked to change something are telling.
I'm getting tired, honestly. I'd prefer the simpler "I don't know" of old to six pages of bullshit I have to review.
I run a small game studio. I use Cursor to write features that I don’t want to hand code, but wouldn’t ask a teammate to do. Usually that is because describing the idea to a person would take about as much effort and the result would take longer.
These are usually internal tools, workflow improvements, and one off features. Anything really central to the game’s code gets human coded.
I think the further you are from the idea part, the less fun AI coding will be for you. Because now you need to not just translate some spec to code, you have to translate it to a prompt, which ups the chances of playing the telephone game. At least when you write the code yourself you are getting real with it and facing all the ambiguities as a matter of course. If you just pass it to an LLM you never personally encounter the conflicts, and it might make assumptions you would not… but you don’t even realize it because they are assumptions!
Same here. It makes indies a one-man army again, like in the good old days before the complexity explosion of the 2010s.
Using claude-code for fixing bugs in a rather huge codebase. I review the fixes, and if I think it wrote something I would make a PR of myself, I use it. Understanding is key, I think, as is giving it the right context. I have about 20 years of programming experience, and I'm letting it code in a domain and language I know very well. It saves me a lot when the bug requires finding a needle in a haystack.
This is one of its best use cases. Boilerplate and research, too. It’s also super handy for tweaking my Neovim config.
It's good. I use Codex right now. I purposefully slow down to at least read/review the code it generates, unless I'm creating something intentionally throwaway. It helps me most when dealing with languages and frameworks I'm not familiar with. I also use ChatGPT as a rubber duck, and although it's often too verbose I enjoy it. There are still many times where it will not provide the key insight to a problem, but once you supply it, it instantly agrees as if it was always obvious. On the other hand, it has helped me grok many subjects, especially academic ones.
For my specific niche (medical imaging) all current models still suck. The amount of expert knowledge required to understand the data and display it in the right way - probably never was in the training set.
We have this one performance-critical 3D reconstruction engine part that just has to go FAST through billions of voxels. From time to time we try to improve it, by just a bit. I have probably wasted at least 2 full days with various models trying out their suggestions for optimizations and benchmarking on real-world data. NONE produced an improvement. And the suggested changes look promising programming-wise, but all failed with real-world data.
These models just always want to help. Even if there is just no way to go, they will try to suggest something, just for the sake of it. I would just like the model to say "I do not know", or "This is also the best thing that I can come up with"... Niche/expert positions are still safe IMHO.
On the other hand - for writing REST with some simple business logic - it's a real time saver.
Did you feed back the results of the tests / benchmark to the model?
I’m presuming you have a very robust test framework / benchmark setup etc?
I’m presuming you fed the model the baseline results of that setup as a starting point ?
Most replies here are about writing code faster. But there's a gap nobody's talking about: AI agents are completely blind to running systems.
When you hit a runtime bug, the agent's only tool is "let me add a print statement and restart". That works for simple cases but it's the exact same log-and-restart loop we fall back to in cloud and containerized environments, just with faster typing.
Where it breaks down: timing-sensitive code, Docker services, anything where restarting changes the conditions you need to reproduce.
I've had debugging sessions where the agent burned through 10+ restart cycles on a bug that would've been obvious if it could just watch the live values.
We've given agents the ability to read and write code. We haven't given them the ability to observe running code. That's a pretty big gap.
Easy: give the logs timestamps; the LLM can sort the order.
Timestamps aren't the issue. The problem is the cycle itself: stop the process, add the log line, restart, wait for the right conditions to hit that code path again. For anything timing-sensitive or dependent on external state, each restart changes what you're trying to observe.
I've used agents to look at traces, stack dumps, and have used them to control things like debuggers. I've had them exec into running containers and poke around. I've had them examine metrics, look into existing logs, look at pcaps, and more. Any kind of command I could type into a console they can do, and they can reason about the outputs of such a command.
In fact last night I had it hacking away at a Wordpress template. It was making changes and then checking screenshots from a browser window automatically confirming it's changes worked as planned.
That's close to what I'm thinking about. Curious what debugger setup you're using with agents - are you giving them access via MCP or just having them run CLI commands?
I'm mostly really enjoying it! While it's not my main job, I've always been a tool builder for teams I work on, so if I see a place where a little UI or utility would make people's life easier, I'd usually hack something together in a few hours and evolve it over time if people find it useful. That process is easily 10x faster than before.
My main work is training Text-to-Speech models, and the friction of experimenting with model features or ideas has dropped massively. If I want to add a new CFG implementation, or conditioning vector, 90% of the time Opus can one-shot it. It generally does a good job of making the model, inference and training changes simultaneously so everything plays nicely. Haven't had any major regressions or missed bugs yet, but we'll see!
The downside is reviewing shitty PRs where it's clear the engineer doesn't fully understand what they're doing, and just a general attitude of "I dunno, Claude suggested it" that's getting pretty exhausting.
For my job, which is mostly YAML engineering with some light Go coding (platform work), I'm finding it useful. We're DRY-ing out a bunch of YAML with CUE at the moment and it's sped that work up tremendously.
When it comes to personal projects I'm feeling extremely unmotivated. Things feel more in reach and I've probably built ten times the number of throwaway projects in the past year than I have in previous years. Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them. I have a feeling of 'what's the point' publishing these projects when the same code is only a few prompts away for someone else too. And publishing them under my name only cheapens the rest of my work which I put real cognitive effort into.
I think I want to focus more on developing knowledge and skills moving forward. Whatever I can produce with an LLM in a few hours is not actually valuable unless I'm providing some special insight, and I think I'm coming to terms with that at the moment.
> Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them
For me, this is a key differentiator between “AI-assisted” and “vibe-coded”. With the former, I may use AI in many ways: some code generation, review, bouncing ideas, or whatever. But I engage in every step, review and improve the generated code, disagree with the reviews (and still contribute a good proportion of hand-written code, at least in the core business logic). In this way I retain sufficient ownership over the output to feel it is my own.
With vibe-coding, I feel exactly as you describe it.
What in the world is YAML engineering?
I use ChatGPT to give me an overview of some unfamiliar topics, suggest architecture patterns, learn the common approach to solving X, or refresh on some syntax. Sometimes there's a repetitive task like applying the same edit to a list of ~40 strings (e.g. surrounding them in a struct init), and I found it useful to make ChatGPT do this. Summarising diffs in OpenAPI, and highlighting bugs and patterns in logs and documents, also works pretty okay.
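For what it's worth, that kind of repetitive edit can also be scripted in a couple of lines; a hypothetical Python sketch (the struct and field names here are made up for illustration):

```python
# Wrap each bare string in a (hypothetical) struct initializer.
names = ["us-east-1", "eu-west-1", "ap-south-1"]
wrapped = [f'Region{{name: "{n}"}}' for n in names]
print("\n".join(wrapped))
```

For a one-off list of 40 strings, either approach works; the script has the advantage of being trivially re-runnable when the list changes.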
In my domain (signal processing, high-load systems, embedded development, backend in Go) it doesn't do great on coding tasks, and I'm very opposed to giving it the lead to create files, do mass edits, et cetera. I found it failing even on recent versions of Go, imagining interfaces and not knowing about changes in some library interfaces (pre-2024 changes, at least). Both ChatGPT and Claude failed to create a proper application for me (parsing incoming messages and drawing real-time graphics), both getting stuck at some point. The application worked more or less, but with bugs and huge performance issues.
I found it useful for quickly creating skeletons for scripts/tools that I can then fill in with actual logic, or for making an example of how a library is used.
So there is usability for me, it replaced stackoverflow and sometimes reading actual documentation.
I own a few repositories of our system, and contribution guides I create explicitly forbid use of LLMs and agents to create PRs. I had some experience with developers submitting vibe coded PRs and I do not want to waste my time on them anymore.
I had a couple of nice moments, like claude helping me with rust (which I don't understand) and claude finding a bug in a python library I was using
Also some not-so-nice moments (small Rust changes were OK, but Claude fumbled a big one, and I couldn't really verify that it worked, so I didn't merge the code to master even when it seemingly worked)
I think it really helps to break the ice, so to speak. You no longer feel the tension, the pain of an empty page. You ask Claude to write something, and improving something is so much easier mentally
Also I mostly use claude as a spell checker / linter for the projects I'm too lazy to install proper tools for that. vim + claude, what else would you need
Luckily my company pays for the subscription; spending personal money on LLMs (especially on US LLMs) would feel strange for some reason. Ideally I want to own an LLM and have it at home, but I am too lazy
For asking quick questions that would normally send me to a search engine, it's pretty helpful. It's also decent (most of the time) at throwing together some regex.
For throw away code, I might let the agent do some stuff. For example, we needed to test timing on DNS name resolution on a large number of systems to try and track down if that was causing our intermittent failures. I let an agent write that and was able to get results faster than if I did it myself, and I ultimately didn’t have to care about the how… I just needed something to show to the network team to prove it was their problem.
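The DNS-timing script described above is the kind of thing an agent can knock out quickly because it's easy to verify by eye. A minimal sketch of the idea (hostname and attempt count are placeholders, not the commenter's actual script):

```python
import socket
import time

def time_dns_lookups(hostname, attempts=3):
    """Time socket.getaddrinfo calls; return (seconds, succeeded) per attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
            ok = True
        except socket.gaierror:
            ok = False
        results.append((time.perf_counter() - start, ok))
    return results

if __name__ == "__main__":
    for elapsed, ok in time_dns_lookups("example.com"):
        print(f"{elapsed * 1000:.1f} ms  {'ok' if ok else 'FAILED'}")
```

Run across a fleet of systems, outliers in the per-attempt timings are exactly the evidence you'd hand to a network team.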
For larger projects that need to plugin to the legacy code base, which I’ll need to maintain for years, I still prefer to do things myself, using AI here and there as previously mentioned to help with little things. It can also help finding bugs more quickly (no more spending hours looking for a comma).
I had an agent refactor something I was making for a larger project. It did it, and it worked, but it didn’t write it in a way that made sense to my brain. I think others on my team would have also had trouble supporting it too. It took something relatively simple and added so many layers to it that it was hard to keep all the context in my head to make simple edits or explain to someone else how it worked. I might borrow some of the ideas it had, but will ultimately write my own solution that I think will be easier for other people to read and maintain.
Borrowing some of these ideas and doing it myself also allows me to continue to learn and grow, so I have more tools in my tool belt. With the DNS thing that was totally vibe coded, there were some new things in there I hadn’t done before. While the code made sense when I skimmed through it, I didn’t learn anything from that effort. I couldn’t do anything it did again without asking AI to do it again. Long-term, I think this would be a problem.
Other people on my team have been using AI to write their docs. This has been awful. Usually they don't write anything at all, but at least then I know they didn't write anything. The AI docs are simply wrong, 100% hallucinations. I have to waste time checking the doc against the code to figure that out, and then go to the person that did it to make them fix it. Sometimes no doc is better than a bad doc.
I’m a newer full stack engineer, previously did mostly web dev. It’s been useful in the areas that I’m not super interested in. We’re working on a 700KLOC legacy monolithic CRUD app with 0 documentation, it’s essentially the Wild West. We’ve found it very difficult to apply AI in a meaningful way (not just code output, reviews, documentation writing, automation). For a small team with lots to do on what is essentially a “keep the lights on” we’re in an interesting place, as it feels the infrastructure / codebase isn’t set up to handle newer tools.
I use the code generation heavily in my day to day, though verification is a priority for me, as is gaining an understanding of the business logic + improving my skills as a developer. There’s a healthy balance between deploying 100% generated code and not using the tools at all.
It’s useful for research tasks, identifying areas I’ll be working in when developing a feature. However, this team has a gigantic backlog and there are TONS of things we are behind on, so it does feel like AI isn’t moving the needle for us, though it is helpful. I’d like to apply it in different areas, but my senior engineer is very anti-AI, so he doesn’t find the tools useful and is actively against using them. Like I said, there’s surely a balance…
I see us using / relying on them more in the future, due to pressure from above, along with the general usefulness of them.
Like a lot of things, it’s neither and somewhere in the middle. It’s net useful even if just for code reviews that make you think about something you may have missed. I personally also use it to assist in feature development, but it’s not allowed to write or change anything unless I approve it (and I like to look at the diff for everything)
had a new one. one of my nontechnical managers generated some ESRI Arcade (toy language) code in MS Copilot, then called me for 3 hours so i could help them debug it in a paired programming session. we're consultants (im aware. lets not unpack that now) so it was a nice way to score 3 hours of billable work i might otherwise have had to generate from a client. honestly wouldn't mind being the "ai debugging guy" at my office. easy and mildly entertaining billable work
funny the manager thought they could shoot from the hip like that. wonder if they think its an effective pattern
(edit: grammer)
I develop prototypes using Claude Code. The dead boring stuff.
"Implement JWT token verification and role checking in Spring Boot. Secure some endpoints with Oauth2, some with API key, some public."
C# and Java are so old, whatever solutions you find are 5 years out of date. Having an agent implement and verify the foundation is the perfect fit. There's no design, just ever-changing framework magic. I'd do the same "Google and debug" cycle, but 10 times slower.
It's kind of funny to see you saying "whatever solutions you find are 5 years out of date", while at the same time saying that the tool that was taught using those same 5-years-out-of-date solutions as part of its training data is actually good.
Terrible idea if you ask me. I'd suggest checking the official docs next time around, or at the very least copying them into the context window.
First, good agents do that themselves. Second, specifying an exact and current version also works. Third, I'm mostly concerned about having a working example. I'm talking about breaking changes and APIs not existing in newer framework version. As long as it compiles, it's clear the approach still works.
Well then your experience is not really relevant in this thread when the prompt is specifically asking for professional coding work now, is it?
You're not an LLM (at least I don't think you are), you're not obliged to respond with an answer even when that answer is only tangentially related to the prompt.
I am required to maximise my use of AI at work, and so I do. It's good enough at simple, common stuff. Throw up a web page, write some Python, munge some data in C++: all great as long as the scale is small. If I'm working on anything cutting-edge or niche (which I usually am), then it makes a huge mess and wastes my time. If you have a really big codebase, in the ~50 million LOC range, it makes a huge mess.
I really liked writing code, so this is all a big negative for me. I genuinely think we have built a really bad thing, that will take away jobs that people love and leave nothing but mediocrity. This thing is going to make the human race dumber and it's going to hold us back.
I work at a company that maintains one of the largest Rails codebases in the world (their claim, but believable). My experience has been the opposite: Claude and Cursor have done a wonderful job of helping me understand and implement new features in this gigantic codebase. I actually found out through AI that while I enjoy writing code, I enjoy building great software more; the coding was just a means to an end.
At my FAANG, there was a team of experienced engineers that proved they could deliver faster and more performant code than a complete org that was responsible for it earlier.
So now a lot of different parts of the company are trying to replicate their workflow. The process is showing what works: you need AI-first documentation (a readme with one line for each file to help manage context), and to develop skills and steering docs for your codebase, code style, etc. And it mostly works!
For me personally, it has drastically increased productivity. I can pick up something from our infinitely huge backlog, provide some context and let the agent go ham on fixing it while i do whatever other stuff is assigned to me.
1. Generate unit tests beyond the best-case scenario. Analogy: Netflix's Chaos Monkey
2. Incremental cleanup: I also use it as a fancier upgrade of Visual Studio's Code Analysis feature and aid me in finding areas to refactor.
3. Treating the model as a corpus of prior knowledge and discussions, I can form a 'committee of agents' (Security, Reliability, UX engineer POVs) to help me view my work at a more strategic level.
My additional twist to this is to check against my organization's mission statement. That way, I hope I can help reduce mission drift that I observe was a big issue behind dysfunctional companies.
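On point 1, "beyond the best-case scenario" in practice usually means boundary values, junk input, and surprising-but-legal forms, not just the happy path. A toy sketch of what that looks like (the function under test, `parse_port`, is hypothetical):

```python
import unittest

def parse_port(s):
    """Hypothetical function under test: parse a TCP port number from a string."""
    n = int(s)
    if not 0 < n < 65536:
        raise ValueError(f"port out of range: {n}")
    return n

class PortEdgeCases(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_boundaries(self):
        # Valid extremes and the first invalid values on either side.
        self.assertEqual(parse_port("1"), 1)
        self.assertEqual(parse_port("65535"), 65535)
        for bad in ("0", "65536", "-1"):
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_junk_input(self):
        for bad in ("", "http", "80.0"):
            with self.assertRaises(ValueError):
                parse_port(bad)

    def test_surprising_but_legal(self):
        self.assertEqual(parse_port(" 443 "), 443)  # int() strips whitespace
        self.assertEqual(parse_port("+22"), 22)
```

An LLM is decent at enumerating the first three categories; the last one (inputs that look wrong but are accepted) is where a generated suite tends to surface behavior you didn't know your code had, which is the Chaos Monkey-ish payoff.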
I use it mostly to explore the information space of architectural problems, but the constant "positive engagement feedback" (the first line of each generation: "brilliant insight") starts to feel deeply insincere, and is also often false, with regular claims like "this is the mathematically best solution - ready to implement?" when, considered properly, it isn't.
I have moved away from using an LLM before I've figured out the specifications myself; otherwise it's very, very risky to go down a wrong rabbit hole the LLM seduced you into via its "user engagement" training.
It’s a fantastic performance booster for a lot of mundane tasks like writing and revising design docs, tests, debugging (using it like a super smart and active rubber duck), and system design discussions.
I also use it as a final check on all my manually written code before sending it for code review.
With all that said, I have this weird feeling that my ability to quickly understand and write code is no longer noticeable, nor necessary.
Everyone now ships tons of code and even if I do the same without any LLM, the default perception will be that it has been generated.
I am not depressed about it yet, but it will surely take a while to embrace the new reality in its entirety
For debugging, it’s also great trawling through logs and stack traces.
Makes a late night oncall page way easier when the bot will tell you exactly what broke
Mixed feelings.
I use Opus 4.6 with pi.dev (one agent). I give detailed instructions what to do. I essentially use it to implement as I do it manually. Small commits, every commit tested and verified. I don’t use plan mode, just let the agent code - code review is faster than reading plan. This approach works only if you make small changes. I create mental model of the code same way, as when writing it manually.
Some people on my team code with AI without reading the code. That's mostly a mess. No mental model, lower quality. They are really proud of it though and think they are really smart. Not sure how this will turn out.
Define "professional."
I write stuff for free. It's definitely "professional grade," and lots of people use the stuff I ship, but I don't earn anything for it.
I use AI every day, but I don't think that it is in the way that people here use it.
I use it as a "coding partner" (chat interface).
It has accelerated my work 100X. The quality seems OK. I have had to learn to step back, and let the LLM write stuff the way that it wants, but if I do that, and perform minimal changes to what it gives me, the results are great.
It's great. I'd guess 80-90% of my code is produced in Copilot CLI sessions since the beginning of the year. Copilot CLI is worse than Claude Code, but not by a huge amount. This is mostly working in established 100k+ LOC codebases in C# and TypeScript, with a couple greenfield new projects. I have to write more code by hand in the greenfield projects at their formative stage; LLMs do better following conventions in an existing codebase than being consistent in a new one.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query.
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an oauth token to call some api with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from the session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from the implementation.
It churns through boring stuff, but at times it's like the intellectual equivalent of breaking in a wild horse, I imagine: so capable, so fast, but easy to find yourself in a pile on the floor.
I'm learning all the time and it's fun, exasperating, tremendously empowering and very definitely a new world.
Daily Claude user via Cursor.
What works:
-Just pasting the error and asking what's going on here.
-"How do I X in Y considering Z?"
-Single-use scripts.
-Tab (most of the time), although that doesn't seem to be Claude.
What doesn't:
-Asking it to actually code. It's not going to do the whole thing and even if, it will take shortcuts, occasionally removing legitimate parts of the application.
-Tests. Obvious cases it can handle, but once you reach a certain threshold of coverage, it starts producing nonsense.
Overall, it's amazing at pattern matching, but doesn't actually understand what it's doing. I had a coworker like this - same vibe.
Opus 4.5 max (1m tokens) and above were the tipping point for me, before that, I agree with 100% of what you said.
But even with Opus 4.6 max / GPT 5.4 high it takes time, you need to provide the right context, add skills / subagents, include tribal knowledge, have a clear workflow, just like you onboard a new developer. But once you get there, you can definitely get it to do larger and larger tasks, and you definitely get (at least the illusion) that it "understands" that it's doing.
It's not perfect, but definitely can code entire features, that pass rigorous code review (by more than one human + security scanners + several AI code reviewers that review every single line and ensure the author also understands what they wrote)
I don’t use AI to generate any code, but I have used a few tools sparingly as such:
1. Gemini as a replacement for Stack Overflow, but I always have to check the source because it sometimes gives examples that are 10 or even 15+ years old as if they were a definitive answer. We cannot and should not trust that anything AI produces is correct.
2. Co-Pilot to assist in code snippets and suggestions, like a better Intellisense. Comes in handy for CLI tools such as docker compose, etc.
3. Co-Pilot to help comprehension of a code base. For example, to ask how a particular component works or to search for the meaning of a term of reference to it, especially if the term is vague or known by another name.
Believe it or not, we have just recently received guidance on AI-assisted work in general, and it’s mostly “it’s ok to use AI, but always verify it”, which of course seems completely reasonable, as you should do this with any work that you wouldn’t have done yourself.
On 1: Gemini (et al.) is not replacing Stack Overflow, it's just regurgitating content it ingested from Stack Overflow.
While SO allowed new answers to show up, for any new Next.js bug I ask about that isn't yet commonplace on SO, I get some hallucination telling me to use a made-up code API based on GitHub issue discussions.
Like many others I started feeling it had legs during the past few months. Tools and models reached some level where it suddenly started living up to some of the hype.
I'm still learning how to make the most of it but my current state is one of total amazement. I can't believe how well this works now.
One game-changer has been custom agents and agent orchestration, where you let agents kick off other agents and each one is customized and keeps a memory log. This lets me make several 1000 loc features in large existing codebases without reaching context limits, and with documentation that lets me review the work with some confidence.
I have delivered several features in large legacy codebases that were implemented while I attended meetings. Agents have created greenfield dashboards, admin consoles and such from scratch that would have taken me days to do myself, during daily standups. If it turned out bad, I tweaked the request and made another attempt over lunch. Several useful tools have been made that save me hours per week but I never took the time to make myself.
For now, I love it. I do feel a bit of "mourning the craft" but love seeing things be realized in hours instead of days or weeks.
On two greenfield web apps using straightforward stuff (Preact, Go, PostgreSQL) Claude Code has been very helpful. Especially with Claude Code and Opus >= 4.5, adding an incremental feature mostly just works. One of these is sort of a weird IDE, and Opus even does OK with obscureish things like CodeMirror grammars. I literally just write a little paragraph describing what I want, have it write the code and tests, give it a quick review, and 80% of the time it’s like, great, no notes.
To be clear, this is not vibecoding. I have a strong sense of the architecture I want, and explicitly keep Claude on the desired path much like I would a junior programmer. I also insist on sensible unit and E2E test coverage with every incremental commit.
I will say that after several months of this, the signalling between UI components is getting a bit spaghetti-like, but that would've happened anyway, and I bet Claude will be good at restructuring it when I get around to that.
I also work in a giant Rails monolith with 15 years of accumulated cruft. In that area, I don’t write a whole lot, but CC Opus 4.6 is fantastic for reading the code. Like, ask “what are all the ways you can authenticate an API endpoint?” and it churns away for 5 minutes and writes a nice summary of all four that it found, what uses them, where they’re implemented, etc.
Same attitudes as others in this thread.
For personal projects and side company, I get to join in on some of the fun and really multiply the amount of work I can get through. I tend to like to iterate on a project or code base for awhile, thinking about it and then tearing things down and rebuilding it until I arrive at what I think is a good implementation. Claude Code has been a really great companion for this. I'd wager that we're going to see a new cohort of successful small or solo-founder companies that come around because of tools like this.
For work, I would say 60% of my company's AI usage is probably useless. Lots of churning out code and documents that generate no real value or are never used a second time. I get the sense that the often claimed "10x more productive" is not actually that, and we are creating a whole flood of problems and technical debt that we won't be able to prompt ourselves out of. The benefit I have mostly seen myself so far is freeing up time and automating tedious tasks and grunt work.
At work I mostly use claude code and chatgpt web for general queries, but cursor is probably the most popular in our company. I don’t think we are "cooked" but it definitely changes how development will be done. I think the process of coming up with solutions will still be there but implementation is much faster now.
My observations:
1. What works for me is the usual, work iteratively on a plan then implement and review. The more constraints I put into the plan the better.
2. The biggest problem for me is LLM assuming something wrong and then having to steer it back or redoing the plan.
3. Exploring and onboarding to new codebases is much faster.
4. I don’t see the 10x speedup but I do see that now I can discard and prototype ideas quickly. For example I don’t spend 20-30 minutes writing something just to revert it if I don’t like how it looks or works.
5. Mental exhaustion when working on multiple different projects/agent sessions is real, so I tend to only have one. Having to constantly switch the mental model of a problem is much more draining than the "old" way of working on a single problem. Basically, the more I give in to vibing, the harder it is to review and understand.
It's working fine for me.
I'm lucky enough to have upper management not pressuring me to use it this way or that, and I'm using it mostly to assist with programming languages/frameworks I'm not familiar with. Also: test cases (these sometimes come out wrong and I need to review them thoroughly), updating documentation, being my rubber duck, and some other repetitive/time-consuming tasks.
Sometimes, if I have a simple, self-contained bug scenario where extensive debug won't be required, I ask it to find the reason. I have a high rate of success here.
However, it will not help you with avoiding anti-patterns. If you introduce one, it will indulge instead of pointing the problem out.
I did give it a shot on full vibe-coding a library into production code, and the experience was successful; I'm using the library - https://youtu.be/wRpRFM6dpuc
I had automation set up for anything I needed for work; gen AI made me feel like I had to babysit a dumb junior developer, so I lost interest
Management uses it to make mock websites, then doesn't listen when we point out flaws, so nothing new there
Some in digital marketing are using it for data collection/analysis, but it reaches wrong conclusions 50% of the time (their words), so they are slowly dropping it and using it for menial tasks and simple automations
In design we had a trial period, but it has the same issue as coding: either it makes something a senior designer could have made in 2 minutes, or it introduces errors that take a long time to fix, only to do it again on the next prompt
we are a senior dev team, although relatively small, and to me it seems like it only really works as a substitute for junior devs... but the point of junior devs is to grow someone into a senior with the knowledge you need in the company, so i don't really get the use case overall
I just moved to a new team in my company that prides itself on being "AI-First". The work is a relatively new project that was stood up by a small team of two developers (both of whom seem pretty smart) in the last 4 months. Both acknowledged that some parts of their tech stack, they just don't at all understand (next.js frontend). The backend is a gigantic monorepo of services glued together.
The manager & a senior dev told me on my first day, "Don't try to write code yourself, you should be using AI". I got encouraged to use spec-driven development and frameworks like superpowers, gsd, etc.
I'm definitely moving faster using AI in this way, but I legitimately have no idea what the fuck I am doing. I'm making PRs I don't know shit about, I don't understand how it works because there is an emphasis on speed, so instead of ramping up in a languages / technologies I've never used, I'm just shipping a ton of code I didn't write and have no real way to vet like someone who has been working with it regularly and actually has mastered it.
This time last year, I was still using AI, but as a pair-programming utility, where I got help learning things I don't know, probing topics/concepts I need exposure to, and reasoning through problems that arose.
I can't control the direction of how these tools are going to evolve & be used, but I would love it if someone could explain to me how I can continue to grow if this actually is the future of development. Because while I am faster, the hope seems to be that AI/agents/LLMs will only ever get better and I will never need to have an original thought or use critical thinking.
I have just about 4 years of professional experience. I had about 10-12 months at the start of my career where I used Google to learn things, before LLMs became the sole, singular focus.
I wake up every day with existential dread of what the future looks like.
A new way of operating is forced down your throat due to expectations of how the technology will evolve. What actually happens is highly variable - on the spectrum between a huge positive and negative surprise.
The people forcing it down you do not care about the long-term ramifications.
This app sounds destined for total disaster.
Tools: Claude Code and various VS Code derivatives, and Cursor at work. Generally Opus 4.6 now.
I feel it made me better and other people worse.
GOOD:
I feel that I’m producing more and better code even with unfamiliar and tangled codebases. For my own side projects, it’s brought them from vague ideas to shipped.
I can even do analyses I never could otherwise. On Friday I converted my extensive unit test suite into a textual simulation of what messages it would show in many situations and caught some UX bugs that way.
Cursor’s Bugbot is genuinely helpful, though it can be irritatingly inconsistent. Sometimes on round 3 with Bugbot it suddenly notices something that was there all along. Or because I touch a few lines of a library suddenly all edge cases in that library are my fault.
NOT GOOD:
The effect on my colleagues is not good. They are not reading what they are creating. I get PRs that include custom circular dependency breakers because the LLM introduced a circular dependency, and decided that was the best solution. The ostensible developer has no idea this happened and doesn’t even know what a circular dependency breaker is.
Another colleague does an experiment to prove that something is possible and I am tasked to implement it. The experiment consists of thousands of lines of code. After I dig into it I realize the code is assuming that something magically happened and reports it’s possible.
I was reflecting on this and realized the main difference between me and my current team is that I won't commit code I don't understand. So I even use the LLMs to do refactors just for clarity, while my colleagues are sometimes creating 500-line methods.
Meanwhile our leaders are working on the problem of code review because they feel it’s the bottleneck. They want to make some custom tools but I suspect they are going to be vastly inferior to the tools coming from the major LLM providers. Or maybe we’ll close the loop and we won’t even be reviewing code any more.
For professional work, I like to offload some annoying bug fixes to Claude and let it figure it out. Then, perusing the changes to make sure nothing silly is being added to the codebase. Sometimes it works pretty well. Other times, for complicated things I need to step in and manually patch. Overall, I'm a lot less stressed about meeting deadlines and being productive at work. On the other hand, I'm more stressed about losing my employment due to AI hype and its effectiveness.
For my side projects, I do like to offload the tedious steps like setup, scaffolding or updating tasks to Claude. Things like weird build or compile errors that I usually would have to spend hours Googling to figure out I can get sorted in a matter of minutes. Other than that, I still like to write my own code as I enjoy doing it.
Overall, I like it as a tool to assist in my work. What I dislike is how much peddling is being done to shove AI into everything.
As crazy as this seems, it's unlocking another variation of software engineering I didn't think was accessible. Previously, super entrenched and wicked expensive systems that might have taken years of engineering effort, appear to be ripe for disruption suddenly. The era of software systems with deeply engineered connectivity seem to be on the outs...
Two contexts:
1. Workplace, where I work on a lot of legacy code for a crusty old CRM package (Saleslogix/Infor), and a lot of SQL integration code between legacy systems (System21).
So far I've avoided using AI generated code here simply because the AI tools won't know the rules and internal functions of these sets of software, so the time wrangling them into an understanding would mitigate any benefits.
In theory where available I could probably feed a chunk of the documentation into an agent and get some kind of sensible output, but that's a lot of context to have to provide, and in some cases such documentation doesn't exist at all, so I'd have to write it all up myself - and would probably get quasi hallucinatory output as a reward for my efforts.
2. Personally where I've been working on an indie game in Unity for four years. Fairly heavy code base - uses ECS, burst, job system, etc. From what I've seen AI agents will hallucinate too much with those newer packages - they get confused about how to apply them correctly.
A lot of the code's pretty carefully tuned for performance (thousands of active NPCs in game), which is also an area where I don't trust AI coding at all, given its output is a conglomeration of 'average code in the wild that ended up in the training set'.
At most I sometimes use it for rubber ducking or performance. For example at one point I needed a function to calculate the point in time at which two circles would collide (for npc steering and avoidance), and it can be helpful to give you some grasp of the necessary math. But I'll generally still re-write the output by hand to tune it and make sure I fully grok it.
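For what it's worth, that circle-collision calculation reduces to solving a quadratic in time. A minimal sketch of the math (my own reconstruction, not the commenter's actual code):

```python
import math

def time_to_collision(p1, v1, r1, p2, v2, r2):
    """Earliest t >= 0 at which two constant-velocity circles touch, or None.

    Works in the relative frame: solve |d + v*t| = r1 + r2, where d is the
    relative position and v the relative velocity, i.e. the quadratic
    (v.v) t^2 + 2 (d.v) t + (d.d - R^2) = 0.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    R = r1 + r2
    a = vx * vx + vy * vy
    b = 2 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy - R * R
    if c <= 0:
        return 0.0              # already overlapping
    if a == 0:
        return None             # no relative motion, never collide
    disc = b * b - 4 * a * c
    if disc < 0:
        return None             # closest approach still farther than R
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t >= 0 else None
```

In a steering context you'd run this per candidate NPC pair and avoid the soonest predicted collision.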
Also tried to use it recently to generate additional pixel art in a consistent style with the large amount of art I already have. Results fell pretty far short unfortunately - there's only a couple of pixel art based models/services out there and they're not up to snuff.
I am having the greatest time professionally with AI coding. I now have the engineering team I’ve always dreamed of. In the last 2 months I have created:
- a web-based app for a F500 client for a workflow they’ve been trying to build for 2 years; won the contract
- built an iPad app for same client for their sales teams to use
- built the engineering agent platform that I’m going to raise funding
- a side project to do rough cuts of family travel videos (https://usefirstcut.com, soft launch video: https://x.com/xitijpatel/status/2026025051573686429)
I see a lot of people in this thread struggling with AI coding at work. I think my platform is going to save you. The existing tools don’t work anymore, we need to think differently. That said, the old engineering principles still work; heck, they work even better now.
> - a side project to do rough cuts of family travel videos (https://usefirstcut.com, soft launch video: https://x.com/xitijpatel/status/2026025051573686429)
I can't comment about the quality of the code you delivered for your client so I checked your side project. Unfortunately it looks like there is only a landing page (very nice!) but the way from a vibe-coded project to production is usually quite long.
Not wrong at all, that’s why I’m building my own platform for this. That’s also why I haven’t publicly done much on First Cut yet. I’m using my platform to actually build the product, so the intent is that I use my expertise and oversight to ensure it’s not just slop code. So most of the effort has gone into building that platform, which has made building First Cut itself slower. But I’ve actually got my platform running well-enough that now my team is able to get involved, and I can start to work on First Cut again, which means that I should be able to answer your “concern” definitively. I share it.
I’ve been a web dev for 10+ years, and my professional pivot in 2026 has been moving away from "content-first" sites to "tool-led" content products. My current stack is Astro/Next.js + Tailwind + TypeScript, with heavy Python usage for data enrichment.
What’s working:
Boilerplate & Layout Shifting: AI (specifically Claude 4.x/5) is excellent for generating Astro components and complex Tailwind layouts. What used to take 2 hours of tweaking CSS now takes 15 minutes of prompt-driven iteration.
Programmatic SEO (pSEO) Analysis: I use Python scripts to feed raw data into LLMs to generate high-volume, structured analysis (300+ words per page). For zero-weight niche sites, this has been a massive leverage point for driving organic traffic.
Logic "Vibe Checks": When building strategy engines (like simulators for complex games), I use AI to stress-test my decision-making logic. It’s not about writing the core engine—which it still struggles with for deep strategy—but about finding edge cases in my "Win Condition" algorithms.
The Challenges:
The "Fragment" Syntax Trap: In Astro specifically, I’ve hit issues where AI misidentifies <> shorthand or hallucinates attribute assignments on fragments. You still need to know the spec inside out to catch these.
Context Rot: As a project grows, the "context window" isn't the problem; it's the "logic drift." If you let the AI handle too many small refactors without manual oversight, the codebase becomes a graveyard of "almost-working" abstractions.
The Solution: I treat AI as a junior dev who is incredibly fast but lacks a "mental model" of the project's soul. I handle the architecture and the "strategy logic," while the AI handles the implementation of UI components and repetitive data transformations.
Stack: Astro, TypeScript, Python scripts for data. Experience: 10 years, independent/solo.
I only just started using it at work in the last month.
I am a data engineer maintaining a big data Spark cluster as well as a dozen Postgres instances - all self hosted.
I must confess it has made me extremely productive if we measure in terms of writing code. I don't even do a lot of special AGENTS.md/CLAUDE.md shenanigans, I just prompt CC, work on a plan, and then manually review the changes as it implements it.
Needless to say this process only works well because: A) I understand my code base. B) I have a mental structure of how I want to implement it.
Hence it is easy to keep the model and me in sync about what's happening.
For other aspects of my job I occasionally run questions by GPT/Gemini as a brainstorming partner, but it seems a lot less reliable. I only use it as a sounding board. It does not seem to make me any more effective at my job than simply reading documents or browsing GitHub issues/Stack Overflow myself.
I use Gemini, and rarely ChatGPT (usually once or twice a day). I ask very narrow, pointed questions about something specific I would like an answer to. I typically will verify that the solution is good/accurate because I've been burned in the past by receiving what I'd characterize as a bad solution or "wrong" answer.
I think it's a useful tool, but whenever I have an LLM attempt to develop an entire feature for me, the solution becomes a pain to maintain (because I don't have the mental model around it, or the solution has subtle issues).
Maybe people who are really deep into using AI are using Claude? Perhaps it's way better, I don't know.
I use it as a research tool.
What it has done is replace my Googling, asking people, and looking up stuff on Stack Overflow.
It's also good for generating small boilerplate code.
I don't use the whole agents thing; there are so many edge cases that I always need to understand and be aware of, and I honestly think the AI cannot capture them.
Solo dev, working on a native macOS app in SwiftUI. AI has been most useful for the boilerplate - repetitive view layouts, FileManager calls, figuring out AppKit bridging weirdness. It basically replaced Stack Overflow for me.
Where it breaks down is state management. The suggestions look right but introduce subtle bugs in how data flows between views. I've learned to only use it for isolated, well-scoped tasks. Anything that touches multiple components, I write myself.
I have mostly been using the Claude Sonnet models as they release each new one.
It is great for getting an overview on a pile of code that I'm not familiar with.
It has debugged some simple little problems I've had, e.g., a complex regex isn't behaving, so I'll give it the regex and a sample string and ask, "why isn't this matching?", and it will figure it out.
I've used it only a little for writing new code. In those cases I will write the shell of a subroutine and a comment saying what the subroutine takes in and what it returns, then ask the LLM to fill in the body. Then I review it.
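That stub-and-fill pattern looks something like this (a hypothetical example, not the commenter's code: the name, signature, and contract comment are hand-written, and the body is the part the LLM fills in, which then gets reviewed):

```python
import re

# Hand-written shell: name, signature, and contract comment come first;
# the LLM is asked to fill in only the body.
def parse_log_line(line):
    """Take one Apache common-log line; return (ip, status, size) or None."""
    m = re.match(r'(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)', line)
    if not m:
        return None
    ip, status, size = m.groups()
    return (ip, int(status), 0 if size == "-" else int(size))
```

Keeping the contract in the docstring means the review step has something concrete to check the generated body against.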
It has been useful for translating ancient perl scripts into something more modern, like python.
It’s going very well.
Experience level: very senior, programming for 25 years, have managed platform teams at Heroku and Segment.
Project type: new startup started Jan ‘26 at https://housecat.com. Pitch is “dev tools for non developers”
Team size: currently 2.
Stack: Go, vanilla HTML/CSS/JS, Postgres, SQLite, GCP and exe.dev.
Claude code and other coding harnesses fully replaced typing code in an IDE over the past year for me.
I’ve tried so many tools. Cursor, Claude and Codex, open source coding agents, Conductor, building my own CLIs and online dev environments. Tool churn is a challenge but it pays dividends to keep trying things as there have been major step functions in productivity and multi tasking. I value the HN community for helping me discover and cut through the space.
Multiple VMs available over SSH, each with an LLM pre-configured, has been the latest level-up.
Coding is still hard work designing tests, steering agents, reviewing code, and splitting up PRs. I still use every bit of my experience every day and feel tired at end of day.
My non-programmer co-founder, more of a product manager and biz ops person, has challenges all the time. He generally can only write functional prototypes. We solve this by embracing the functional prototype and doing a lot of pair programming. It is much more productive than design docs or Figma wireframes.
In general the game changer is how much a couple of people can get done. We’re able to prototype ideas, build the real app, manage SOC2 infra, marketing and go to market better than ever thanks to the “willing interns” we have. I’ve done all this before and the AI helps with so much of the boilerplate and busywork.
I’m looking for beta testers and security researchers for the product, as well as a full time engineer if anyone is interested in seeing what a “greenfield” product, engineering culture and business looks like in 2026. Contact info in my profile.
Interesting premise for your product. Hope you find success! From a dev perspective, I feel your website gives off more of an "OpenClaw you can trust" vibe than "dev tools for non developers". Is that right? Or am I misreading the idea?
Thanks. Yes that’s a proper take.
The OpenClaw stuff is awesome but it’s too raw for a lot of professionals and small teams. We’re trying to bring more guardrails to the concept and more of a Ruby on Rails philosophy to how it works.
I'm enjoying it. At this stage though, I just don't see much value if you don't have any prior knowledge of what you're doing. Of course you can use LLMs to get better at it but we're not yet at the point where I'd trust them to build something complex without supervision... nor is anyone suggesting that, except AI CEOs :)
I do wonder what will happen when real costs are billed. It might end up being a net positive since that will make you think more about what you prompt, and perhaps the results will be much better than lazily prompting and seeing what comes out (which seems to be a very typical case).
I just wonder if there are comments in this thread from anthropic bots, marketing itself
At $WORK, my team is relatively small (< 10 people) and a few people really invested in getting the codebase (a large Elixir application with > 3000 modules) in shape for AI-assisted development with a very comprehensive set of skills, and some additional tooling.
It works really well (using Claude Code and Opus 4.6 primarily). Incremental changes tend to be well done and mostly one-shotted provided I use plan mode first, and larger changes are achievable by careful planning with split phases.
We have skills that map to different team roles, and 5 different skills used for code review. This usually gets you 90% there before opening a PR.
Adopting the tool made me more ambitious, in the sense that it lets me try approaches I would normally discard because of gaps in my knowledge and expertise. This doesn't mean blindly offloading work, but rather isolating parts where I can confidently assess risk, and then proceeding with radically different implementations guided by metrics. For example, we needed a way to extract redlines from PDF documents, and in a couple of days went from a prototype with embedded Python to an embedded Rust version with a robust test oracle against hundreds of documents.
I don't have multiple agents running at the same time working on different worktrees, as I find that distracting. When the agent is implementing I usually still think about the problem at hand and consider other angles that end up in subsequent revisions.
Other things I've tried which work well: share an Obsidian note with the agent, and collaboratively iterate on it while working on a bug investigation.
I still write a percentage of code by hand when I need to clearly visualise the implementation in my head (e.g. if I'm working on some algo improvement), or if the agent loses its way halfway through because they're just spitballing ideas without much grounding (rare occurrence).
I find Elixir very well suited for AI-assisted development because it's a relatively small language with strong idioms.
This exactly matches our findings: if we start molding the repo to be "AI native" whatever that means, add the right tooling and still demand all engineers take full responsibility for their output, this system is a true multiplier.
I also have Copilot and Cursor Bugbot reviews, and run it in a Ralph Wiggum loop with Claude Code. A few rounds overnight and the PR is perfect and ready for a final review before merging.
I do run 4 CC sessions in parallel though, but that's just one day a week. The rest of the week is spent figuring out the next set of features and fixes needed, operational things, meetings, feedback, etc.
Small team, backend. NDAs prevent widespread LLM use, but some of our engineers, junior and senior both, feel pretty confident in using Claude for "isolated" development, like generic packages or libraries that are plausibly unrelated to our work.
It's going very poorly, where the engineers are emboldened by speed and are vacating their normal code-review responsibilities. I would also say they are shirking ethical behavior by domineering other people's time, energy, and open source projects. Moreover, these forays into generic packages are largely vanity projects, an excuse to play with LLMs.
My only solution is to increase my level of code-review, which aggravates everybody involved, including me. It is not a good solution.
I could definitely see hardline rules being valuable surrounding LLM use (e.g. "LLM PRs must be less than n logical statements, no exceptions" is just one example rule off the top of my head), especially if the LLM can be made to stridently follow those rules, but the idea of hashing those out sounds unproductive.
Has been a game-changer for me. The following cases are where it shines:
- Figuring out the architecture of a project you just came into
- Tracing the root cause of a bug
- Quickly implementing a solution with known architecture
I figured out that above all, what makes or breaks success is context engineering. Keeping your project and session documentation in order, documenting every learning you've made along the way (with the help of AI), asking AI to compose a plan before implementing it, iterating on a plan before it looks good to you. Sometimes I spend several hours on a plan markdown document, iterating on it with AI, before pressing "Build" button and the AI doing it in 10 minutes.
Another important thing is verification harness. Tell the agent how to compile the code, run the tests - that way it's less likely to go off the rails.
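A verification harness doesn't need to be fancy: one entry point the agent is told to run after every change is enough. A minimal sketch of the idea (the example commands are placeholders, not from any particular setup):

```python
import subprocess

def run_checks(checks):
    """Run each command in order; return the first failing command, or None.

    The agent runs this after every change, so it gets fast, deterministic
    feedback instead of guessing whether the code compiles and tests pass.
    """
    for cmd in checks:
        if subprocess.run(cmd).returncode != 0:
            return cmd
    return None

# e.g. run_checks([["make", "build"], ["python", "-m", "pytest", "-q"]])
```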
Overall, since a couple of months ago, I feel like I got rid of the part of programming that I liked the least - swimming in technicalities irrelevant to the overall project's objectives - while keeping what I liked the most - making the actual architectural and business decisions.
I wrote a blog recently about the approach that works for me: https://anatoliikmt.me/posts/2026-03-02-ai-dev-flow/
And this is a tool for context engineering I made specifically to support such a flow: https://ctxlayer.dev/
Sometimes it produces useful output. A good base of tests to start with. Or some little tool I'd never take the time to make if I had to do it myself.
On the other hand, I tried to get help debugging a test failure, and Claude spit out paragraph after paragraph arguing with itself, going back and forth. Not only did it not help, none of the intermediate explanations were useful either. It ended up being a waste of time. If I didn't know that, I could have easily been sent on multiple wild goose chases.
We're using Augment Code heavily on a "full rewrite of legacy CRM with 30 years of business rules/data" Laravel project with a team size of 4. Augment kind of became impossible to avoid once we realized the new guy is outpacing the rest of us while possessing almost no knowledge of code and working fully in the business-requirements domain, extracting requirements from the customer and passing them to AI, which was encoding them in tests and implementing them in code.
I'm using `auggie` which is their CLI-based agentic tool. (They also have a VS Code integration - that became too slow and hung often the more I used it.) I don't use any prompting tricks, I just kind of steer the agent to the desired outcome by chatting to it, and switch models as needed (Sonnet 4.6 for speed and execution, GPT 5.1 for comprehension and planning).
My favorite recent interaction with Augment was to have one session write a small API and its specification within the old codebase, then have another session implement the API client entirely from the specification. As I discovered edge cases I had the first agent document them in the spec and the second agent read the updated spec and adjust the implementation. That worked much, much better than the usual ad hoc back and forth directly between me and one agent and also created a concise specification that can be tracked in the repo as documentation for humans and context for future agentic work.
Why GPT 5.1?
Far better results compared to GPT 5.4 and Opus 4.6. Not great for execution due to speed but has consistently had better comprehension of the codebase. Maybe it's a case of "holding it wrong" regarding the other models but that's been my experience.
I am a software engineer at Microsoft, so we have been using GitHub Copilot very regularly. It gives us unlimited Opus credits.
The good thing is that the work gets done much quicker than before; it's actually a boon for that.
The issue is inflated expectations
For example: If a work item ideally would take two weeks before AI, it is expected now to be done in like 2 days.
So we still need to find a sweet spot so that the expectations are not unbelievable.
MS is a mature place, so they're still working on it and take our feedback seriously. At least that's what I have seen.
> If a work item ideally would take two weeks before AI, it is expected now to be done in like 2 days.
> MS is a mature place so they're still working on it and take our feedback seriously. at least that's what I have seen
Suggested feedback then, for them to take seriously: "your expectations are inflated".
Hating it TBH. I feel like it took away a lot of what I enjoyed about programming but its often so effective and I'm under so much pressure to be fast that I can't not use it.
I quit my job and went out on my own freelancing.
So far, it's been fantastic. I can do more things for clients, much faster, than I ever dreamed would be possible when I've attempted work like this before.
I think the biggest problem with AI coding is that it simply doesn't fit well into existing enterprise structures. I couldn't imagine being able to do anything productive when I'm stuck having to rely on other teams or request access to stuff from the internet like I did in previous jobs.
I've been working on a client server unity based game the last couple of years. It's pretty bad at handling that use case. It misses tons of corner cases that span the client server divide.
When you prompt does the agent have access/visibility to all code bases/repos at once and do you prompt it to update both at the same time? That has worked well for me for client/server stuff.
At the end of the day, I'm being paid to ensure that the code deployed to production meets a particular bar of quality. Regardless of whether I'm reviewing code or writing it, If I let a commit be merged, I have to be convinced that it is a net positive to the codebase.
People having easy access to LLMs makes this job much harder. LLMs can create what looks at the surface like expert-written code, but suffers from below-the-surface issues that will reveal themselves as intermittent issues or subtle bugs after being deployed.
Inexperienced devs create huge commits full of such code, and then expect me to waste an entire day searching for such issues, which is miserable.
If the models don't improve significantly in the future, I expect that most high-stakes software teams will fire all the inexperienced devs and have super-experienced engineers work with the bots directly.
AI is great for getting stuff to work on technologies you're not familiar with. E.g. to write an Android or iOS app or an OpenGL shader, or even a Linux driver. It's also great for sysadmin work such as getting an ethernet connection up, or installing a docker container.
For main coding tasks, it is imho not suitable because you still have to read the code and I hate reading other people's code.
And also, the AI is still slow, so it is hard to stay focused on a task.
Models aren’t reliable, and it’s a bottleneck.
My solution was to write code to force the model down a deterministic path.
It’s open source here: https://codeleash.dev
It’s working! ~200k LOC python/typescript codebase built from scratch as I’ve grown out the framework. I probably wrote 500-1000 lines of that, so ~99.5% written by Claude Code. I commit 10k-30k loc per week, code-reviewed and industrial strength quality (mainly thanks to rigid TDD)
I review every line of code but the TDD enforcement and self-reflection have now put both the process and continual improvement to said process more or less on autopilot.
It’s a software factory - I don’t build software any more, I walk around the machine with a clipboard optimizing and fixing constraints. My job is to input the specs and prompts and give the factory its best chance of producing a high quality result, then QA that for release.
I keep my operational burden minimal by using managed platforms - more info in the framework.
One caveat; I am a solo dev; my cofounder isn’t writing code. So I can’t speak to how it is to be in a team of engineers with this stuff.
My most productive day last week was a net of -10k lines (yes, minus ten thousand).
No AI used.
Congratulations, honestly, but I would not do that for a job.
Metaphorically speaking, you’re out there sprinting on the road while people who’ve made agentic coding work for them are sipping coffee in a limo.
People who haven’t made agentic coding work (but do it anyway) are sipping coffee in the back of a limo that has no brakes. No thanks to that.
You have a 200K LOC repository and you haven’t written 99.5% of it?
It was generated for me in accordance with the architecture and constraints I defined for the agent; and I’ve reviewed every line.
TDD really is that good.
Right now I enjoy the labs' CLI harnesses, Claude Code, and Codex (especially for review). I do a bunch of niche stuff with Pi and OpenCode. My productivity is up. There are some nuances to working with others using the same AI tools - we all end up trying to boil the ocean at first, creating a ton of verbose docs and massive PRs, but I/they end up pulling back from throwing up every sort of LLM output we get. Instead, we continuously refine the outputs in a consumable + trusted way.
My workday is fairly simple. I spend all day planning and reviewing.
1. For most features, unless it's small things, I will enter plan mode.
2. We will iterate on planning. I built a tool for this, and it seems that this is a fairly desired workflow, given the popularity through organic growth. https://github.com/backnotprop/plannotator
3. After the plan's approved, we hit eventual review of implementation. I'll use AI reviewers, but I will also manually review using the same tool so that I can create annotations and iterate through a feedback loop with the agents.
4. Do a lot of this / multitasking with worktrees now.
Worktrees weren't something I truly understood the value of for a while, until a couple weeks ago, embarrassingly enough: https://backnotprop.com/blog/simplifying-git-worktrees/
I've been working on a thing for worktrees to work with docker-compose setups so you can run multiple localhost environments at once https://coasts.dev/. It's free and open source. In my experience it's made worktrees 10x better but would love to hear what other folks are doing about things like port conflicts and db isolation.
Worktrees slap.
I am no longer in software as a day job, so I am not sure if my input applies. I traded that world for opening a small brewery back in 2013, so I am a bit outdated on many modern trends, but I still enjoy programming. In the last few months, using both Gemini and now moving over to Claude, I have created at least 5 (and growing) small apps that have radically transformed what I am able to do at the business. I totally improved automation of my bookkeeping (~16 hrs a month categorizing everything, down to 3ish), created an immense amount of better reports on production, sales, and predictions from a system I had already been slowly writing all these years, created a run club rewards tracking app instead of relying on our paper method, improved upon a previously written full TV menu display system that syncs with our website and on-premises TVs, and now I am working on a full productive maintenance trigger system and a personal phone app to trigger each of these more easily. It's been a game changer for me. I have so many more ideas planned, and each one frees up more of my wasted time to create more.
I have the freedom to work with AI tools as much as I want and kind of lead the team in the direction I see fit.
It's a lot of fun for exploring ideas. I've built things very fast that I would not have done at all otherwise. I have rewritten a huge chunk of semi-outdated docs into something useful with a couple of prompts in a day. Claude does all the annoying "dependency update breaks the build" kinds of things. And the reviews are extremely useful and a perfect combination with human review, as they catch things extremely well that humans are bad at catching.
But in the production codebase, changes must be made with much more consideration. Claude tends to perform well on some tasks, but for others I end up wasting time because I just don't know up front how the feature must look, so I cannot write a spec at the level of precision that Claude needs, and changing code manually is more efficient for this kind of discovery than dealing with large chunks of constantly changing code.
And then there's the fact that Claude produces things that work and do the thing described in the prompt extremely well, but they are always also wrong in some way. When I let AI build a large chunk of code and actually go through the code, there's always a mess somewhere that AI review doesn't see because it looks completely plausible but contains some horrible security issue, or complete inconsistency with the rest of the codebase, or, you know, that custom YAML parser nobody asked for and that you don't want your day job to depend on.
Oh good example
Claude recently tried to replace an HTML sanitizer with a custom regex that perfectly fit all our tests as well as the spec I wrote.
Agreed, you often dig into what it built and find something insanely over engineered or something that doesn’t match the “style” of your existing code.
In this case that's actually a security vulnerability; I've also seen a case where it built an API with auth but added a route where anyone could just PUT a new API key into it. Sometimes its own code review catches these, sometimes it does not.
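To illustrate that class of failure: here's a hypothetical regex "sanitizer" of the kind described (not the actual code from this thread). It passes the obvious tests while falling to a classic nesting bypass:

```python
import re

def naive_sanitize(html):
    """A regex 'sanitizer' that strips <script>...</script> blocks.

    It passes the tests you'd naturally write, but is not actually safe.
    """
    return re.sub(r'<script\b[^>]*>.*?</script>', '', html,
                  flags=re.IGNORECASE | re.DOTALL)

# Passes the obvious (prompted) case:
#   naive_sanitize('<script>alert(1)</script>hi')  -> 'hi'
# But a nested payload reassembles a script tag after the single pass:
#   naive_sanitize('<scr<script></script>ipt>alert(1)</scr<script></script>ipt>')
#   -> '<script>alert(1)</script>'
```

This is why HTML sanitization needs a real parser or an allowlist-based library rather than a regex: the tests only ever cover the prompted cases.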
my team is anti-AI. my code review requests are ignored, or are treated more strictly than others. it feels coordinated - i will have to push back the launch date of my project as a result.
another teammate added a length check to an input field, and his request was merged near instantly, even though it had zero unit testing. this team is incredibly cooked in the long term, i just need to ensure that i survive the short term somehow.
" this team is incredibly cooked in the long term" they're not actually.
People like you are making sunk expenditures whilst the models are evolving... they can just wait until the models get to 'steady-state' to figure out the optimal workflow. They will have lost out on far less.
> another teammate added a length check to an input field, and his request was merged near instantly, even though it had zero unit testing
That sounds extremely reasonable though?
Code that does not take a pre-existing unit test from failing to passing is by definition broken.
No its not.
that is not what "by definition" means
i take it you mean it in the "treat every gun as if it's loaded" sense, and not literally
it sounds like you might have wasted your team's time previously and now they don't trust the code you put up with a PR. Maybe you can do something to improve your relationship with them?
As a sidenote, I highly doubt they are cooked longterm. Using AI is not exactly skilled labor. If they want or need I'm sure they could learn patterns/workflows in like an afternoon. As things go on it will only get easier to use.
Exactly. I find it hilarious that the people down-voted my comment.
Like yeah sorry... not everyone has to be a risk-taker. Many people like to observe and await to see what new techniques emerge that can be exploited.
I would start looking for a job at an AI-leaning firm.
I am a developer turned (reluctantly) into management. I still keep my hands in code and work with the team on a handful of projects. We use GitHub Copilot on a daily basis, and it has become a great tool that has improved our speed and quality. I have 20+ years of experience and see it as just another tool in the toolbox. Maybe I'm naive, but I don't feel threatened by it.
At least at my company, the problem is that the business hasn't caught up. We can code faster, but our stakeholders can't decide what they want us to build faster. Or test faster, or grasp the new modalities LLMs make possible.
That’s where I want to go next: not just speeding up and increasing code quality but improving business analytics and reducing the amount of meetings I have to be in to get business problems understood and solved.
Couldn't read the entire thread of comments, but my experience has been overwhelmingly positive so far. I think what helps me be effective is a combination of factors: I work only in a modern, well-documented and well-architected Java codebase with over 80% test coverage.
I only use Claude Code with Opus 4.6 on High Effort.
I always, ALWAYS treat my “new job” as writing a detailed ticket for whatever it is I need to do.
I give the model access to a DB replica of my prod DB that I create manually.
I do NOT waste time with custom agents, Claude.md files or any of that stuff.
When I put ALL of the above together, the results ARE THE PROMISED LAND: I simply haven’t written a single line of code manually in the last 3 months.
I find this pretty interesting. I am curious though: Did you dislike coding? You sound genuinely excited to not be doing it anymore.
For me I have been a coder since a very young age and I am nearing the end of my career now. I still love writing code to problem solve just as much as the first day I learnt to code. The thought of something taking that task away from me doesn't fill me with glee.
A parallel for me: if I enjoyed puzzle pages, and those brought me joy and satisfaction from employing my grey matter to solve them, I just wouldn't find it interesting to have an agent complete the forms for me, with me simply guiding the agent to the clues.
Replying once again for future reference to make my position clear: I firmly believe that one MUST experience programming on its own first. No LLMs, no crutches. One MUST feel the abstractions melting away and things clicking in the brain first.
The design becoming obvious. Being able to remove that extra if statement after clarifying requirements with a customer face to face.
A design pattern fitting a scenario like a glove, etc, etc.
You need REAL experience that only comes with time and effort. Years or decades, different businesses, different companies, etc.
But once you have crossed that chasm and completed that rite of passage, using LLMs becomes a true multiplier and, in my experience, quite fun.
Using them blindly or without experience is a very different thing I can imagine.
I like problem solving and building useful things for our customers. Coding for me was always more of a “means to an end” than pure craft on its own. Obviously a standard of good, clean code emerges when you’re working on things to be extended or maintained by others, but, truth be told, ego battles in code reviews get boring very fast. And no matter how much I like experimenting, if I have a hypothesis, I can now validate it in 2 days instead of 1 week, which means I can validate twice as many hypotheses.
I am extremely excited about that! Coding in itself as the act of manually typing things? Absolutely not
I work at a large company that is contracted to build warehouses that automate the movement of goods with conveyors, retrieval systems, etc.
This makes us a key candidate for AI, as we have built hundreds of warehouses in the past. We have a standard product spanning over a hundred thousand lines of code to build upon. Still, we rely on copying code from previous projects when features have been implemented before. For some reason we have stopped investing in the product in order to migrate everything to microservices, so this code copying is increasingly common as projects keep getting more complex.
Teams to implement warehouses are generally around eight developers. We are given a design spec to implement, which usually spans a few hundred pages.
AI has more than doubled the speed at which I can write backend code. We've done the same tasks so many times with previous warehouses that we have a gold mine of patterns the AI can pick up on if we give it a folder of previous projects to read. I also feel that the code I write is higher quality, though I have to think more about the design, as previously I would realize something wouldn't work while writing the code. For GWT though, it's hopeless, as there are almost no public GWT projects to train an AI on. It's also very helpful in tracing logs and debugging.
We use Cursor. I was able to use $1,300 worth of Claude Opus 4.6 tokens for a cost of $100 to the company. Sadly, Cursor discontinued its legacy pricing model due to it being unsustainable, so only the non-frontier models are priced low enough to use consistently. I'm not sure what I'm going to do when the new pricing model takes effect tomorrow; I guess I will have to go back to writing code by hand or figure out how to use models like Gemini 3.1. GPT models also write decent code, but they are always so paranoid and follow prompts so strictly that it works to their own detriment. Gemini just feels unstable and inconsistent, though it does write higher quality code.
I'm not being paid any more for doubling my output, so it's not the end of the world if I have to go back to writing code by hand.
I am forced to use it. They want us to only have code written by Claude. We are forced to use spec-kit for everything, so every PR has hundreds, if not thousands, of lines of markdown committed to the repo per ticket. I basically only review code now. It changes so fast it is impossible to have a stable mental model of the application. My job is now to go to meetings and go through the motions of reviewing thousands of lines of slop per day while sending thousands of lines of slop to others. Everything I liked about the job has been stolen from me; only the things I disliked or was indifferent to are left.
If this is what the industry is now… this will be my last job in it.
Curse everyone involved with creating this nightmare.
Professionally, sending our code off prem is not an option. Frankly I don’t understand why executives are okay with AI companies training LLMs on their IP. Unless they own a significant stake in the AI company I guess.
Personally, it’s been decent for generating tedious boilerplate. Though I’m not sure if reading the docs and just writing things myself would have been faster when it comes time to debug. I’m pretty fast at code editing with vim at this point. I’m also hesitant to feedback any fixes to the AI companies.
I’ve found “better google” to be a much more comfortable, if not faster, way to use the tools. Give me the information, and I’ll build an understanding and see the big picture much better.
Very hit or miss.
Stack: Go, Python. Team size: 8. Experience: mixed.
I'm using a code review agent which sometimes catches a critical bug humans miss, so that is very useful.
Using it to get to know a code base is also very useful. A question like 'which functions touch this table' or 'describe the flow of this API endpoint' are usually answered correctly. This is a huge time saver when I need to work on a code base i'm less familiar with.
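Questions like "which functions touch this table" can also be sanity-checked by hand. A minimal Python sketch using the standard `ast` module; the sample source and table name are invented for illustration:

```python
import ast


def functions_touching(source: str, needle: str) -> list[str]:
    """Return names of functions whose source text mentions `needle`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Recover the function's original text from line/col offsets.
            segment = ast.get_source_segment(source, node) or ""
            if needle in segment:
                hits.append(node.name)
    return hits


# Made-up sample codebase for the demo:
sample = '''
def load_orders(db):
    return db.query("SELECT * FROM orders")

def load_users(db):
    return db.query("SELECT * FROM users")
'''

print(functions_touching(sample, "orders"))  # -> ['load_orders']
```

An agent does this with far more nuance (following call chains, ORM indirection, etc.), which is exactly why it is such a time saver on unfamiliar code, but a crude grep-like pass is a useful way to verify its answers.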
For coding, agents are fine for simple, straightforward tasks, but I find the tools are very myopic: they prefer very local changes (adding new helper functions all over the place, even when such helpers already exist).
For harder problems I find agents get stuck in loops, and coming up with the right prompts and guardrails can be slower than just writing the code.
I also hate how slow and unpredictable the agents can be. At times it feels like gambling. Will the agents actually fix my tests, or fuck up the code base? Who knows, let's check in 5 minutes.
IMO the worst thing is that juniors can now come up with large change sets that seem good at a glance but then turn out to be fundamentally flawed, and it takes tons of time to review them.
It is definitely making me more productive.
Tasks where, in the past, I thought “if I had a utility to do x it would save me y time” (and I'd either start and give up, or spend much longer than y on it) are now super easy: create a directory, run claude “create an app to do x”, and that's it.
I work in HPC and I’ve found it very useful in creating various shell scripts. It really helps if you have linters such as shellcheck.
Other areas of success have been just offloading the typing/prototyping. I know exactly what the code should look like, so I rarely run into issues.
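The linter point is worth illustrating. A sketch of the kind of wrapper script an AI might draft and then get run through shellcheck; all paths and names here are made-up examples:

```shell
#!/usr/bin/env bash
# Hypothetical job-logging wrapper of the sort an AI drafts in one shot.
set -euo pipefail

job_name="${1:-demo_job}"          # default so the sketch runs standalone
log_dir="${LOG_DIR:-/tmp/hpc-logs}"

mkdir -p "$log_dir"

# Quoting every expansion is exactly what shellcheck (e.g. SC2086)
# nags first drafts about, and why pairing it with an AI works well.
log_file="$log_dir/$job_name.log"
printf 'starting %s\n' "$job_name" | tee "$log_file"
```

Feeding shellcheck warnings straight back into the prompt tends to converge on a clean script in one or two rounds.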
I'm working on a startup, mostly writing C++, and I'm using AI more and more. In the last month I have one machine running Codex working on task while I work on a different machine.
I have to think like micro-manager, coming up with discrete (and well-defined) tasks for the AI to do, and I periodically review the code to make it cleaner/more efficient.
But I'm confident that it is saving me time. And my love for programming has not diminished. I'm still driving the architecture and writing code, but now I have a helper who makes progress in parallel.
Honestly, I don't want to go back.
It's going pretty well, though it took at least six months to get there. I'm helped by knowing the domain reasonably well, and working with a principal investigator who knows it well and who uses LLMs with caution. At this stage I use Claude for coding and research that does not involve sensitive matters, and local-only LLMs for coding and research that does. I've gradually developed some regular practices around careful specification, boundaries, testing, and review, and have definitely seen things go south a few times. Used cautiously, though, I can see it accelerating progress in carefully-chosen and -bounded work.
I am having a blast at work. I've been leaning hard into AI (as directed by leadership) while others are falling far, far behind. I am building new production features, often solo or with one or two other engineers, at lightning speed, and being recognized across the org for it. This is an incredible opportunity for many engineers that won't last. I'm trying to make the most of it. It will be sad when software is no longer a useful pastime for humans. I'm thinking another three years and most of us will be unemployed, or our jobs will have been transformed into something that would have been unrecognizable a few short years ago.
I started AI-assisted coding quite a while ago with a "query for code to copy and paste" approach, which was slow. Things shifted dramatically once LLMs were used as agents: AIs with access to your project's source code, the internet, and technical docs that refine them. You can simply instruct them to change snippets of code by describing the change in chat, as in tools like Cursor, Antigravity, or llmanywhere.

An instruction can be as limited as CRUD (Create, Read, Update, Delete). An update instruction looks like "change the code that does this to do that", or, more precisely, "change the timeout of the request to ycombinator.com to 10". Having a good memory of the project definitely helps here, but forgetting doesn't mean the end of development, or that you have to start reading the source yourself to know where an instruction should target: if you've lost the big picture after coming back from a break, you can ask for a goal summary of the project's interconnected source code (interconnected because, in my experience with Cursor, it generates lots of code, such as test cases, that isn't used in production but is still part of the project).

I've only used an AI agent on my last LangGraph solo project, which had Python and Go, with git and Cursor, so take my advice with a grain of salt :)
It's fun, but testing has become more of a PITA. When I write code I test and understand each piece. With AI generated code I need to figure out how it works and why it isn't working.
I enjoy Opus on personal projects. I don’t even bother to check the code. Go/JavaScript/Typescript/CSS works very well for me. Swift not so much. I haven’t tried C/C++ yet. Scala was Ok.
Professionally I hardly use the tools for coding, since I’m in an architecture role and mostly write design docs and do reviews. And I write the occasional prototype.
I have started building tools to integrate copilot (Opus) better with $CORP. This way I can ask it questions across confluence and github.
Leveraging Claude for a project feels very addictive to me. I have to make a conscious effort to stop and I end up working on multiple projects at the same time.
Pretty good. We have a huge number of projects, some more modern than others. For the older legacy systems, it's been hugely useful: not perfect, needs a bit more babysitting, but a lot easier than doing it solo. The newer things can mostly be done solely by AI, so more time is spent speccing/designing the system than coding. And every week we are working out better and better ways of working with AI, so it's an evolving process at the moment.
One thing I use Claude for is diagramming system architecture in LaTeX, and it’s great: I just describe what I am visualizing and kaboom, I get perfect output I can paste into Overleaf.
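For context, the kind of output that pastes cleanly into Overleaf is plain TikZ. A minimal hand-made sketch of such a diagram; the component names are placeholders, not from the comment:

```latex
% Minimal architecture-diagram sketch; node names are invented examples.
\documentclass[tikz]{standalone}
\usetikzlibrary{positioning,arrows.meta}
\begin{document}
\begin{tikzpicture}[box/.style={draw, rounded corners,
                                minimum width=2.4cm, minimum height=1cm}]
  \node[box] (api) {API Gateway};
  \node[box, below=1.2cm of api] (svc) {Order Service};
  \node[box, right=1.8cm of svc] (db)  {Postgres};
  \draw[-{Stealth}] (api) -- (svc);
  \draw[-{Stealth}] (svc) -- (db);
\end{tikzpicture}
\end{document}
```

Because TikZ is verbose but highly regular, it is a near-ideal target for describe-and-generate workflows.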
I find it useful. It has been a big help from a motivation perspective. When digging into bad API docs or getting started on a complex problem, it's easy to have the AI work through it with me as I describe it. The other positive is front-end design: I've always hated CSS and its derivatives, and AI now makes me decent at it.
The negatives are that AI clearly loves to add code, so I do need to coach it into making nice abstractions and keeping it on track.
It’s like working with the dumbest, most arrogant intern you could imagine. It has perfect recall of the docs but no understanding of them.
An example from last week:
Me: Do this.
AI: OK.
<Brings me code that looks like it accomplishes the task but after looking at it it’s accomplishing it in a monkey’s paw/spiteful genie kind of way.>
Me: Not quite, you didn’t take this into account. But I made the same mistake while learning so I can pull it back on track.
AI: OK
<It’s worse, and why are all the values hardcoded now?>
…
40 minutes go by. The simplest, smallest bit of code is almost right.
Me: Alright, abstract it into a Sass mixin.
AI: OK.
<Has no idea how to do it. It installed Sass, but with no understanding of what it’s working on so the mixin implementation looks almost random. Why is that the argument? What is it even trying to accomplish here?>
At which point I just give up and hand code the thing in 10 minutes.
It would be neat if AI worked. It doesn’t.
I'm loving the experience, and I also realize that this part of me, the one that could write code, is obsolete. Completely, utterly obsolete. OK, so first I need to admit that I am not the best programmer, but I've been at it for 27 years.
These past months I've been working with two agents, developing two things practically in parallel, and I've experienced the fastest, most motivating development sessions I've ever had. Together with these two agents I was able to build two very complex systems that do all sorts of data gathering, then ETL it into a format that can be queried and maintained, and it all ends up in some awesome web UIs. I used them not only to write code, but to do the design and architecture, and to discuss the front end and the business reqs.
And what I can say is that it felt like a conversation with a crazy fast person who did everything I needed in seconds. As a tech guy I know what I want and I know how to describe it. That helped A LOT! I knew when we lost context, and yes, there were stupid consequences that we had to fix. But my impression is that many of the criticisms here are about the people using it more than about the AI and its output. From my point of view, the output is what I wanted, only 250x faster than I ever expected. And as for the critiques targeting the AIs: after this, I am sure they will learn to fill in all those gaps. We will not be criticizing them then. By then, my only possible job will be to translate somebody's business reqs for an agent to implement as I speak.
I’m transitioning from AI assisted (human in the loop) to AI driven (human on the loop) development. But my problems are pretty niche, I’m doing analytics right now where AI-driven is much more accessible. I’m in a team of three but so far I’m the only one doing the AI driven stuff. It basically means focusing on your specification since you are handing development off to the AI afterwards (and then a review of functionality/test coverage before deploying).
Mostly using Gemini Flash 3 at a FAANG.
I've shipped full features and bug fixes without touching an IDE for anything significant.
When I need to type stuff myself it's mostly just minor flavour changes like Claude adding docstrings in a silly way or naming test functions the wrong way - stuff that I fixed in the prompt for the next time.
And yes, I read and understand the code produced before I tag anyone to review the PR. I'm not a monster =)
It allowed me to build my SaaS https://agreezy.app in 2 months (started January and launched early February). A lot of back and forth between Claude and Qwen, but it's pretty polished. AI hallucinations are real, so I ended up writing more tests than normal.
I am getting disproportionately good results with the models by following a process: spec -> plan -> critique -> improve plan -> implement plan.
If I may "yes, and" this: spec → plan → critique → improve plan → implement plan → code review
It may sound absurd to review an implementation with the same model you used to write it, but it works extremely well. You can optionally crank the "effort" knob (if your model has one) to "max" for the code review.
A blanket follow-up "are you sure this is the best way to do it?"
Frequently returns, "Oh, you are absolutely correct, let me redo this part better."
You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.
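The fresh-session review step can be scripted. A sketch under some assumptions: the prompt wording is mine, and `claude -p` (one-shot print mode) is assumed as the CLI of choice; swap in whatever tool you use.

```shell
# Build a review prompt that carries *only* the diff, so the reviewing
# session starts with a clean context. Prompt wording is an example.
build_review_prompt() {
  printf 'You are reviewing this change for the first time.\n'
  printf 'List bugs, missed requirements, and simpler designs.\n\nDiff:\n%s\n' "$1"
}

# Typical use (not run here; `claude -p` is an assumption):
#   claude -p "$(build_review_prompt "$(git diff main)")"
prompt=$(build_review_prompt 'diff --git a/x b/x')
printf '%s\n' "$prompt"
```

Passing only the diff, rather than resuming the implementation chat, is what keeps the reviewer from inheriting the implementer's framing.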
At the end of the day it’s an autocomplete. So if you ask “are you sure?” then “oh, actually” is a statistically likely completion.
> You should start a new session for the code review to make sure the context window is not polluted with the work on implementation itself.
I'm just a sample size of one, but FWIW I didn't find that this noticeably improved my results.
Not having to completely recreate all the context necessary for the LLM to understand the problem and the spectrum of possible solutions (which it still "knows" before you clear the session) saves lots of time and tokens.
Interesting, I definitely see better results on a clean session. On a “dirty” session it’s more likely to go with “this is what we implemented, it’s good, we could improve it this way”, whereas on a clean session it’s a lot more likely to find actual issues or things that were overlooked in the implementation session.
Can you give a little more detail how you execute these steps? Is there a specific tool you use, or is it simply different kinds of prompts?
I wrote it down here: https://x.com/BraaiEngineer/status/2016887552163119225
However, I have since condensed this into 2 prompts:
1. Write plan in Plan Mode
2. (Exit Plan Mode) Critique -> Improve loop -> Implement.
I follow a very similar workflow, with manual human review of plans and continuous feedback loops with the plan iterations
See me in action here. It's a quick demo: https://youtu.be/a_AT7cEN_9I
similar approach
> Comment sections on AI threads tend to split into "we're all cooked" and "AI is useless."
This comment section is exactly the same, of course.
> I'd like to cut through the noise
Me too, but it's not happening here.
Originally my workflow was:
- Think about requirement
- Spend 0-360 minutes looking through the code
- Start writing code
- Realize I didn't think about it quite enough and fix the design
- Finish writing code
- Write unit tests
- Submit MR
- Fix MR feedback
Until recently no LLM was able to properly disrupt that, however the release of Opus 4.5 changed that.
Now my workflow is:
- Throw as much context into Opus as possible about what I want in plan mode
- Spend 0-60 minutes refining the plan
- Have Opus do the implementation
- Review all the code and nitpick small things
- Submit MR
- Implement MR feedback
My workflow is very similar. I'd say one difference now is that PRs actually take longer to get merged, but that's mainly because we ignore them and move on to something else while waiting for CI and reviews. It's not uncommon for a team member to have multiple PRs open for completely different features.
Context switching is less painful when you have a plan doc and chat history where you can ask why yesterday afternoon you (the human) decided to do this thing that way. Also for debugging it's very useful to be able to jump back in if any issues come up on QA/prod later. And I've actually had a few shower thoughts like that, which have allowed the implementations of some features to end up being much better than how I first envisioned it.
Odd how you add the time for the requirement analysis but none for the coding.
Then you tell us you leave 83% of the analysis —and the coding— to a code chatbot.
Are you actually more productive or are you going to find out down the line the chatbot missed some requirements and made APIs up to fill up a downstream document and now you better support them by yesterday?
In ye olden days, people doing this would scream at the junior developers. Are you going to scream at your screen?
To be honest, I didn't think too hard about it. I just fired and submitted with the time estimates in there kind of randomly.
You are clearly a naysayer. I get it. I was one too for a long time. Then I found a workflow and a model that was clearly delivering results and that's what I use now.
It's only a matter of time before it happens to everyone, even you. Once you have the aha moment where it works for you, you'll stop asking everyone whether they really know if it's better.
The LLM-based workflow above produces good code at a speed at least as fast as my previous workflow and typically many, many, many times faster with the code produced often using designs I would have never thought of before being able to bounce ideas off an LLM first. The biggest difference, besides the time obviously, is that the energy I need to spend is in very different places between the two.
Before it was thinking about what I needed to do and writing the code.
Now it's thinking about what I need to do and reviewing the code.
Well, I'm not considering using any code generation outside of helper scripts because in my case coding is a negligible part of my work. If I didn't have the LLM, I would find and modify the tool it is lifting code from using pre-LLM Google.
I do know that asking one of these LLMs to produce a document from my notes resulted in me reviewing a professional, plausible-looking, yet subtly wrong document for more hours than it would have taken me to write it from scratch.
It's been great - I work on a lot of projects that are essentially prototypes, to test out different ideas. It's amazing for this - I can create web apps in a day now, which in the past I would not have been able to create at all, as I spent most of my career on the backend.
We use our own scripts around claude code to create and maintain 100s of products. We have products going back 30+ years and clients are definitely happier since AI. We are more responsive to requests for lower fees than before.
Going back and forth with an AI all day is psychologically draining, as is checking its output with a fine-tooth comb.
Einstein said something like: "To punish my disdain for authority, fate made me an authority." I feel like, to punish my disdain for dev managers, techbro Jesus has made me a dev manager of AI agents.
The output the agent creates falls into one of these categories:
1. Correct, maintainable changes
2. Correct, not maintainable changes
3. Correct diff, maintains expected system interaction
4. Correct diff, breaks system interaction
In no way are they consistent or deterministic, but they are _always_ convinced they are correct.
I also work at big tech. Claude code is very good and I have not written code by hand in months. These are very messy codebases as well.
I have to very much be in the loop and constantly guiding it with clarifying questions but it has made running multiple projects in parallel much easier and has handled many tedious tasks.
I almost don't write any code by hand anymore and was able to take up a second part-time job thanks to genai.
Cursor with GPT-5.4 puts me to shame, honestly.
Exceptionally well. I’ve been using it for my side project for the last 7 months and have learned how to use it really well on a rather large codebase now. My side project has about 100k LOC and most of it is AI generated, though I do heavily review and edit.
AI-assisted research is a solid A already, if you are doing greenfield work. The horizon is only blocked by tooling that requires a GUI, and even then, that is a small enough obstruction for most researchers.
I'm a manager at a large consumer website. My team and I have built a harness that uses headless Claudes (running Opus) to do ticket work, respond to and fix PR comments, and fix CI test failures. Our only interaction with the code is writing specs in Jira tickets (which we primarily do via local Claudes) and adding PR comments to GitHub PRs.
The speed we can move at is astounding. We're going to finish our backlog next quarter. We're conservatively planning on launching 3x as many features next quarter.
Claude is far from perfect: it's made us reassess our coding standards since code is primarily for Claude now, not for humans. So much of what we did was to make code easier for the next dev, and that just doesn't matter anymore.
> and adding PR comments to GitHub PRs...
> So much of what we did was to make code easier for the next dev, and that just doesn't matter anymore
Humans don't need to read the code when leaving PR comments?
When is your website going to be complete? Are you sure those features are what the users need? What happens to the team after everything is done? What happens to the site after the team is gone?
In the Lord's name we pray: Infinite. Growth. Forever.
Amen.
What you said about "we're all cooked" and "AI is useless" is literally me and everyone I know switching between the two on an hourly basis...
I find it the most exciting time for me as a builder, I can just get more things done.
Professionally, I'm dreading for our future, but I'm sure it will be better than I fear, worse than I hope.
From a toolset perspective, I use the usual: Cursor (super expensive if you go with Opus 4.6 Max, but their computer use is game-changing, although it will soon become a commodity) and Claude Code (Pro Max plan), which is my new favorite. I'm trying out Codex, and even Copilot, as it's practically free if you have enterprise GitHub. I'm probably going to move to Claude Code; I'm paying way too much for Cursor, and I don't really need tab completion anymore... once Claude Code has a decent computer-use environment, I'll probably cancel my Cursor account. Or I'll just use my own with OpenClaw, but I'm not going to give it any work/personal access, only access to stuff that is publicly available (e.g. run sanity checks as a regular user). Playing with skills, subagents, agent teams, etc... it's all just markdown and JSON files all the way down...
About our professional future:
I'm not going to start learning to be a plumber / electrician / A/C repair etc, and I am not going to recommend my children to do so either, but I am not sure I will push them to learn Computer Science, unless they really want to do Computer Science.
What excites me the most right now is my experiments with OpenClaw / NanoClaw, I'm just having a blast.
tl;dr most exciting yet terrifying times of my life.
I've gone back and forth on it a lot myself, but lately I've been more optimistic, for a couple of reasons.
While the final impact LLMs will have is yet to be determined (the hype cycle has to calm down, we need time to see the impact on production software, and there is inevitably going to be some kind of market collapse at some point), it's undeniable that they will improve overall productivity (though I think the effect will be far more nuanced than most people expect). With that productivity improvement will come a substantial increase in complexity and demand for work. We see this play out every single time some tool comes along and makes engineers in any field more productive. Those changes will also take time, but I suspect we're going to see a larger number of smaller teams working on more projects.
And ultimately, this change is coming for basically all industries. The only industries that might remain totally unaffected are ones that rely entirely on manual labor, but even then the actual business side of the business will also be impacted. At the end of the day I think it's better to be in a position to understand and (even to a small degree) influence the way things are going, instead of just being along for the ride.
If the only value someone brings is the ability to take a spec from someone else and churn out a module/component/class/whatever, they should be very very worried right now. But that doesn't describe a single software engineer I know.
My current employer is taking a long time to figure out how they think they want people to use it, meanwhile, all my side projects for personal use are going quite strong.
I work for a university. We've got a dedicated ChatGPT instance but haven't been approved to use a harness yet. Some devs did a pilot project and approval/licenses are supposedly coming soon.
It's useful. At my company we have an internal LLM that tends to be used in lieu of searching the web, to avoid unintentionally leaking information about what we are working on to third parties. This includes questions about software development, including generating of code. For various reasons we are not permitted to copy this verbatim, but can use it for guidance - much like, say, inspiration from Stack Overflow answers.
It has reignited my passion for coding by making it so I don't have to use my coding muscle as much during the day to improve our technologically boring product.
I still like using it for quick references and autocomplete, boilerplate function. It's funny that text completion with tab is now seen as totally obsolete to some folks.
5 years ago, I set out to build an open-source, interoperable marketplace. It was very ambitious because it required me to build an open source Shopify for not only e-commerce, but restaurants, gyms, hotels, etc that this marketplace could tap into. Without AI, this vision would’ve faltered but with AI, this vision is in reach. I see so many takes about AI killing open-source but I truly think it will liberate us from proprietary SaaS and marketplaces given enough time.
> If you've recently used AI tools for professional coding work, tell us about it.
POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do the final tinkering and polishing for the PR, because by that point it is easier for me to fix the code myself than to ask the model to fix it for me.
The work: fairly straightforward UI + backend work on a website. We have designers producing Figma designs, and we use the Figma MCP to convert them to web pages.
POCC reduces the time taken to complete the work by at least 50%. The last-mile problem exists: it's not a one-shot prompt-to-PR story. There are a few back-and-forths with the model, some direct IDE edits, offline tests, etc. I can see how having subagents/skills/hooks/memory could reduce the manual effort further.
Challenges:
1) AI-first documentation: stories have to be written with greater detail and acceptance criteria.
2) Code reviews: Copilot reviews on GitHub are surprisingly insightful, but waiting on human reviews is still a bottleneck.
3) AI-first thinking: some of the lead devs are still hung up on certain best practices that are not relevant in a world where the machine generates most of the code. There is friction between the code an LLM is good at producing and the standards expected from an experienced developer. This creates busy work at best, frustration at worst.
4) Anti-AI sentiment: there is a vocal minority who oppose AI for reasons ranging from craftsmanship to capitalism to the global environmental crisis. It is a bit political, and the Slack channels are getting interesting.
5) Prompt engineering: I'm in the EU; when the team is multilingual and English is adopted as the language of communication, some members struggle more than others.
6) Losing the will to code: I can't make up my mind whether this tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.
Personally, I love it. I mourn for the loss of the 10x engineer, but those 10x guys have already onboarded the LLM ship.
Very powerful tool. In the right hands.
I work on an ancient codebase, C# and C++ code spanning over 3 major repos and 5 other minor ones. I'm senior engineer and tech lead of my team, but I also do a lot of actual coding and code reviews. It's a somewhat critical internal infra. I'm intimately familiar with most of the code.
I've become somewhat addicted to using coding agents, in the sense I've felt I can finally realize a lot of fantasies about code cleanup and modernization I've had during the decade, and also fulfill user requests, without spending a lot of time writing code and debugging. During the last few months I've been spending my weekends prompting and learning the ropes. I've been using GPT 5.x and GPT 4 before that.
I've tried both giving it big cleanup tasks, and big design tasks. It was ok but mentally very exhausting, especially as it tends to stick to my original prompt which included a lot of known unknowns, even after I told it I've settled on a design decision, and then I have to go over its generated code line-by-line and verify that earlier decisions I had already rejected aren't slipping into the code again. In some instances I've had to tell it again and again that the code it's working on is greenfield and no backwards compatibility should be kept. In other instances I had to tell it that it shouldn't touch public API.
Also, a lot of things which I take for granted aren't done, such as writing detailed comments above each piece of code that is due to a design constraint or an obscure legacy reason. Even though I explicitly prompt it to do so.
Hand-holding it is a chore. It's like coaching a junior dev. This is on top of the 4 actual real-life junior devs sending me PRs to review each week. It's mentally exhausting. At least I know it won't take offense when I belittle its overly complicated code and bad design decisions (which I NEVER do when reviewing PRs from the actual junior devs, so in this sense I get something to throw my aggression at).
I have tried using it for 3 big tasks in the last 5 months. I have shelved the first one (modernizing an ancient codebase written more than 20 years ago), as it still doesn't work even after I spent about a week on it, and I can't spare any more time. The second one (getting another huge C# codebase to stop rebuilding the world on every compilation) seemed promising and in fact did work, but I ended up shelving it after discovering its solution broke auto-complete in Visual Studio. An MS bug, but still.
The 3rd big task is actually a user-facing one, involving a new file format, a managed reader, and a backend writer. I gave it a more-or-less detailed design document. It went pretty OK, especially after I made the jump to GPT 5.2 and now 5.4. Both of them still tended to hallucinate too much once the code size passed a certain threshold.
I don't use it for bug fixing or small features, since that requires a lot of explaining and isn't worth it. Our system has a ton of legacy requirements and backwards compatibility guarantees that would take many days to specify properly.
I've become disillusioned last week. It's all for the best. Now that my addiction has lessened maybe I can have my weekends back.
Running faster does not mean anything unless you know where you are going.
From this thread, so far it seems:
Net negative for the ones who care and still need to work closely with others
Net positive for the ones who don't and/or are lone wolves
Maybe the future is lone wolves working on their thing without a care in the world. Accountable to no one but themselves. Bus factor dialed up to 11.
I'd say 90% of our code is AI written today. Everyone in engineering (~30 people) is going full send on AI. We are still hand reviewing PRs and have quite strict standards, so this works to keep most of the AI slop out of the codebase. Early on most of that was humans being lazy and not reviewing their own PRs before opening them, but we've got better at that now. We still have a lot of legacy human slop ("tech debt") we are trying to get rid of, and the efficiency gains from AI let us spend time on that.
Stack is a monolith SaaS dashboard in Vue / Typescript on the frontend, Node.js on the backend, first built in 2019, with something like 5 different frontend state management technologies. Everyone is senior level.
We use Cursor and Opus 4.6 mainly, and are trying to figure out a more agentic process so we can work on multiple tasks in parallel. Right now we are still mainly prompting.
It’s not really that useful for what people tell me it will be useful for, most of the time. For context, I am a senior engineer that works in fintech, mostly doing backend work on APIs and payment rail microservices.
I find the most use from it as a search engine the same way I’d google “x problem stackoverflow”.
When I was first tasked with evaluating it for programming assistance, I thought it was a good “rubber duck” - but my opinion has since changed. I found that if I documented my goals and steps, using it as a rubber duck tended to lead me away from my goals rather than refine them.
Outside of my role they can be a bit more useful and generally impressive when it comes to prompting small proof of concept applications or tools.
My general take on the current state of LLMs for programming in my role is that they are like having a junior engineer that does not learn and has a severe memory disorder.
Here's my anecdote: I use ChatGPT, Gemini (web chat UI), and Claude. Claude is a bit more convenient in that it has access to my code bases, but this comes at the cost that I have to be careful I'm steering it correctly, while with the chat bots I can feed it only the correct context.
They simplify discrete tasks. Feature additions, bug fixes, augmenting functionality.
They are incapable of creating good quality (easily expandable etc) architecture or overall design, but that's OK. I write the structs, module layout etc, and let it work on one thing at a time. In the past few days, I've had it:
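That division of labor (hand-written structs and module layout, with the tool filling in one narrowly scoped function at a time) can be sketched roughly as follows. This is purely illustrative; the `Invoice` type and `total_outstanding` function are made-up names, not from the commenter's codebase:

```python
from dataclasses import dataclass

# Hand-written structure: a human decides the types and module layout.
@dataclass
class Invoice:
    customer_id: int
    amount_cents: int
    paid: bool = False

def total_outstanding(invoices: list[Invoice]) -> int:
    """Sum of unpaid invoice amounts, in cents.

    A narrowly scoped function like this is the unit of work handed to
    the assistant: the signature and docstring pin down the contract,
    so the generated body has little room to go off the rails.
    """
    return sum(inv.amount_cents for inv in invoices if not inv.paid)

invoices = [Invoice(1, 1000), Invoice(2, 2500, paid=True), Invoice(3, 500)]
print(total_outstanding(invoices))  # 1500
```

The key point is that the human keeps ownership of the overall design; the model only ever fills in one well-specified body at a time.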
Overall, great tool! But I think a lot of people are lying about its capabilities. I've never been more productive. If only I had a job...
Somewhat against the common sentiment, I find it's very helpful on a large legacy project. At work, our main product is a very old, very large code base. This means it's difficult to build up a good understanding of it -- documentation is often out of date, or makes assumptions about prior knowledge. Tracking down the team or teams that can help requires being very skilled at navigating a large corporate hierarchy. But at the end of the day, the answers for how the code works are mostly in the code itself, and this is where AI assistance has really been shining for me. It can explore the code base and find and explain patterns and available methods far faster than I can.
My prompts tend to follow the pattern of "I am looking to implement <X>. <Detailed description of what I expect X to do.> Review the code base to find similar examples of how this is currently done, and propose a plan for how to implement this."
These days I'm on Claude Code, and I do that first part in Plan mode, though even a few months ago, on earlier, not-as-performant models and tools, I was still finding value with this approach. It's only getting better as the company invests in shared skills/tools/plugins/whatever the current terminology is, specific to various use cases within the code base.
I haven't been writing so much code directly, but I do still very much feel that this is my code. My sessions are very interactive -- I ask the agent to explain decisions, question its plans, review the produced code and often revise it. I find it frees me up to spend more time thinking through and having higher level architecture applied instead of spending frustrating hours hunting down more basic "how does this work" information.
I think it might have been an article by Simon Willison that made the case for there being a way to use AI tooling to make you smarter, or to make you dumber. Point and shoot and blindly accept output makes you dumber -- it places more distance between you and your code base. Using AI tools to automate away a lot of the toil give you energy and time to dive deeper into your code base and develop a stronger mental model of how it works -- it makes you smarter. I keep in mind that at the end of the day, it's my name on the PR, regardless of how much Claude directly created or edited the files.
Measured 12x increase in issues fixed/implemented. Solo founder business, so these are real numbers (over 2 months), not corporate fakery. And no, I am not interested in convincing you, I hope all my competitors believe that AI is junk :-)
I use it all the time now, switching between claude code, codex, and cursor. I prefer CC and codex for now but everyone is copying everyone else's homework.
I do a lot of green field research adjacent work, or work directly with messy code from our researchers. It's been excellent at building small tools from scratch, and for essentially brute forcing undocumented code. I can give it a prompt like "Here is this code we got from research, the docs are 3 months out of date and don't work, keep trying things until you manage to get $THING running".
Even for more production and engineering related tasks I'm finding it speeds up velocity. But my engineering is still closer to greenfield than a lot of people here.
I do however feel less connected to the code, even when reviewing thoroughly, I feel like I internalize things at a high level, rather than knowing every implementation detail off the dome.
The other downside is that I get bigger and more frequent code review requests from colleagues. No one is just handing me straight-up slop (yet...)
Using Claude Code professionally for the last 2 months (Max plan) at Rhoda AI and love it!
Software Engineering has never been more enjoyable.
Python, C++, Docker, ML infra, frontend, robotics software
I have 5 concurrent Claude Code sessions on the same mono repo.
Thank you Anthropic!
You guys are definitely missing out. I have the perfect army of mid-level engineers. Using Codex lately, my own CPU and RAM are the only things holding me back from spinning up more and more agents.
On my side I have used Claude code, tbh for solo projects it's good enough if you already know what you need to do.
Answering your questions:
At my job we've been spoon-fed GH Copilot everywhere we can use it. It's been configured to review PRs, make corrections, etc. I'd say it's good enough, but from time to time it will raise false positives on issues. It works fine, but you still need to keep an eye on generated BS.
I've seen coworkers show me amazing stuff done with agentic coding, and I've seen coworkers open slop PRs full of garbage generated code, which is kind of annoying, but I'll let it slide...
Stack - .NET, Angular, SQL Server and ofc hosted in Azure.
The team is composed of about 100 engineers (devs, QA, devops, etc.), and from what I can see there are no juniors, which is sad if you ask me.
FAANG colleague writes this week -- "I am currently being eaten alive by AI stuff for my non-(foss-project) work. I spend most of my day slogging through AI generated comments and code trying to figure out what is good, not good, or needs my help to become good. Or I'm trying to figure out how to prompt the tools to do what I want them to do"
This fellow is one of the few mature software engineers I have ever met who is rigorously and consistently productive in a very challenging mature code base year in and year out. or WAS .. yes this is from coughgooglecough in California
I must admit I'm totally lost on what this is trying to communicate.
What is "non-(foss-project) work"?
Is this person saying AI is bad, because it's generating so much, or good, because they're using it?
I haven't actively looked into it, but on a couple of occasions after Google began inserting Gemini results at the top of the list, I decided to try using some of the generated code samples when the search didn't turn up anything useful. The results were a mixed bag: the libraries I'd been searching for examples from were not very broadly used, and their interfaces were volatile enough that in some cases the model was returning results for obsolete versions. Not a huge deal, since the canonical docs had some recommendations. In at least a couple of cases, though, the results included references to functions that had never been in the library at all, even though they sounded not only plausible but would have been useful if they did in fact exist.
In the end, I am generally using the search engine to find examples because I am too lazy to look at the source for the library I'm using, but if the choice is between an LLM that fabricates stuff some percentage of the time and just reading the fucking code like I've been doing for decades, I'd rather just take my chances with the search engine. If I'm unable to understand the code I'm reading enough to make it work, it's a good signal that maybe I shouldn't be using it at all since ultimately I'm going to be on the hook to straighten things out if stuff goes sideways.
Ultimately that's what this is all about- writing code is a big part of my career but the thing that has kept me employed is being able to figure out what to do when some code that I assembled (through some combination of experimentation, documentation, or duplication) is not behaving the way I had hoped. If I don't understand my own code chances are I'll have zero intuition about why it's not working correctly, and so the idea of introducing a bunch of random shit thrown together by some service which may or may not be able to explain it to me would be a disservice to my employers who trust me on the basis of my history of being careful.
I also just enjoy figuring shit out on my own.
I feel bad saying this because so many folks have not had the best of luck, but it's changed the game for me.
I'm building out large multi-repo features in a 60 repo microservice system for my day job. The AI is very good at exploring all the repos and creating plans that cut across them to build the new feature or service. I've built out legacy features and also completely new web systems, and also done refactoring. Most things I make involve 6-8 repos. Everything goes through code review and QA. Code being created is not slop. High quality code and passes reviews as such. Any pushback I get goes back in to the docs and next time round those mistakes aren't made.
I did a demo of how I work with AI to the dev team at Math Academy, who were complete skeptics before the call. Two hours later they were converts.
I'm repeating this for the 3rd time, but a non-technical client of mine has whipped up an impressive SaaS prototype with tons of features. They still need help with the cleanup, it's all slop, but I used to do many small coding requests for that client. Those gigs will simply disappear.
I just got started using Claude very recently. I had not been in the loop on how much better it got. Now it's obvious that no one will write code by hand. I genuinely fear for my ability to make a living as soon as 2 years from now, if not sooner. I figure the only way is to enter the red queen race and ship some good products. This is the positive I see. If I put 30h/week into something, I have the productivity of 3 people. If it's a weekend project at 10h/week, I now have what used to be a full week of productivity. The economics of developing products solo have vastly changed for the better.
I measured my output:
- 1.5x more commits.
- 2x more issues closed.
The commits are real. I'm not doing "vibe coding" or even agentic coding. I'm doing turn-by-turn where I micromanage the LLM, give specific implementation instructions, and then read and run the output before committing the code.
I'm more than happy with 2x issues closed. For my client work it means my wildly optimistic programmer estimates are almost accurate now.
I did have a frustrating period where a client was generating specs using ChatGPT. I was simply honest: "I have no idea what this nonsense means, let's meet to discuss the new requirements." That worked.
It's a game changer for reading large codebases and debugging.
Error messages were the "slop" of the pre-LLM era. This is where an LLM shines, filling in the gaps where software engineering was neglected.
As for writing code, I don't let it generate anything that I couldn't have written myself, or anything that I can't keep in my brain at once. Otherwise I get really nervous about committing.
The job of a software engineer does and always has relied upon taking responsibility for the quality of one's work. Whether it's auto-complete or a fancier auto-complete, the responsibility should rest on your shoulders.
Context: I work in robotics. We use mostly c++ and python. The entire team is about 200 though the subset I regularly interact with is maybe 50.
I basically don't use AI for coding at all. When I have tried it, it's just half working garbage and trying to describe what I want in natural language is just miserable. It feels like trying to communicate via smoke signals.
I'll be a classical engineer until they fire me and then go do something else. So far, that's working. We've had multiple rounds of large layoffs in the last year and somehow I'm still here.
I work at a unicorn in the EU. Claude Code has been rolled out to all of engineering with strict cost control policies; even with these in place we burn through tens of thousands of euros per month, which I think could easily translate into 15-20 hires. Are we more productive than we would be by adding people to the headcount? That's a good question that I cannot answer.
Some senior people who were in the AI pilot, have been using this for a while, and are very into it claim that it can open PRs autonomously with minimal input or supervision (with a ton of MD files and skills, in repos with clear architecture standards). I couldn't replicate this yet.
I'm objectively happy to have access to this tool, it feels like a cheat code sometimes. I can research things in the codebase so fast, or update tests and glue code so quickly that my life is objectively better. If the change is small or a simple bugfix it can truly do it autonomously quicker than me. It does make me lazier though, sometimes it's just easier to fire up claude than to focus and do it by myself.
I'm careful not to overuse it, mostly so I don't hit the monthly cap and can "keep it" in case something urgent or complex comes my way. Also, I still like to do things by hand, just because I still want to learn and maintain my skills. I feel that I'm not learning anything by using Claude; that's a real thing.
In the end I feel it's a powerful tool that is here to stay, and I would be upset if I no longer had access to it; it's very good. I recently subscribed and use it in my free time just because it's a very fun technology to play with. But it's a tool. I'm paid because I take responsibility for delivering my work on time, working, tested, with code on par with the org's quality standards. Whether I do it by hand or with Claude is irrelevant. If I can do it faster, it likely means I will receive more work to do. Somebody still has to operate Claude, and it's not going to be non-technical people, for sure.
I genuinely think that if anyone still believes today that this technology is only hype or a slop machine, they are in denial or haven't tried a recent frontier model with the correct setup (mostly giving the agent a way to autonomously validate its changes).
Why did your company get Claude Code with token billing instead of getting everyone Max plans?
I think they need to have the enterprise plan for accessing advanced security and data handling guarantees. Also they set up pretty strict controls on what tools the agents can use at the org level that we cannot override, not sure that's an option with the subscription plans.
ZDR is in place at the API level, but you need an enterprise contract if you're on a plan. Vendor lock-in and IP are the drivers.
Not that guy, but here token billing was chosen to get the enterprise monitoring shit. I think the C-suite is expected to report productivity increases and needs all the data that Anthropic can scrape to justify how much money is being set on fire right now.
AI is provided to my project, but to my knowledge nobody is using it. I have trouble seeing what advantages AI would provide for the work we do.
I have been doing this work long enough to know how to increase human productivity. It’s not bullshit like frameworks or AI. The secret is smaller code and faster executing applications and the kind of people who prefer simple versus easy.
I’d be more curious to hear about the processes people have put in place for AI code reviews
On the one hand, past some threshold of criticality/complexity, you can’t push AI unreviewed, on the other, you can’t relegate your senior best engineers to do nothing but review code
It doesn’t just not scale, it makes their lives miserable
So then, what’s the best approach?
I think over time that threshold I mentioned will get higher and higher, but at the moment the ratio of code that needs to be reviewed to reviewers is a little high
It is a very mixed bag. I have enjoyed using opus 4.5 and 4.6 to add functionality to existing medium complexity codebases. It’s great for green field scripts and small POCs. I absolutely cannot stand reviewing the mostly insane PRs that other people generate with it.
Very much mixed. I've used Claude to generate small changes to an existing repo, asking it to create functions or React templates in the style of the rest of the file it's working in, and that's worked great. I still do a lot of the fine-tuning there, but if the codebase isn't one I'm overly familiar with, this is a good way to ensure my work doesn't conflict with the core team's.
I have also done the agentic thing and built a full CLI tool via back-and-forth engagement with Claude and that worked great - I didn't write a single line of code. Because the CLI tool was calling an API, I could ask Claude to run the requests it was generating and adjust based on the result - errors, bad requests etc, and it would fairly rapidly fix and coalesce on a working solution.
After I was done though, I reckon that if instead of this I had just done the work myself I would have had a much smaller, more reliable project. Less error handling, no unit tests, no documentation sure, but it would have worked and worked better - I wouldn't need to iterate off the API responses because I would have started with a better contract-based approach. But all of that would have been hard, would have required more 'slow thinking'. So... I didn't really draw a clean conclusion from the effort.
Continuing to experiment, not giving up on anything yet.
For those of you for who it is working: show your code, please.
I'll bite. Here's a 99.9% vibe-coded raw Git repository reader suitable for self-hosted or shared host environments:
https://repo.autonoma.ca/treetrek
There's still some work to do on the rendering side of model objects. Developing the syntax highlighting rules for 40 languages and file formats in about 10 minutes was amazing to see.
https://repo.autonoma.ca/repo/treetrek/tree/HEAD/render/rule...
Cool, thank you.
Edit, great example. What is your long term maintenance strategy, do you keep the original prompts around so you can refine them later or do you dig into the source?
Would love to see more of your workflow.
Here's one success I had -
https://github.com/sroerick/pakkun
It's git for ETL. I haven't looked at the code, but I've been using it pretty effectively for the last week or two. I wouldn't feel comfortable recommending it to anybody else, but it was basically one-shotted. I've been dogfooding it on a number of projects, had the LLM iterate on it a bit, and I'm generally very happy with the ergonomics.
That's a nice example, can you explain your 'one shot' setup in some more detail?
I don't have the prompt, but I used Codex. I probably wrote a medium-sized paragraph explaining the architecture. It scaffolded out the app, and I think I prompted it twice more with some very small bugfixes. That got me to an MVP, which I used to build LaTeX pipelines. Since then, I've added a few features as I've dogfooded it.
It's a bit challenging / frustrating to get LLMs to build out a framework/library and the app that uses the framework at the same time. If it hits a bug in the framework, sometimes it will rewrite the app to match the bug rather than fixing the bug. It's kind of a context balancing act, and you have to have a pretty good idea of how you're looking to improve things as you dogfood. It can be done; it just takes some juggling.
I think LLMs are good at golang, and also good at that "lightweight utility function" class of software. If you keep things skeletal, I think you can avoid a lot of the slop feeling when you get stuck in a "MOVE THE BUTTON LEFT" loop.
I also think that dogfooding is another big key. I coded up a calculator app for a dentist office which 2-3 people use about 25 times a day. Not a lot of moving parts, it's literally just a calculator. It could basically be an excel spreadsheet, except it's a lot better UX to have an app. It wouldn't have been software I'd have written myself, really, but in about 3 total hours of vibecoding, I've had two revisions.
If you can get something to a minimal functional state without a lot of effort, and you can keep your dev/release loop extremely tight, and you use it every day, then over time you can iterate into something that's useful and good.
Overall, I'm definitely faster with LLMs. I don't know if I'm that much faster. I was probably most fluent building web apps in Django, and I was pretty dang fast with that. LLMs are more about things like "How do you build tests to prevent function drift" and "How can I scaffold a feedback loop so that the LLM can debug itself".
I like your pragmatic attitude to all this.
I think your prompts are 'the source' in the traditional sense, and the result of those prompts is almost like 'object code'. It would be great to have a higher-level view of computer source code like the one you are sketching, but then to distribute the prompt and the AI (toolchain...) used to create the code, with the code itself as just one of many representations. This would also solve some of the copyright issues, as well as possibly some of the longer-term maintainability challenges: if you need to make changes to the running system a while from now, the tool that got you there may no longer be suitable, unless there is a way to ingest all of the code it produced previously and then suggest surgical strikes instead of wholesale updates.
Thank you for taking the time to write this all out, it is most enlightening. It's a fine line between 'nay sayer' and 'fanboi' and I think you've found the right balance.
Thanks for reading it! I didn't use an LLM, lol.
On documentation, I agree with you, and have gone down the same road. I actually built out a little chat app which acts as a wrapper around the Codex app and does exactly this. Unfortunately, the UI sucks pretty bad, and I never find myself using it.
I actually asked Codex if it could find the chat where I created this in my logs. It turns out I used the web interface and asked it to make a spec. Here's the link to the chat. Sorry, the way I described it wasn't really what happened at all! lol. https://chatgpt.com/share/69b77eae-8314-8005-99f0-db0f7d11b7...
As it happens, I actually speak-to-texted my whole prompt. And then gippity glazed me saying "This is a very good idea". And then it wrote a very, very detailed spec. As an aside, I kind of have a conspiracy theory that they deploy "okay" and "very very good" models. And they give you the good model based on if they think it will help sway public opinion. So it wrote a pretty slick piece of software and now here I am promoting the LLM. Oof da!
I didn't really mention - spec first programming is a great thing to do with LLMs. But you can go way too far with it, also. If you let the LLM run wild with the spec it will totally lose track of your project goals. The spec it created here ended up being, I think, a very good spec.
I think "code readability" is really not a solved problem, either pre or post LLM. I'm a big fan of "Code as Data" static analysis tools. I actually think that the ideal situation is less of "here is the prompt history" and something closer to Don Knuth's Literate Programming. I don't actually want to read somebody fighting context drift for an hour. I want polished text which explains in detail both what the code does and why it is structured that way. I don't know how to make the LLMs do literate programming, but now that I think about it, I've never actually tried! Hmmm....
Here's one: https://github.com/mohsen1/fesh
It beats the best compression out there by 6% on average. Yet nobody will care, because it was not hand-written.
That's a very interesting case. If you want I will look into this in more detail, I'm waiting for some parts so I have some time to kill.
Are you an expert in this field? I'm curious if the AI generated code here is actually good.
I've done some work on compression really long ago but I am very far from an expert in the field, in fact I'm not an expert in any field ;) The best I ever did was a way to compress video better than what was available at the time but wavelets overtook that and I have not kept current.
I'm curious about two things:
- is it really that much better (if so, that would by itself be a publishable result), where better is
- is it correct?

I think that's a fair challenge.
And as a sidetrack to the latter: can it be understood to the point that you can prove it is correct? Unfortunately I don't have experience with your toolchain but that's a nice learning opportunity.
Question: are you familiar with
https://www.esa.int/Enabling_Support/Space_Engineering_Techn...
https://en.wikipedia.org/wiki/Calgary_corpus
https://corpus.canterbury.ac.nz/
As a black box it works. It produces smaller binaries that, when extracted, match bit-by-bit with the original file.
I tested across 100 packages. Better efficiency across the board.
But I don't know if I (or anyone) want to maintain software like this. Where it's a complete black box.
It was a fun experiment though. It proves that with a robust testing harness you can do interesting things with pure AI coding.
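The core of such a testing harness for a black-box compressor is a round-trip check: compress, decompress, require a bit-identical result, and track the size ratio. A generic sketch, using zlib purely as a stand-in for the AI-written codec:

```python
import zlib

def roundtrip_check(codec_compress, codec_decompress, payload: bytes) -> float:
    """Return the compression ratio; raise if the round trip is lossy."""
    packed = codec_compress(payload)
    restored = codec_decompress(packed)
    # The non-negotiable property: output must match the input bit-by-bit.
    assert restored == payload, "round trip is not bit-identical"
    return len(packed) / max(len(payload), 1)

# Stand-in inputs and codec; a real harness would iterate over the 100
# test packages and call the generated code instead of zlib.
samples = [b"x" * 10_000, bytes(range(256)) * 64, b"hello world\n" * 500]
for sample in samples:
    ratio = roundtrip_check(zlib.compress, zlib.decompress, sample)
    print(f"{len(sample):>6} bytes -> ratio {ratio:.3f}")
```

With a check like this wired into the loop, the agent can iterate on the codec freely: any change that breaks correctness fails immediately, and the ratio gives it a scalar to improve against.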
Why is this the attitude when it comes to AI? Can you imagine someone saying “please provide your code” when they claim that Rust sped up their work repo or typescript reduced errors in production?
Yes, I can absolutely imagine that.
Eh, sorry, I may have been too quick to judge, but in the past when I have shared examples of AI-generated code to skeptics, the conversation rapidly devolves into personal attacks on my ability as an engineer, etc.
I think the challenge is to not be over-exuberant nor to be overly skeptical. I see AI as just another tool in the toolbox, the fact that lots of people produce crap is no different from before: lots of people produced crappy code well before AI.
But there are definitely exceptions and I think those are underexposed, we don't need 500 ways to solve toy problems we need a low number of ways to solve real ones.
Some of the replies to my comment are exactly that: they show, in a much more concrete way than the next pelican-on-a-bicycle, what the state of the art is really capable of and how to achieve real-world results. Those posts are worth gold compared to some of the junk that gets high visibility, so my idea was to use the opportunity to highlight them instead.
FWIW, I did a full modernization and redesign of a site (~50k LOC) over a week with Claude. I was able to ensure quality by writing a strong e2e test suite ahead of time (which I also drove with AI), then ensuring Claude ran the suite every time it made changes. I got a bunch of really negative comments about it on HN (alluded to in my previous comment: everything from telling me the site looked embarrassing or didn't deserve to be on HN, to saying the 600ms load time was too slow), so I mostly withdrew from posting more about it. I still think the strategy of a robust e2e suite is a really good idea that can really drive AI productivity.
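The "run the suite on every change" gate can be as simple as a subprocess wrapper the agent is told to invoke after each edit. This is a hypothetical sketch, not the commenter's actual tooling; the command being run here is a placeholder for a real pytest or Playwright invocation:

```python
import subprocess
import sys

def suite_passes(test_cmd: list[str]) -> bool:
    """Run the e2e suite; return True only on a clean exit.

    The agent is instructed to call this after every change and keep
    iterating until it returns True, which keeps regressions out.
    """
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tail of the failure output so it can be fed
        # back into the next prompt.
        print(result.stdout[-2000:] + result.stderr[-2000:])
    return result.returncode == 0

# Placeholder command; a real project would run its e2e suite here.
ok = suite_passes([sys.executable, "-c", "print('all checks passed')"])
print("gate:", "pass" if ok else "fail")
```

The value is less in the wrapper itself than in the contract it enforces: the model never gets to declare a change done until the same suite a human would trust says so.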
Yes, that e2e suite is a must for long term support and it probably would be a good idea to always create something like that up front before you even start work on the actual application.
I think it pays off to revisit the history of the compiler. Initially, compilers were positioned as a way for managers to sidestep the programmers, because the programmers had too much power and were hard to manage.
Writing assembly language by hand is tedious and it requires a certain mindset and the people that did this (at that time programming was still seen as an 'inferior' kind of job) were doing the best they could with very limited tools.
Enter the compiler, now everything would change. Until the mid 1980s many programmers could, when given enough time, take the output of a compiler, scan it for low hanging fruit and produce hybrids where 'inner loops' were taken and hand optimized until they made optimal use of the machine. This gave you 98% of the performance of a completely hand crafted solution, isolated the 'nasty bits' to a small section of the code and was much more manageable over the longer term.
Then, ca. 1995 or so the gap between the best compilers and the best humans started to widen, and the only areas where the humans still held the edge was in the most intricate close-to-the-metal software in for instance computer games and some extremely performant math code (FFTs for instance).
A multitude of different hardware architectures, processor variations and other dimensions made consistently maintaining an edge harder and today all but a handful of people program in high level languages, even on embedded platforms where space and cycles are still at a premium.
Enter LLMs
The whole thing seems to repeat: there are some programmers that are - quite possibly rightly so - holding on to the past. I'm probably guilty of that myself to some extent; I like programming, and the idea that some two-bit chunk of silicon is going to show me how it is done offends me. At the same time I'm aware of the past and have already gone through the assembly-to-high-level track, and I see this as just more of the same.
Another, similar effect was seen around the introduction of the GUI.
Initially the 'low hanging fruit' of programming will fall to any new technology we introduce, boilerplate, CRUD and so on. And over time I would expect these tools to improve to the point where all aspects of computer programming are touched by them and where they either meet or exceed the output of the best of the humans. I believe we are not there yet but the pace is very high and it could easily be that within a short few years we will be in an entirely different relationship with computers than up to today.
Finally, I think we really need to see some kind of frank discussion about compensation of the code ingested by the model providers, there is something very basic that is wrong about taking the work of hundreds of thousands of programmers and then running it through a copyright laundromat at anything other than a 'cost+' model. The valuations of these companies are ridiculous and are a direct reflection of how much code they took from others.
https://github.com/search?q=author%3AClaude&type=commits
Why so every armchair reviewer can yell, "Slop!"?
Guidelines meditation for you.
It solves no actual problem I have, yet introduces many new ones. It's a trap. So I don't use it and have a strong policy against using it or allowing others to use it on things I work on. BATNA is key.
More work, shorter deadlines, smaller headcount, higher expectations in terms of adaptability/transferability of people between projects (who needs knowledge transfer, ask the AI!).
But in the end, the thing that pisses me off was a manager who used it to write tickets. If the product owner doesn't give a shit about the product enough to think and write about what they want, you'll never be successful as a developer.
Otherwise it's pretty cool stuff.
I use Cursor and have been pretty happy with the Plan -> Revise -> Build -> Correct flow. I don't write everything with it, but the planning step does help me clarify my thoughts at times.
One of the things that has helped the most is all the documentation I wrote inside the repository before I started using AI. It was intended for consumption by other engineers, but I think Cursor has consumed it more than any human. I've even managed to make improvements not by having AI update it, but asking AI "What unanswered questions do you have based on reading the documentation?" It has helped me fill in gaps and add clarity.
Another thing I've gotten a ton of value with is having it author diagrams. I've had it create diagrams with both the mermaid syntax and AWSDAC (Diagram-as-Code). I've always found crafting diagrams a painstaking process. I have it make a first pass by analyzing my code + configuration, then make corrections and adjustments by explaining the changes I want.
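For readers who haven't seen Mermaid, here is a tiny hand-written example of the text-based syntax the AI can generate and then iterate on. The services named are made up, not from any real system:

```mermaid
flowchart LR
    client[Web client] --> api[API gateway]
    api --> queue[(Job queue)]
    queue --> worker[Background worker]
    worker --> db[(Postgres)]
```

Because the diagram is plain text, it diffs cleanly in PRs and the model can adjust it from a description of the change rather than you repositioning boxes by hand.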
In my own PRs, I have been in the habit of posting my Cursor Plan document and Transcript so that others can learn from it. I've also encouraged other team members to do the same.
I feel bad for any teams that are being mandated to use a certain amount of AI. It seems to me that the only way to make it work is by having teams experiment with it and figure out how to best use it given their product and the team's capacity. AI is like a pair of Wile-E-Coyote rocket skates. It'll get you somewhere fast, but unless you've cleared the road of debris and pointed in exactly the right direction, you're going to careen off a cliff or into a wall.
I use it as much as I can, not because it makes me more productive, but because it's better for my career and I don't care about my idea of productivity. Whatever the company wants is probably correct in the short term.
Half the answers: yes, we move so fast, I haven't seen a text editor in months.
The other half: I keep fixing everything from other teams using AI, otherwise I destroy my career.
This thread is very eye-opening on how things are going.
I use Claude Code for my research projects now; it's incredible, tbh. I'm not writing production code for millions of users. I need to do data science stuff and write lots of code to do that, and AI lets me focus on the parts of my research that I want to do, and it makes me a lot more productive.
Things I’ve learned:
Claude Code is the best CLI tool by a mile.
Even at its best it's wildly inconsistent from session to session. It does things differently every time. Sometimes I'm impressed with how it works; then the next day, doing the exact same thing, it flips out and goes nuts trying to do it a totally different, unworkable way.
You can capture some of these issues in AGENTS.md files or the like, but there’s an endless future supply of them. And it’s even inconsistent about how it “remembers” things. Sometimes it puts in the project local config, sometimes in my personal overall memory files, sometimes instead of using its internal systems, it asks permission to search my home directory for its memory files.
The best way to use it is for throwaway scripts or examples of how to do something. Or new, small projects where you can get away with never reading the code. For anything larger or more important, its inconsistencies make it a net time loser, imo. Sure, let it write an annoying utility function for you, but don’t just let it loose on your code.
When you do use it for new projects, make it plan out its steps in advance. Provide it with a spec full of explicit usage examples of the functionality you want. It’s very literal, so expect it to overindex on your example cases and treat those as most important. Give it a list of specific libraries or tools you want it to use. Tell it to take your spec and plan out its steps in a separate file. Then tell it to implement those steps. That usually works to allow it to build something medium-complex in an hour or two.
When your context is filling up in a session in a particular project, tell it to review its CLAUDE.md file and make sure it matches the current state of the project. This will help the next session start smoothly.
One of the saddest things I’ve found is when a whole team of colleagues gets obsessed with making Claude figure something out. Once it’s in a bad loop, you need to start over, the context is probably poisoned.
Great! I ask it questions and still type every line of code by hand. LLMs are just a tool and nothing more to me.
I work in the research space so it's mostly prototype code. I have unlimited access to codex 5.x-xhigh. I rarely directly alter the code codex generates at this point. My productivity has significantly increased.
It is great for solving "puzzle" problems and removing roadblocks. In the past, whenever I got stuck, I often got frustrated and gave up. In so many cases AI hints at the correct solution and helps me continue. It's like a knowledgeable colleague you can ask for help when you get stuck.
Another thing is auditing and code polishing. I asked Claude to polish a working but still rough browser plugin consisting of two simple JavaScript files. It took ten iterations and a full day of highly intensive work to get the quality I wanted. I would say the result is good, but I could not do this process very often without going insane. And I do not want to do this with a more complex project, yet.
So, yes, I am using it. For me it's a tool, knowledge resource, puzzle solver, code reviewer and source for inspiration. It's not a robot to write my code.
And never trust it blindly!
AI has helped me break through procrastination by taking care of tedious tasks that beforehand had a low ROI (boilerplate, CI/CD configs, test coverage, shell scripting/automation).
- create unit tests and benchmark tests that required lots of boilerplate and fixtures
- add CI / CD to a few projects that I didn't have motivation to
- freshen up old projects to modern standards (testing, CI / CD, update deps, migrations/deprecations)
- add monitoring / alerting to 2 projects that I had been neglecting. One was a custom DNS config uptime monitor.
- automated backup tools, along with scripts for verifying the recovery procedure.
- moderate migrations for deprecated APIs and refactors within cli and REST API services
- auditing GCP project resources for billing and security breaches
- frontend, backend and offline tiers for cloud storage management app
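As a concrete illustration of the first bullet, here is a hedged sketch of the kind of repetitive test-fixture boilerplate an assistant can churn out quickly. The schema and helper functions are hypothetical, using only the stdlib:

```python
# Illustrative fixture boilerplate for a tiny data-access layer, using
# stdlib unittest. The users table and helpers are made-up examples.
import sqlite3
import unittest

def insert_user(conn, name):
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()

def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

class UserStoreTest(unittest.TestCase):
    def setUp(self):
        # Fresh in-memory database per test: classic fixture boilerplate
        # that is tedious to write by hand for every table.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def tearDown(self):
        self.conn.close()

    def test_insert_and_count(self):
        insert_user(self.conn, "alice")
        self.assertEqual(count_users(self.conn), 1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserStoreTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Multiply this pattern across dozens of tables and services and the ROI of delegating it becomes obvious.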
I wrote a blog post over New Year's about my experience at a company where AI coding was widely used and developer skill wasn't particularly high.
https://burakku.com/blog/tired-of-ai-coders/
I think the addendum to that is that I've since left.
Love it. I can create closer to the speed of decision making.
AI has definitely shifted my work in a positive way, but I am still playing the same game.
I run a small lab that does large data analytics and web products for a couple large clients. I have 5 developers who I manage directly, I write a lot of code myself and I interact directly with my clients. I have been a web developer for long enough to have written code in coldfusion, php, asp, asp.net, rails, node and javascript through microsoft frontpage exports, to jquery,to backbone, angular and react and in a lot of different frameworks. I feel this breadth of watching the internet develop in stages has given me a decent if imperfect understanding of many of the tradeoffs that can be made in developing for the web.
My work lately is on an analytics / cms / data management / gis platform that is used by a couple of our clients and we've been developing for a couple of years before any ai was used on it all. Its a react front end built on react-router-7 that can be SPA or SSR and a node api server.
I had tried AI coding a couple times over the past few years both for small toy projects and on my work and it felt to me less productive than writing code by hand until this January when I tried Claude Code with Opus 4.5. Since then I have written very few features by hand although I am often actively writing parts of them, or debugging by hand.
I am maybe in a slightly unique place in that part of my job is coming up with tasks for other developers and making sure their code integrates back. I've been doing this for 10-plus years, and personally my success rate in getting someone to write a new feature that will get used is maybe a bit over 50% - and that is maybe generous. Figuring out what to do next in a project that will create value for users is the hard part of my job, whether I am delegating to developers or to an AI, and that hasn't changed.
That being said I can move through things significantly faster and more consistently using AI, and get them out to clients for testing to see if they are going to work. Its also been great for tasks which I know my developers will groan if I assign to them. In the last couple months I've been able to
- create a new version of our server that is free from years of cruft of the monorepo API we use across all our projects
- implement SQLite compatibility for the server (in addition to the original Postgres support)
- implement local-first sync from scratch for the project
- test out a large number of optimization strategies, not all of which worked out, but which would have taken so much longer and been so much more onerous that the cost-benefit ratio of engaging them would not have been worth it
- tons of small features I would have assigned to someone else but are now less effort to just have the AI implement
In my own use of AI, I keep the bottleneck at my own understanding of the code; it's important to me that I maintain a thorough understanding of the codebase. I could possibly go faster by giving it a longer leash, but that trade-off doesn't seem wise to me at this point: first because I'm already moving so much faster than I was very recently, and secondly because it doesn't seem very far from the next bottleneck, which is deciding what is the next useful thing to implement.
For the most part, I find the AI has me moving in the right direction almost all the time, but I think this is partly because I am already practiced in communicating to programmers what to implement next and I have a deep understanding of the codebase, and also because I spend more than half of my time using AI adding context, plans and documentation to the repo.
I have encouraged my team to use these tools, but I am not forcing them down anyone's throat, although it's interesting to give people tasks that I am confident I could finish much quicker and much more to my personal taste than assigning them. The reactions from my team are pretty mixed: one of the strongest contributors doesn't find a lot of gains from it, one has found similar productivity gains to myself, and others are very against it and hate it.
I think one of the things it will change for me is that I can no longer just create the stories for everyone. Learning how to choose what to work on is going to be the most important skill, in my opinion, so over the next couple of months I am going to shift so that everyone on my team has direct client interactions, and I am going to try to move away from writing stories to having meetings where I help them decide on their own what to work on. Still, part of the reason I can afford to do this is that I can now get as much or more work done than I was able to with my whole team at this time last year.
That's a big difference in one way, and I am optimistic that the platform I am working on will be a lot better and able to compete with large legacy platforms that it wouldn't have been able to compete with in the past, but still it just tightens the loop of trying new things and getting feedback and the hardest part of the business is still communication with clients and building relationships that create value.
I've been using opencode and oh-my-opencode with Claude's models (via github Copilot). The last two or three months feel like they have been the most productive of my 28-year career. It's very good indeed with Rails code, I suspect it has something to do with the intentional expressiveness of Ruby plus perhaps some above-average content that it would be trained on for this language and framework. Or maybe that's just my bias.
It takes a bit of hand-holding and multiple loops to get things right sometimes, but even with that, it's pretty damn good. I don't usually walk away from it; I actively monitor what it's doing, peek in on the sub-agents, and interject when it goes down a wrong path or writes messy code. But more often than not, the sessions settle into review loops.
These review loops are essential; they help clean up the code into something coherent most times. It really mirrors how I tend to approach tasks myself: write something quickly that works, make it robust by adding tests, and then make it maintainable by refactoring. Just way faster.
I've been using this approach on a side project, and even though it's only nights and weekends, it's probably the most robust, well-tested and polished solo project I've ever built. All those little nice-to-have and good-to-great things that normally fall by the wayside if you only have nights and weekends - all included now.
And the funny thing is, I feel coding with AI like this gets me in the zone more than hand-coding. I suspect it's the absence of all those pesky rabbit holes that tend to be thrown up by any non-trivial code base and tool chain, which can easily distract us from thinking about the problem domain into solving problems of our tools. Claude deals with all that almost as a side effect. So while it does its thing, I read through its self-talk while thinking along about the task at hand, intervening if I disagree, but I stay at the higher level of abstraction, more or less. Only when the task is basically done do I dive a level deeper into code organisation, maintainability, security, edge cases, etc.
Needless to say that very good test coverage is essential to this approach.
Now, I'm very ambivalent about the AI bubble (I believe very firmly that it is one), but for coding specifically, it's a paradigm shift, and I hope it's here to stay.
I'm an ex-FAANG engineer working for a smaller (but still big enough) company.
At work we use one of the less popular solutions, available both as a plugin for vscode and as a claude code-like terminal tool. The code I work on is mostly Golang and there's some older C++ using a lot of custom libraries. For Golang, the AI is doing pretty good, especially on simple tasks like implementing some REST API, so I would estimate the upper boundary of the productivity gain to be maybe 3x for the trivial code.
Since I'm still responsible for the result, I cannot just YOLO and commit the code, so whenever I get to work on simple things, I'm effectively becoming a code reviewer for the majority of time. That is what probably prevents me from going above 3x productivity; after each code review session I still need a break so I go get coffee or something, so it's still much faster than writing all the code manually, but the mental load is also higher which requires more breaks.
One nontrivial consequence is that the expectations are adapting to the new performance, so it's not like we are getting more free time because we are producing the code faster. Not at all.
For the C++ codebase though, in the rare cases when I need to change something there, it's pretty much business as usual; I won't trust the code it generates, and would rather write what I need manually.
Now, for personal projects, it's a completely different story. For the past few months or so, I haven't written any code for my personal projects manually, except for maybe a few trivial changes. I don't review the generated code either, just making sure that it works as I expect. Since I'm probably too lazy to configure the proper multi-agent workflow, what I found works great for me is: first ask Claude for the plan, then copy-paste the plan to Codex, get its feedback back to Claude, repeat until they agree; this process also helps me stay in the loop. Then, when Claude implements the plan and makes a commit, I copy-paste the commit sha to Codex and ask it to review, and it very often finds real issues that I probably would've missed.
It's hard to estimate the productivity gain of this new process mostly because the majority of the projects I worked on these past few months I would've never started without Claude. But for those I would've started, I think I'm somewhere near 4-5x compared to manually writing the code.
One important point here is that, both at work and at home, it's never a "single prompt" result. I think about the high level design and have an understanding of how things will work before I start talking to the agent. I don't think the current state of technology allows developing things in one shot, and I'm not sure this will change soon.
My overall attitude towards AI code generation is quite positive so far: I think, for me, the joy of having something working so soon, and the fact that it follows my design, outweighs the fact that I did not actually write the code.
One very real consequence of that is that I miss manually writing code. I started going through the older Advent of Code years where I still have some unsolved days, and even solving some LeetCode problems (only interesting ones!) just for the feeling of writing the code as we all did before.
I'm not explicitly authorised to speak about this stuff by my employer but I think it's valuable to share some observations that go beyond "It's good for me" so here's a relatively unfiltered take of what I've seen so far.
Internally, we have a closed beta for what is basically a hosted Claude Code harness. It's ideal for scheduled jobs or async jobs that benefit from large amounts of context.
At a glance, it seems similar to Uber's Minion concept, although we weren't aware of that until recently. I think a lot of people have converged on the same thing.
Having scheduled roundups of things (what did I post in Slack? what did I PR in Github etc) is a nice quality of life improvement. I also have some daily tasks like "Find a subtle cloud spend that would otherwise go unnoticed", "Investigate an unresolved hotfix from one repo and provide the backstory" and "Find a CI pipeline that has been failing 10 times in a row and suggest a fix"
I work in the platform space so your mileage may vary of course. More interesting to me are the second order effects beyond my own experience:
- Hints of engineering-adjacent roles (i.e., technical support) who are now empowered to try to generate large PRs implementing unscoped/ill-defined new internal services, because they don't have the background to know what is "good" or "bad". These types have always existed - people on the edge of technical-adjacent roles who aspire to become fully fledged developers without an internal support mechanism - but now the barrier is a little lower.
- PR review fatigue: As a Platform Engineer, I already get tagged on acres of PRs but the velocity of PRs has increased so my inbox is still flooded with merged PRs, not that it was ever a good signal anyway.
- First hints of technical folk who progressed off the tools and might now be encouraged to fix those long-standing issues that are simple in their minds, even though reality has shifted around a lot since. Generally LLMs are pretty good at surfacing this once they check how things actually are, but LLMs don't "know" what your mental model is when you frame a question.
- Coworkers defaulting to asking LLMs about niche queries instead of asking others. There are a few queries I've seen where the answer from an LLM is fine but it lacks the historical part that makes many things make sense. As an example off the top of my head, websites often have subdomains not for any good present reason but just because back in the day, you could only have like 6 XHR connections to a domain or whatever it was. LLMs probably aren't going to surface that sort of context which takes a topic from "Was this person just a complexity lover" to "Ah, they were working around the constraints at the time".
- Obviously security is a forever battle. I think we're more security minded than most but the reality is that I don't think any of this can be 100% secure as long as it has internet access in any form, even "read only".
- A temptation to churn out side quests. When I first got started, I would tend to do work after hours but I've definitely trailed off and am back to normal now. Personally I like shipping stuff compared to programming for the sake of it but even then, I think eventually you just normalise and the new "speed" starts to feel slow again
- Privileged users generating and self-merging PRs. We have one project where most everyone has force merge and because it's internal only, we've been doing that paired with automated PR reviews. It works fairly well because we discuss most changes in person before actioning them but there are now a couple historical users who have that same permission contributing from other timezones. Waking up to a changed mental model that hasn't been discussed definitely won't scale and we're going to need to lock this down.
- Signal degradation for PRs: We have a few PRs I've seen where they provide this whole post-hoc rationalisation of what the PR does and what the problem is. You go to the source input and it's someone writing something like "X isn't working? Can you fix it?". It's really hard to infer intent and capability from PR as a result. Often the changes are even quite good but that's not a reflection of the author. To be fair, the alternative might have been that internal user just giving up and never communicating that there was an issue so I can't say this is strictly a negative.
All of the above are things that are actively discussed internally, even if they're not immediately obvious, so I think we're quite healthy in that sense. This stuff is bound to happen regardless; I'm sure most orgs will probably just paper over it or simply have no mechanism to identify it. I can only imagine what fresh hells exist in Silicon Valley, where I don't think most people are equipped to be good stewards or even consider basic ethics.
Overall, I'm not really negative or positive. There is definitely value to be found but I think there will probably be a reckoning where LLMs have temporarily given a hall pass to go faster than the support structures can keep up with. That probably looks like going from starting with a prompt for some work to moving tasks back into ticket trackers, doing pre-work to figure out the scope of the problem etc. Again, entirely different constraints and concerns with Platform BAU than product work.
Actually, I should probably rephrase that a little: I'm mostly positive on pure inference while mostly negative on training costs and other societal impacts. I don't believe we'll get to everyone running Gas Town/The Wasteland, nor do I think we should aspire to. I like iterating with an agent back and forth locally, and I think just heavily automating stuff with no oversight is bound to fail, in the same way that large corporations get bloated and collapse under their own weight.
Overall thoughts on development: Replying before reading anyone else's responses because I want to provide an honest response. I absolutely love it. I've spent my career as a generalist focusing on architecture and plumbing problems, mostly on Linux and embedded. Coding was something I did to get things done; now I can get things done in new languages incredibly fast, debug annoying problems rapidly, and work on new architectures very rapidly. It does a lot of the annoying research work: interpreting novel build chain failures, tracking down version-related API changes, gathering evidence of popular reports on the large plurality of Apple kernel lockdown changes that perennially break embedded work, etc. I'm working in hardware. Electronics. Physics. Mechanical. Supply chain. Software. EVERYTHING. It's a goddamn superpower. I can't get enough of it. It's like every teacher you ever wanted, always available with infinite patience. I've stopped typing a lot of prompts now and started using local voice transcription. It's fantastic.
Honestly, the question may have been a bit more on the programming (generating lines) side, but I've always described programming as a lot like cleaning. You enter the room, figure out the nature of the mess (the interesting part) and come up with your strategy for solving it, then spend ages cleaning, sweeping or mopping up which is largely boring. Now you don't have to bother. Thanks, LLMs.
>"How is AI-assisted coding going for you professionally?"
For me personally - beautifully. Saves me a ton of time. Keep in mind, however, that I am an old fart who was originally a scientist in physics, started programming by entering machine code, and designed electronics to facilitate research, and after moving to Canada switched to programming completely. So I understand how everything works starting from the very bottom, and I am able to tell the good stuff from the bullshit in what I get from AI.
I have no idea, however, how youngsters will train their brains when they don't have any solid foundations. I think there is a danger of collapse, with people having answers to all the questions but zero ability to validate them, and the AI itself degenerating into noise as it trains more and more on its own results.
Or the AI will eventually have intelligence, motivation and agency.
Either way we are fucked.
For me the AI is just okay. Invaluable for personal projects but a little take-it-or-leave-it at work. It just makes too many little mistakes; I still have to babysit it; it's too much effort.
Sadly though my manager uses Claude for EVERYTHING and is completely careless with it. Hallucinations in performance reviews, hallucinations in documentation, trash tier PRs. He's so gung-ho that some of my peers are now also submitting Claude written PRs that they haven't even bothered to check whether they build correctly.
So the social aspect is very bad. I'm stuck correcting other people's slop and reading nonsense PRs a few times a week.
The biggest win for me has been cross-stack context switching. I maintain services in TypeScript, Python, and some Go, and the cost of switching between them used to be brutal - remembering idioms, library APIs, error handling patterns. Now I describe what I need and get idiomatic code in whichever language I'm in. That alone probably saves me 30-40 minutes on a typical day.
Where it consistently fails: anything involving the interaction between systems. If a bug spans a queue producer and its consumer, or the fix requires understanding how a frontend state change propagates through API calls to a cache invalidation - the model gives you a confident answer that addresses one layer and quietly ignores the rest. You end up debugging its fix instead of the original issue.
My stack: Claude Code (Opus) for investigation and bug triage in a ~60k LOC codebase, Cursor for greenfield work. Dropped autocomplete entirely after a month - it interrupted my thinking more than it helped.
I use Claude Code (CLI) daily for infrastructure work — Docker configs, shell scripts, API development, config management.
What works: I stay in the driver's seat. I own the architecture, make the decisions, validate everything. But I don't need a team to execute — Claude does the implementation. I went from being a solo dev limited by time to running a complex project (multi-agent system, Docker, Synology integration, PHP API) that would normally need 2-3 people.
The key is a good CLAUDE.md file with strict rules, and pushing Claude to think ahead and propose multiple options instead of just doing the first thing that comes to mind. Claude is also surprisingly powerful for audits — security audits, config audits, log analysis.
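A sketch of what such a CLAUDE.md might contain - entirely illustrative, every rule here is an assumption rather than the commenter's actual file:

```markdown
# CLAUDE.md (illustrative sketch)

## Hard rules
- Never edit files under deploy/ without asking first.
- Run the test suite after every change; do not commit on red.
- For any non-trivial change, propose at least two approaches with trade-offs.

## Conventions
- Shell scripts: POSIX sh, shellcheck-clean.
- Docker: pin image digests; no :latest tags.

## Before large changes
- Write a short plan file and wait for approval before implementing.
```

The value is less in any individual rule than in making the constraints explicit and machine-readable, so every session starts from the same guard-rails.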
What doesn't work: it confidently generates plausible-looking code that's subtly wrong. Never trust it on things you can't verify. It also over-engineers everything if you don't rein it in.
The biggest shift: I went from "write code" to "review and direct code." Not sure it's making me a better engineer, but it's making me a more effective one. It extends me.