Hi, I lead the teams responsible for our internal developer tools, including AI features. We work very closely with Google DeepMind to adapt Gemini models for Google-scale coding and other software engineering use cases. Google has a unique, massive monorepo which poses a lot of fun challenges when it comes to deploying AI capabilities at scale.
1. We take a lot of care to make sure the AI recommendations are safe and have a high quality bar (regular monitoring, code provenance tracking, adversarial testing, and more).
2. We also do regular A/B tests and randomized control trials to ensure these features are improving SWE productivity and throughput.
3. We see similar efficiencies across all programming languages and frameworks used internally at Google, and engineers across all tenure and experience cohorts show similar gains in productivity.
You can read more on our approach here:
https://research.google/blog/ai-in-software-engineering-at-g...
To me the most interesting part of this is the claim that you can accurately and meaningfully measure software engineering productivity.
You can - but not at the level of a single developer, and you cannot use those measures to manage the productivity of a specific dev.
For teams you can measure meaningful outcomes and improve team metrics.
You shouldn't really compare teams, but it is also possible if you know what the teams are doing.
If you are some disconnected manager who thinks he can make decisions or improvements by reducing things to single numbers - yeah, that's not possible.
> For teams you can measure meaningful outcomes and improve team metrics.
How? Which metrics?
That is what we pay managers to figure out. They should find out which metrics matter and how to improve them by knowing the team, being familiar with the domain, understanding company dynamics, understanding the customer, and understanding market dynamics.
At scale you can do this in a bunch of interesting ways. For example, you could measure "amount of time between opening a crash log and writing the first character of a new change" across 10,000s of engineers. Yes, each individual data point is highly messy. Alice might start coding as a means of investigation. Bob might like to think about the crash over dinner. Carol might get a really hard bug while David gets a really easy one. But at scale you can see how changes in the tools change this metric.
None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
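To make that concrete, here is a minimal sketch of the aggregation idea, with entirely hypothetical data and made-up numbers: each individual sample of "seconds from opening a crash log to the first keystroke of a fix" is extremely noisy, but a robust aggregate over tens of thousands of engineers still reveals a tooling-level shift.

```python
# Hypothetical illustration only: field names and numbers are made up.
import random
import statistics

random.seed(0)

def simulate_cohort(n_engineers: int, typical_delay_s: float) -> list[float]:
    # Lognormal noise stands in for the Alice/Bob/Carol/David messiness.
    return [random.lognormvariate(0, 1.0) * typical_delay_s for _ in range(n_engineers)]

baseline = simulate_cohort(20_000, typical_delay_s=600)  # old tooling
with_ai = simulate_cohort(20_000, typical_delay_s=540)   # assumed 10% faster

# Individual samples are all over the place, but the medians separate cleanly.
print(f"baseline median: {statistics.median(baseline):.0f} s")
print(f"with AI median:  {statistics.median(with_ai):.0f} s")
```

Any single data point is meaningless on its own; only the cohort-level aggregate says anything about the tools.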
You can come up with measures for it and then watch them, that’s for sure.
When a metric becomes the target, it ceases to be a good metric. Once developers discover how it works, they will type the first character immediately after opening the log.
edit: typo
I'm continually surprised by the amount of negativity that accompanies these sorts of statements. The direction of travel is very clear - LLM-based systems will be writing more and more code at all companies.
I don't think this is a bad thing - if this can be accompanied by an increase in software quality, which is possible. Right now it's very hit and miss, and everyone has examples of LLMs producing buggy or ridiculous code. But once the tooling improves to:
1. align produced code better to existing patterns and architecture
2. fix the feedback loop - with TDD, other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc. (a rough sketch of such a loop follows below)
Then we will definitely start seeing more and more code produced by LLMs. Don't look at the state of the art now, look at the direction of travel.
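As a rough illustration of point 2, here is a minimal sketch of a compile-error feedback loop. `generate_patch` is a hypothetical stand-in for whatever LLM call you use (it is not a real API), and `g++ -fsyntax-only` serves as the error source.

```python
# Sketch of the "fix the feedback loop" idea: generate, compile, feed the
# compiler errors back, repeat. `generate_patch` is hypothetical.
import os
import subprocess
import tempfile

def generate_patch(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM of choice")

def compile_errors(source: str) -> str:
    # Syntax-check a C++ candidate with g++ and return any error output.
    with tempfile.NamedTemporaryFile("w", suffix=".cc", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(["g++", "-fsyntax-only", path],
                                capture_output=True, text=True)
        return result.stderr
    finally:
        os.unlink(path)

def generate_with_feedback(task: str, max_rounds: int = 3) -> str:
    code = generate_patch(task)
    for _ in range(max_rounds):
        errors = compile_errors(code)
        if not errors:
            return code  # compiles cleanly; hand off to tests and review
        # Feed the compiler output back for the next attempt.
        code = generate_patch(
            f"{task}\n\nPrevious attempt:\n{code}\n\nCompiler errors:\n{errors}"
        )
    return code
```

A real system would layer test execution, reviewing agents, and provenance tracking on top of this skeleton, but the shape of the loop is the same.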
> if this can be accompanied by an increase in software quality
That’s a huge “if”, and by your own admission not what’s happening now.
> other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.
What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.
> Then we will definitely start seeing more and more code produced by LLMs.
We’re already there. And there’s a lot of bad code being pumped out. Which will in turn be fed back to the LLMs.
> Don't look at the state of the art now, look at the direction of travel.
That’s what leads to the eternal “in five years” which eventually sinks everyone’s trust.
Is AI ready to crawl through all open source and find/fix all the potential security bugs - or all bugs, for that matter? If so, will that become a commercial service or a free service?
Will AI be able to detect bugs and back doors that require multiple pieces of code working together rather than being in a single piece of code? Humans have a hard time with this.
- Hypothetical example: an authentication bug in sshd that requires a flaw in systemd, which in turn requires a flaw in udev or nss or PAM or some underlying library ... but looking at each individual library or daemon, there are no bugs that a professional penetration-testing organization such as NCC Group or Google's Project Zero would find. In other words, will AI soon be able to find more complex bugs in a year than Tavis has found in his career? Will AIs start to compete with one another, find all the state-sponsored complex bugs, and ultimately create a map suggesting a common set of developers who may need to be notified? Will there be a table that logs where AI found things that professional human penetration testers could not?
I assume the amount of monitoring effort is less than the amount of effort that would be required to replicate the AI generated code by humans, but do you have numbers on what that ROI looks like? Is it more like 10% or 200%?
I've been thinking a lot lately about how an LLM trained on really high-quality code would perform.
I'm far from impressed with the output of GPT/Claude; all they've done is weight against Stack Overflow, which is still low-quality code relative to Google's.
What is the probability that Google makes this a real product, or is it too likely to autocomplete trade secrets?
I think he's trying to promote AI, but it somehow raises questions about their code quality among some.
I think it just shows how much noise there is in coding. Code gets reviewed anyway (although review quality was going down rapidly the more PMs were added to the team).
Most of the code must be what could be snippets (opening files and handling errors with absl::, and moving data from proto to proto). One thing that doesn't help here is that, when writing for many engineers on different teams to read, spelling out simple code instead of depending on too many abstractions seems to be preferred by most teams.
I guess that LLMs do provide smarter snippets that I don't need to fill out in detail, and when it understands types and whether things compile, it gets quite good and "smart" when it comes to writing boilerplate.
I don't write code as I'm a sysadmin. Mostly just scripts. But is this like saying intellisense writes 25% of my code? Because I use autocomplete to shortcut stuff or to create a for loop to fill with things I want to do.
You just made it less attractive to the target corporations who are supposed to buy this product from Google. Calling it IntelliSense means corporations already have licenses for various flavours of it, and some are even mostly free. Saying "AI generates 25% of our code" sounds more attractive to corporations, because it feels new and novel, and you can imagine laying off 25% of the personnel to justify buying this product from Google.
When someone who uses a product says it, there is a 50% chance of it being true; but when someone far away from the user says it, it is 100% promotion of the product and a setup for trust-building for a future sale.
I would be way more impressed if LLMs could do code compression. More code == more things that can break, and when LLMs can generate boatloads of it with a click, you can imagine what might happen.
This actually sparked an idea for me. Could code complexity be measured as cumulative entropy as measured by running LLM token predictions on a codebase? Notably, verbose boilerplate would be pretty low entropy, and straightforward code should be decently low as well.
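For what it's worth, here is a minimal sketch of that idea, assuming you use an off-the-shelf causal LM from Hugging Face as the predictor. GPT-2 is used purely for illustration; a code-trained model would be a better fit, and the truncation to one context window is just to keep the sketch simple.

```python
# Back-of-the-envelope version of the idea: total negative log-likelihood of
# a file under a small causal LM as its "cumulative entropy".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def cumulative_entropy_nats(code: str) -> float:
    # Truncate to the model's context window to keep things simple.
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    n_tokens = inputs["input_ids"].shape[1]
    # out.loss is the mean per-token NLL; scale back up to a total.
    return out.loss.item() * n_tokens

boilerplate = "for (int i = 0; i < n; i++) { sum += a[i]; }\n" * 20
print(cumulative_entropy_nats(boilerplate))  # repetitive code scores low per line
```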
Not quite, I think. Some kinds of redundancy are good, and some are bad. Good redundancy tends to reduce mistakes rather than introduce them. E.g. there's lots of redundancy in natural languages, and it helps resolve ambiguity and fill in blanks or corruption if you didn't hear something properly. Similarly, a lot of "entropy" in code could be reduced by shortening names, deleting types, etc., but all those things were helping to clarify intent to other humans, thereby reducing mistakes. But some is copy+paste of rules that should be enforced in one place. Teaching a computer to understand the difference is... hard.
Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
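A toy version of that gzip estimate, on made-up strings: the more repetitive ("fluffy") the code, the higher the compression ratio.

```python
# Crude gzip-based proxy: how compressible is the code?
import gzip

def compression_ratio(code: str) -> float:
    raw = code.encode("utf-8")
    return len(raw) / len(gzip.compress(raw))

repetitive = "result = handler.process(request)\n" * 100
varied = "".join(f"x{i} = f_{i}(y{i}, z{i})\n" for i in range(100))

print(compression_ratio(repetitive))  # high ratio: lots of redundancy
print(compression_ratio(varied))      # lower ratio: more information per byte
```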
Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.
Interpreting this comment, it would predict low complexity for code copied unnecessarily.
I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each use case of the copying is linearly independent, does it matter that it was copied?
Over time, you'd still see copies that get changed independently show up as increased entropy.
Code complexity can already be measured deterministically with cyclomatic complexity. No need to throw fuzzy AI logic at this, especially when LLMs are bad at math.
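For Python, tools like radon already compute this; the stripped-down, approximate counter below (one plus the number of decision points) is only meant to show that nothing fuzzy is required.

```python
# Simplified, deterministic cyclomatic-complexity counter for Python.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source: str) -> int:
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))

sample = """
def classify(x):
    if x < 0:
        return "negative"
    for d in range(2, x):
        if x % d == 0:
            return "composite"
    return "prime-ish"
"""
print(cyclomatic_complexity(sample))  # 4: base 1 + if + for + if
```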
I agree. It seems like counting lines of generated code is like counting bytes/instructions of compiled code - who cares? If “code” becomes prompts, then AI should lead to much smaller code than before.
I'm aware that the difference is that AI-generated code can be read and modified by humans. But that's exactly why quantity is bad: humans have to understand it in order to read or modify it.
What's that line about accounting for lines of code on the wrong side of the balance sheet?
> If “code” becomes prompts, then AI should lead to much smaller code than before.
What’s the point of shorter code if you can’t trust it to do what it’s supposed to?
I’ll take 20 lines of code that do what they should consistently over 1 line that may or may not do the task depending on the direction of the wind.
Exactly this. Code is a liability, if you can do the same thing with less code you're often better off.
Not if it’s already stable and has been running for years. Legacy doesn’t necessarily mean “need replacement because of technical debt”. I’ve seen lots of people want to replace code that has been running basically bug free for years because “there are better coding styles and practices now”
How would it know which edge cases are useful and which ones aren't?
I understand "more code" as meaning more edge cases being handled.
meh - the LLM code I'm seeing isn't particularly more verbose. And as others have said, if you want tighter code, just add that to the prompt.
fun story: today I had an LLM write me a non-trivial perl one-liner. It tried to be verbose but I insisted and it gave me one tight line.
https://archive.is/X43PU
Sooo... is this why Google sucks now?
I've been following the integration of AI into coding with great interest. It's remarkable to see that over a quarter of Google's new code is now AI-generated. In line with this trend, I've been working on a tool called AI2sql https://ai2sql.io/ that uses AI to convert natural language into SQL queries. It's been helpful in streamlining database interactions without needing deep SQL expertise. I'm curious—has anyone else here been leveraging AI tools to assist with code generation or simplify complex programming tasks?
No surprise. I give my career about 2 years before I’m useless.
If you manage to convince software engineers that you are doing them a favour by employing them, then they will approach any workplace negotiations with a mindset that makes them grab the first number that gets thrown at them.
These statements are brilliant.
Related:
Alphabet ($GOOG) 2024 Q3 earnings release
https://news.ycombinator.com/item?id=41988811
There is a running gag among my friends using Google Chat (or whatever their corporate IM tool is now called) that this explains a lot of what they’re experiencing while using it…
Writing more code means more needs to be maintained and they are cleverly hiding that fact. Software is a lot more like complex plumbing than people want to admit:
More lines == more shit to maintain. Complex lines == the shit is unmanageable.
But Wall Street investors love simplistic narratives such as More X == More revenue. So here we are. Pretty clever marketing imo.
When I was there, way more than 25% of the code was copying one proto into another proto, or so people complained. What sort of memes are people making now that this task has been automated?
I am very interested in how this 25% number is calculated, and whether it's a lot of boilerplate that in the past would have just been big copy-paste jobs, like a lot of protobuf work. Would be curious if any Googlers could comment.
Not that I'm really discounting the value of AI here. For example, I've found a ton of value and saved time getting AI to write CDKTF (basically, Terraform in TypeScript) config scripts for me. I don't write Terraform that often, there are a ton of options I always forget, etc. So asking ChatGPT to write a Terraform config for, say, a new scheduled task saves me a lot of manual lookup.
But at the same time, the AI isn't really writing the complicated logic pieces for me. I think that comes down to the fact that when I do need to write complicated logic, I'm a decent enough programmer that it's probably faster for me to write it out in a high-level programming language than write it in English first.
I miss old memegen, but it got ruined by HR :/
I am reliably told that it is alive and well, even if it’s changed a bit.
I would hope a CEO, especially a technical one, would have enough sense to couple that statement to some useful business metric, because in isolation it might be an announcement of public humiliation.
The elitism of programmers who think the boilerplate code they write for 25% of the job, code that's already been written by 1000 other people before, is in fact a valuable use of company time to write by hand again.
IMO it's only really an issue if a competent human wasn't involved in the process - basically a person who could have written it if needed. Then they do the work connecting it to the useful stuff and have appropriate QA/testing in place... the latter often taking far more effort than the actual writing-the-code time itself, even when a human does it.
If 25% of your code is boilerplate, you have a serious architectural problem.
That said, I've seen even higher ratios. But never in any place that survived for long.
To add: it’s been my experience that it’s the company that thinks the boilerplate code is some special, secret, proprietary thing that no other business could possibly have produced.
Not the developer who has written the same effective stanza 10 times before.
Depends on how you define "boilerplate". E.g. Terraform configs account for a significant number of the total lines in one of my repos. It's not really "boilerplate" in that it's not the exact same everywhere, but it is boilerplate in the sense that setting up, say, a pretty standard Cloud SQL instance can take many, many lines of code just because there are so many config options.
Terraform is verbose.
It's only boilerplate if you write it again to set up almost the same thing again. Which, granted, if you are writing bare Terraform config, it probably is.
But in either case, if your Terraform config is repetitive and makes up a large part of the code of an entire thing (not a repo - repos are arbitrary divisions - maybe "product", though that's also a bad name), then that thing is certainly close to useless.
Is it though? It seems to me like a team ownership boundary question rather than an architecture question.
Architecturally, it sounds like different architecture components map somewhere close to 1:1 to teams, rather than teams hacking components to be more tightly coupled to each other because they have the same ownership.
I'd see too much boilerplate as being an organization/management issue rather than a code architecture issue.
Android mobile development has gotten so …architectured that I would guess most apps have a much higher rate of “boilerplate” than you’d hope for.
Everything is getting forced into a scalable, general-purpose shape, such that most apps have to add a ridiculous amount of boilerplate.
You're probably thinking of just raw codebases, your company source code repo. Programmers do far, far more boilerplate stuff than raw code they commit with git. Debugging, data processing, system scripts, writing SQL queries, etc.
Combine that with generic functions, framework boilerplate, OS/browser stuff, or explicit x-y-z code, and your 'boilerplate' (i.e. repetitive, easily reproducible code) easily gets to 25% of the code your programmers write every month. If your job is >75% pure human-cognition problem solving, you're probably in a higher tier of jobs than the vast majority of programmers on the planet.
Doing the same thing but faster might just mean you are masturbating more furiously. Show me the money, especially from a CEO.
you probably underestimate the endless miles of verbose code that are possible, by human or machine but especially by machine.
Or a statement of pride that the intelligence they created is capable of lofty tasks.
I read these threads and the usual 'I have to fix the AI code for longer than it would have taken to write it from scratch' and can't help but feel folks are truly trying to downplay what is going to eat the software industry alive.
if the golden rule is that code is a liability, what does this headline imply?
The code would be getting written anyway; it's an invariant. The difference is less time wasted typing keys (albeit a small amount of time) and, more importantly (in my experience), it helps A LOT with discoverability.
With g3's immense amount of context, LLMs can vastly help you discover how other people are using existing libraries.
my experience dabbling with the ai and code is that it is terrible at coming up with new stuff unless it already exists
in regards to how others are using libraries, that’s where the technology will excel— re-writing code. once it has a stable AST to work with, the mathematical equation it is solving is a refactor.
until it has that AST that solves the business need, the game is just prompt spaghetti until it hits altitude to be able to refactor.
Nothing at all. The headline talks about the proportion of code written by AI. Contrary to what a lot of comments here are assuming, it does not say that the volume of code written has increased.
Google could be writing the same amount of code with fewer developers (they have had multiple layoffs lately), or their developers could be focusing more of their time and attention on the code they do write.
I'm sure google won't pay you money to take all their code off their hands.
But they would pay me money to audit it for security.
yup, you can get paid all kinds of money to fix/guard/check billion/trillion dollar assets..
Related?
> New tool bypasses Google Chrome’s new cookie encryption system
https://news.ycombinator.com/item?id=41988648
Huh.
That may explain why google search has, in the past couple of months, become so unusable for me that I switched (happily) to kagi.
Which uses Google results?
Google is now mass-producing techdebt at rates not seen since Martin Fowler’s first design pattern blogposts.
Not really technical debt when you will be able to regenerate 20K lines of code in a minute then QA and deploy it automatically.
So a fresh, new ledger of technical debt every morning, impossible to ever pay off?
Assuming, of course:
- You know which 20K lines need changing
- You have perfect QA
- Nothing ever goes wrong in deployment.
I think there's a tendency in our industry to only take the hypotenuse of curves at the steepest point
That is a fantastic way to put it. I’d argue that you’ve described a bubble, which fits perfectly with the topic and where _most_ of it will eventually end up.
Google is getting enshittified. It's already visible in many small ways. I was just using Google Maps, and in the route they labelled X (bus) Interchange as X International. I can only assume this happened because they are using AI to summarise routes now. Why in the world are they doing that? They have exact location names available.
So GCS customers will trust their codegen product. (Engineers aren’t the buyer; corp suite is)
I don't understand why you think this at all. Care to explain?
Why? Especially when said AI helpers are a part of what the company itself is selling?
These companies are competing to be the next codegen service provider.
Translation: They'd love to lay off all the engineers.
We should watch for dev layoffs as a sign/signal of the impact of generated code. I remember reading about an anime shop that fired 80% of its illustrators due to ai-images.
By some intuitive measures, it's surprising they still have very many humans writing their code. Google's product quality isn't what it once was. There is no amount of AI accelerators and energy they can burn through to fix that without humans.
Well, the article has a paywall so it might go into this.
I'm not sure this stat is as important as people make it out to be. If I start off with `for` and the AI auto-completes `for(int i=0; i<args.length; i++) {`, then a lot more than 25% of the code is AI-written, but it's also not significant. I could've figured out how to write the for loop, and it's also not a meaningful amount of time saved, because most of the time goes to figuring out and testing, which the AI doesn't do.
I don't think the public cares whether their code is written by machines or real people, as long as the product works.
Just today, Google Calendar asked me whether I wanted the "easy" or "depressed" colour scheme.
It's for when you have an upcoming funeral; the calendar is just trying to dress appropriately.
Ironically, your comment brightened my day.
Actually 0%, assembly language is assembled to machine code, not compiled.
Inline asm has to go through the compiler to get wired up by the register allocator.
I'd be turning off the autocomplete in my IDE if I was at Google. Seems to double as a keylogger.
This only means employees sign up to use new toys and they are paying for enough seats for all employees.
It's like companies paying for all those to-do-list and tutorial apps left running on AWS EC2 instances circa 2007.
I'd be worried if i were a google investor. lol.
I'm not sure I get your point. Google created Gemini and whatever internal LLM their employees are using for code generation. Who are they paying, and for what seats? Not Microsoft or OpenAI or Anthropic...
Why work at big businesses anymore? Let's just create more startups.
Risk appetite.