It provides a Git-like pull/push workflow for editing Sheets, Docs, and Slides. `pull` converts the Google file into a local folder of agent-friendly files. For example, a Google Sheet becomes a folder with a .tsv, a formula.json, and so on. The agent simply edits these files and `push`es the changes. Similarly, a Google Doc becomes an XML file that is pure content. The agent edits it and calls push - the tool figures out the right batchUpdate API calls to bring the document in sync.
None of the existing tools let you edit documents. Invoking batchUpdate directly is error-prone and token-inefficient. Extrasuite solves both issues.
In addition, Extrasuite uses a unique service token that is 1:1 mapped to the user. This means edits show up as "Alice's agent" in Google Drive version history. It's also secure - agents can only access the specific files or folders you explicitly share with them.
This is still very much alpha - but we have been using it internally for our 100-member team. Google Sheets, Docs, Forms, and Apps Script work great - all using the same pull/push metaphor. Google Slides still needs some work.
Excellent project! I see that the agent modifies Google Docs using an interesting technique: convert the doc to HTML, let the AI operate over the HTML, then diff the original HTML against the AI-modified HTML and send the diff as a batchUpdate to Google Docs.
IMO, this is a better approach than the one used by Anthropic's docx editing skill.
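A toy illustration of the diff-to-batchUpdate idea - character-level via difflib, whereas the real pipeline diffing HTML and emitting structural requests is surely more involved:

```python
import difflib

def text_diff_to_requests(original: str, edited: str) -> list[dict]:
    """Turn a plain-text diff into Docs-style batchUpdate requests.

    Toy indexing: assumes the body starts at index 1 and contains no
    structural elements; a production version needs a real position map.
    Ops are emitted back-to-front so earlier indices stay valid while
    the list is applied in order.
    """
    requests = []
    matcher = difflib.SequenceMatcher(a=original, b=edited)
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag in ("replace", "delete"):
            requests.append({"deleteContentRange": {
                "range": {"startIndex": i1 + 1, "endIndex": i2 + 1}}})
        if tag in ("replace", "insert"):
            requests.append({"insertText": {
                "location": {"index": i1 + 1}, "text": edited[j1:j2]}})
    return requests
```

The nice property is that the model never sees the request format at all; it just edits text.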
1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?
2. What happens if the document is a big book? How do you manage context when loading big documents?
PS: I'm working on an AI agent for Zoho Writer (a Google Docs alternative) and I've landed on a similar HTML-based approach. The difference is that I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML, and I convert those into ops.
We have been using something similar for editing Confluence pages. Download XML, edit, upload. It is very effective, much better than direct edit commands. It’s a great pattern.
You can use the Copilot CLI with the Atlassian MCP to super easily edit/create Confluence pages. After having the agent complete a meaningful amount of work, I have it go create a Confluence page documenting what has been done. Super useful.
I'm afraid I can't easily share this, as we have embedded a lot of company-specific information in our setup, particularly for cross-linking between Confluence/Jira/Zendesk and other systems. I can try to explain it though, and then Claude Code is great at implementing these simple CLI tools and writing the skills.
We wrote CLIs for Confluence, Jira, and Zendesk, with skills to match. We use a simple OAuth flow for users to log in (e.g., they would run `jira login`). Then Confluence/Jira/Zendesk each have REST APIs to query pages/issues/tickets and submit changes, which is what our CLIs use. Claude Code was exceptional at finding the documentation for these and implementing them. It only took a couple of days to set these up, and Claude Code is now remarkably good at loading the skills and using the CLIs. We use the skills to embed a lot of domain-specific information about projects, organisation of pages, conventions, standard workflows, etc.
Being able to embed company-specific links between services has been remarkably useful. For example, we look for specific patterns in pages like AIT-553 or zd124132 and then can provide richer cross-links to Jira or Zendesk that help agents navigate between services. This has made agents really efficient at finding information, and it makes them much more likely to actually read from multiple systems. Before we made changes like this, they would often rabbit-hole only looking at confluence pages, or only looking at jira issues, even when there was a lot of very relevant information in other systems.
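The pattern-matching part of this is simple to sketch - the URL templates below are hypothetical, and the patterns just mirror the AIT-553 / zd124132 examples:

```python
import re

# Hypothetical link templates; the real setup would use the company's own hosts.
LINKERS = {
    re.compile(r"\b([A-Z]{2,}-\d+)\b"): "https://example.atlassian.net/browse/{}",
    re.compile(r"\bzd(\d+)\b"): "https://example.zendesk.com/agent/tickets/{}",
}

def cross_link(text: str) -> str:
    """Rewrite bare ticket references into full cross-service links."""
    for pattern, template in LINKERS.items():
        text = pattern.sub(lambda m: template.format(m.group(1)), text)
    return text
```

Running this over page content before handing it to an agent is enough to make the cross-references navigable.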
My favourite is the confluence integration though, as I like to record a lot of worklog-style information in there that I would previously write down as markdown files. It's nicer to have these in Confluence as then they are accessible no matter what repo I am working in, what region I am working in, or what branch or feature I'm working on. I've been meaning to try to set something similar up for my personal projects using the new Obsidian CLI.
We have been doing something similar, but it sounds like you have come further along this way of working. We (with help from Claude) have built a tool like the one you describe to interface with our task- and project-management system, and we use it together with the GitLab and GitHub CLI tools to let agents read tickets, formulate a plan, create solutions, and open MRs/PRs against the relevant repos. For most of our knowledge base we use Markdown, but some of it is tied up in Confluence, which is why I'm interested in that part. And some workflows are even in Google Docs, which makes the OP's tool interesting as well -- currently our tool outputs Markdown and we just "paste from markdown" into Google Docs. We might be able to revise and improve that too.
Thank you! Sounds like a fantastic setup. Are the Claude Code agents acting autonomously off trigger conditions, or is this all manual work with them? And how do you manage write permissions for documents among team members/agents? Presumably multiple people have access to this system.
(Not OP, but have been looking into setting up a system for a similar use case)
Related, I often work with markdown docs (usually created via CLI agents like Claude Code) and need to collaborate with others in google docs, which is extremely markdown-unfriendly[1], so I built small quality-of-life CLI tools to convert Gdocs -> md and vice versa, called gdoc2md and md2gdoc:
Interesting, in my Arc browser, I just tried File -> open -> upload -> blah.md and it does seem to render fine. This exact thing did not work a few weeks ago, meaning the various header markers etc showed up as raw "##" etc, and I had to further select something like "open as new doc" to finally make it look good.
Obsidian has become almost an operating system for working with markdown. Its Live View / Edit mode is excellent (WYSIWYG) and its ability to accept pasted content and handle it appropriately is good and getting better. Its plugin/extension ecosystem is robust (and has a low barrier to entry), and now that it has a CLI I expect to see an acceleration of clever workflows and integrations.
No affiliation, just a very happy ~early adopter and daily user.
Wow, that's a strong opinion and harsh words that come across as really entitled, and probably unfair. From my PoV, they're a tiny, scrappy, transparent and likeable company who built and maintain a fantastic software application that radically improved ~everything about my daily workflow and PKM. I get more value out of Obsidian in a day than most other apps in their entire lifespan. The core app is free! They have to eat. I'd probably throw $ at them even if they didn't charge a few bucks / month for Sync. (Which works flawlessly.) Sure it'd be cool if you could self-host their Sync module -- but many Obsidian users use other DIY approaches for sync; in the end it's markdown files on a local disk, do with it what you will.
Interesting - we have a very similar internal flow. We like working in Markdown, but our customers want to leave feedback in Google Docs, so we also have an md -> gdoc tool. We don't do the reverse; we ask them to only leave comments/suggested changes, and we apply those directly to the markdown and re-export.
I ran into similar issues with image handling, and the workaround I use is to convert to docx with pandoc as a first step and then import that as a Google Doc using the API; Google Docs seems to handle docx much better than Markdown from what I've seen.
This is really interesting: "Humans hate writing nested JSON in the terminal. Agents prefer it." Are others seeing the same thing? I've just moved away from JSON-default output because agents were always using jq to convert it to what I could have been producing anyway.
Really interesting. I was thinking about something similar regarding the shape of code. I have no qualms recommending my agents take static analysis to the extreme, though it would be cumbersome for most people.
Generating a good cli isn't all that hard for agentic coding tools. When you do it manually it's highly repetitive work. But all you are doing is low level plumbing. Given some parsed arguments, call a function, return the result (with some formatting, prettying, etc.). In the end it's just a facade for an API, library, or whatever else you want to have a cli for. Easy to write. Easy to test. But manually going through your API resource by resource, parameter by parameter, etc. takes a long time. An LLM just blazes through that in a few minutes. Generate some tests, tweak as needed, and you are good to go.
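The shape of that plumbing is roughly this - a toy spec and a hypothetical `api_call`; a real generator would read the actual API description instead:

```python
import argparse

# Toy "spec": in practice this would come from an OpenAPI/Discovery document.
SPEC = {
    "issues.get": {"params": ["id"]},
    "issues.list": {"params": ["project", "limit"]},
}

def build_cli(api_call):
    """Map each API method in the spec to a subcommand with its parameters,
    then forward parsed flags straight to the underlying API function."""
    parser = argparse.ArgumentParser(prog="mytool")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, meta in SPEC.items():
        cmd = sub.add_parser(name)
        for p in meta["params"]:
            cmd.add_argument(f"--{p}")

    def run(argv):
        args = vars(parser.parse_args(argv))
        command = args.pop("command")
        # Thin facade: drop unset flags, call through, return the result.
        return api_call(command, {k: v for k, v in args.items() if v is not None})

    return run
```

Multiply this by every resource and parameter in an API and you can see why it's tedious by hand and trivial for an LLM.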
I did a few CLIs with codex in the last few weeks. I do simple ops with this stuff. I've had a few use cases for new features where previously I would have had to build some kind of quick and dirty admin UI just to use and test a new API feature before being able to integrate it into our product. With a generated cli, I can just play with it from the command line. Or make codex do that for me.
A good cli with a modern command line argument parser, well documented options, bash/zsh auto complete, pretty colors, etc. is generally nice to have. I mapped resources to commands and sub commands, made it add parameters with sensible defaults or optional ones. Then I got lazy and just asked it what else it thought it was missing, it made some suggestions and I gave it the thumbs up and it all got added. I even generated a simple interactive TUI at some point. Because why not? I also made it generate a md skill file explaining how to use the cli that you can just drop in your skills directory.
> gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically
You're not exactly describing rocket science. This is basically how websites work; there's never been anything stopping anyone from doing dynamic UI in TUIs except the fact that TUI frameworks were dog poop until a few years ago (and there was no Windows Terminal, so no Windows support). Try doing that in ncurses instead of Ratatui or whatever - it's horrendous.
Generally, this disclaimer is required for products that are released under the "Google" name but without any kind of support guarantees for enterprise customers.
That or it's a personal project that IARC decided could live in the workspace project.
Google operates across so many verticals that it's difficult to argue a side project is outside the scope of Google's business, so Google could argue it has copyright over the work. To make it easier for engineers to keep contributing to open source, there's a fairly straightforward path to release code through a Google-owned repository (if you look at github.com/google, it is full of personal projects alongside official ones).
There is an official process where an engineer can apply to a committee to have Google waive any copyright claim. That requires additional work so if your goal is simply to publish the code as open source and you do not mind it living under the Google org, using the Google repo path is usually much faster.
Disclaimer: ex-googler, not a lawyer, not arguing whether or not the situation with copyright assignment is legally enforceable or not/good or bad/etc.
I think an official project from Google would be hosted under https://github.com/google, a GitHub org which contains 2,800 repositories and has more than 500 Google employees as members.
googleworkspace/cli appears to be more of a hobby project developed by a single Google employee.
But at least they are under the Google organization. The thing is, anyone could create an organization, name it something like "googlesomething", use Google logos, and design it in a way that makes some users believe it has an official connection.
I think so, but it would be easy enough for someone to create such an organization, share it on HN for malicious purposes (such as infecting devices), and only have it taken down afterward. I'm not saying that's what happened here, but it does illustrate a potential attack vector.
I was excited to see this but all of that went away when I realized you need to create an app in GCP to use it. Can't really expect non technical users to set this up across the company.
Can someone explain to me, why Google can't (or does not want to) implement the same auth flow that any other SaaS company uses:
# API Keys in Settings
1. Go to Settings -> API Keys Page
2. Create Token (set scope and expiration date)
# OAuth flow
1. `gws login` shows url to visit
2. Login with Google profile & select data you want to share
3. Redirect to localhost page confirms authentication
I get that I need to configure a project and OAuth screens if I want to develop an application for other users that uses GCP services. That is fine. But I am trying to access my own data over a (/another) HTTP API. This should not be hard.
GitHub? I just click here, click there, copy-paste, and the gh CLI is ready.
For Google I need a PhD to set up any kind of API access to my own data. And it frequently blocks you, because you can set it up as a test product and add test accounts (but the owner account can't be one of them (WTF?)), etc.
I gave up on using a google calendar cli project because of all that lack of normal UX.
UX for Google APIs looks like it was designed by an accountant.
`gws auth setup` looks promising, but it won't work yet for personal accounts.
Same story here. I installed it and ran `gws auth setup` only to find I needed to install the `gcloud` CLI by hand. That led me to this link with install instructions: https://cloud.google.com/sdk/docs/install. Unmistakable Google DX strikes again.
God, getting this set up is frustrating. I've spent 45 minutes trying to get this to work, just following their defaults the whole way through.
Multiple errors and issues along the way; now I'm on `gws auth login`, trying to pick the OAuth scopes. I go ahead and trust their defaults and select `recommended`, only to get a warning that this is too many scopes and may error out (then why is this the recommended setting??), and then, yeah, it errors out when trying to authenticate in the browser.
The error tells me I need to verify my app, so I go to the app settings in my cloud console and try to verify and there's no streamlined way to do this. It seems the intended approach is for me to manually add, one by one, each of the 85 scopes that are on the "recommended" list, and then go through the actual verification.
Have the people that built and released this actually tried to install and run this, just a single time, purely following their own happy path?
Similar frustrations. I was only able to auth using some Google app I created for an old project years ago that happened to have the right bits.
It's wild that this process is still so challenging. There's got to be some safe, streamlined way to set up an app identity you own that can only be used to access your own account.
My guess is that, organizationally within Google, the developer app authorization process has many teams involved in its implementation and many other outside stakeholders. A single unified team wouldn't have produced this confusion and complexity. I get why... it's a huge source of bad actors. But there's got to be a better way.
I've been really unhappy with pretty much every Google product I've used except their consumer productivity tools (Gmail, Calendar, and Meet). Diving into Google Cloud has been extremely unsatisfactory.
I ran a project for a company on Google Cloud a few years ago and enjoyed it once I got used to everything. I’d use it more now if they had better low end pricing to start projects there.
It’s a very different experience than AWS though and takes some getting used to.
Google Workspace API keys and roles have always confused me on so many levels... and they just seem to keep topping that confusion; no one is addressing the core problem (honestly, I'm not sure that's even possible at this point).
I have "Advanced Protection" turned on, so I just can't use this at all, because my newly created Google Cloud GCP app isn't trusted (even though I own it and I'm requesting read-only scopes). What a mess.
Access blocked: [app name] is not approved by Advanced Protection. Error 400: policy_enforced
i had to do all that the last time i wanted to do a little js in my google sheets. when i saw their quick start required gcloud already set up, i decided not to bother trying this out. idk why google makes something that should take 15s (clicking “ok” in an oauth popup) take tens of minutes to hours of head scratching.
The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
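The pattern in question looks something like this - `gws-like` is a hypothetical stand-in, with the flag name taken from the comment:

```python
import argparse
import json

# One opaque --params JSON flag instead of many typed flags: terse for
# agents that already think in API request bodies, awkward for humans.
parser = argparse.ArgumentParser(prog="gws-like")
parser.add_argument("method")
parser.add_argument("--params", type=json.loads, default={})

args = parser.parse_args([
    "sheets.values.update",
    "--params", '{"range": "A1:B2", "majorDimension": "ROWS"}',
])
```

The human-friendly alternative would be one flag per parameter, which is exactly the trade-off the linked post argues about.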
They're not doing so here, but shipping a wasm-compiled binary with npm that uses node's WASI API is a really easy way to ship a cross-platform CLI utility. Just needs ~20 lines of JS wrapping it to set up the args and file system.
There's no such thing as a truly "cross-platform" build. Depending on what you use, you might have to target specific combinations of OS and processor architecture. That's actually why WASM (though they went with WASI) is a better choice; especially for libraries, since anyone can drop it into their environment without worrying about compatibility.
I found that strange as well. My guess is that `npm` is just the package manager people are most likely to already have installed and doing it this way makes it easy. They might think asking people to install Cargo is too much effort. Wonder if the pattern of using npm to install non-node tools will keep gaining traction.
It's still weird. Why not just use an effing install.sh script like everybody else? And don't tell me "security". Because after installation you will be running an unknown binary anyway.
A lot of people who are not web devs use it, that's what I see. I even saw some mainframe developers use npx to call some tool on some data dump.
Also, this is a web project anyway. Google Workspace is web based, so while there is a good chance that the users aren't web developers, it's a better chance that they have npm than anything else.
If you had to pick one package manager that was most likely installed across all the different user machines in the world, I'd say npm is a pretty good bet.
For many, installing something with npm is still easier. It chooses the right binary for your OS/architecture, puts it on your PATH, and streamlines upgrades.
Their GitHub releases page provides the binaries, as well as a `curl ... | sh` install method and a guide to using GitHub release attestations, which I liked.
To my knowledge npm isn't shipped in _any_ major OS. It's available to install on all of them, just like most package managers, but I'm not sure it's in the default distribution of macOS, Windows, or the major Linux distros.
pip might be, but it was historically super inconsistent (at least in my experience). Is it `pip install`? `python3 -m pip install`? Maybe `pip3 install`? Yeah, Ubuntu did a lot of damage to pip here. npm always worked because you had to install it yourself, and it didn't have a transition phase like Python 2 being in the OS by default.
system pip w/ sudo usually unleashes Zalgo, i’d rather curl | bash but npm is fine too. it’s just about meeting people where they’re at, and in the ai age many devs have npm
if you build for the web, no matter what your backend is (python, go, rust, java, c#), your frontend will almost certainly have some js, so likely you need npm.
Python packaging/envs is solved now by uv. It's not merely promising, or used only by people in the know like the last two trendy Python package managers. I was a big-time Python hater since it was a PITA to support as a devtools guy, but now it's trivial. uv just works; it won.
I'm not a python dev, but I see a bit of its ecosystem. How does uv compare with conda or venv? I thought JS had the monopoly on competing package managers.
> The install script checks the OS and Arch, and pulls the right Rust binary.
That's the arbitrary-code-execution-at-install-time aspect of npm that developers should be extra wary of in this day and age. Saner Node package managers like pnpm ignore the build script, and you have to explicitly approve it on a case-by-case basis.
That said, you can execute code with build.rs in Cargo too. Cargo is just not a build-artifact distribution mechanism.
> NPM has become the de facto standard for installing any software these days, because it is present on every OS.
That's not remotely true. If there is a standard (which I wouldn't say there is), it's either docker or curl|bash. Nobody is out there using npm to install packages except web devs, this is absolutely ridiculous on Google's part.
I learned TS after a few years with JS. I thought having strict types was cool. Many of my colleagues with much more (JS) experience than me thought it was a hassle. Not sure if they meant the setup or TS or what but I always thought it was weird.
I think we all have been working on our own bespoke CLI tools.
MCPs are bloated, insecure token hogs.
CLIs are easy to write and you can cut it down to only what you need, saving thousands of tokens per prompt.
This is another one I'll add to my repertoire of claude CLIs to replace the MCP.
Claude Opus 4.6 couldn't figure out how to use it to write to a Google Sheet (something to do with escaping the !?) and fell back to calling the sheets API directly with gcloud auth.
> gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically.
What is the practical difference between a "discovery service" + API and an MCP server? Surely humans and LLMs are better off using the discovery service + API in all cases? What would be the benefit of MCP?
Nothing. MCP is HTTP APIs and CLI tools without the good parts. It lacks the robustness of the OpenAPI spec, including security standardization, and MCP servers are more complex to run than simple CLI utilities that need no authentication layer.
I have done it many times, using the swagger.json as a "discovery service" and then having the agent utilize that API. A good OpenAPI spec was working perfectly fine for me all the way back when OpenAI introduced GPTs.
If we standardized on a discovery/ endpoint, or something like that, as a more compact description of the API to reduce token usage compared to consuming the somewhat bloated full OpenAPI spec, you would have everything you need right there.
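Compacting a spec that way is mostly mechanical; here's a minimal sketch over an OpenAPI-style dict (a toy example, not Google's actual discovery format):

```python
def compact_openapi(spec: dict) -> list[str]:
    """Collapse an OpenAPI-style spec into one terse line per operation,
    keeping only what an agent needs to pick and call an endpoint."""
    lines = []
    for path, methods in spec.get("paths", {}).items():
        for verb, op in methods.items():
            params = ",".join(p["name"] for p in op.get("parameters", []))
            lines.append(f"{verb.upper()} {path} ({params}) - {op.get('summary', '')}")
    return lines
```

A few hundred lines like that cover an API that would otherwise cost tens of thousands of tokens as a raw spec.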
The MCP side quest for AI has been one of the most annoying things in AI in recent years. Complete waste of time.
The benefit of MCP is that it exists and kinda works, and a lot of tools are available on it. I guess it's all about adoption. But inherently, yeah, it's a discovery-service thingy. Google will never embrace MCP since it was invented by Anthropic.
I consider it a good first attempt, but I do hope for a sort of MCP 2.0.
Right, but surely swagger/openapi has been providing robust API discovery for years? I just don't get what LLMs don't like about it (apart from it possibly using slightly more tokens than MCP)
MCP is like "this is what the API is about, figure it out". You can also change the server side pretty liberally and the agent will figure it out.
Swagger/OpenAPI is "this is EXACTLY what the API does, if you don't do it like this, you will fail". If you change something, things will start falling apart.
In a lot of spheres, MCP is still the hype. And it was the hype in even more spheres a few months ago.
Because of FOMO, a lot of higher-ups decided that "we must do an MCP to show that we're also part of the cool kids" and to have an answer for their even-higher-ups about "What are you doing regarding AI?"
The project has been approved and a lot of time has been sunk into it, so nobody wants to admit that "hmm, actually it's now irrelevant; our existing API + a skill.md is enough."
I've seen that in at least 4 companies my friends work in, so I would be surprised if it's not something like that here too.
On the contrary, Claude Code, in my experience, has been perfectly able to use `stripe` and `gh`, and to construct a Figma CLI on the fly (once instructed to do it).
Is this basically a CLI version of [1]? If so, I'm glad Google is being forward thinking about how developers actually want to use their apps.
Better this than a Google dashboard or slopped-together third-party libs. I know Google says they don't support it, but they'll probably support it better than someone outside of Google could.
Yikes, I wrote that? I hate it when people write cryptic replies like that.
What I meant was 'yes', the Google Workspace CLI appears to be quite similar to 'gogcli', the CLI written for OpenClaw. Both provide CLI access to a broad range of Google services for both Workspace and regular Gmail accounts.
GAM, on the other hand, is an admin tool, and strictly for Google Workspace accounts.
The fact that humans can use this feels like a side effect. The developer says it's "built for agents first" and "AI agents would be the primary consumers of every command, every flag, and every byte of output"[1].
I remember reading gog setup instructions, and thinking, "Create oauth app/client? That's bonkers." And as cool+useful as this project looks, it's quite a bit harder to get going, especially if you're not familiar with Google Console and OAuth (or not a dev).
Reading lots of comments about "MCP vs CLI" -- reminds me a bit of the "agent vs. script/app/rpa" debates. It's usually not one or the other, but rather, both.. or the right tool for the job (and that can shift over time).
The biggest complaint we have about MCP is context-window bloat and token spend. Tools do exist that address this. I have just one MCP endpoint with a half dozen tools behind it, including Gmail, Google Calendar, Docs, GitHub, Notion, and more. It uses a tool-search tool (ToolIQ) with a tiny context footprint. Give it a whirl. https://venn.ai
They need something like this, as automating Google apps with AI is hard and flaky. However, step 2 drops me onto a fairly technical-looking page where I have to configure Google Cloud. If they had a one-click installer to automate Google apps, it would be an absolute killer use case for AI for me.
Would be nice if the MCP implemented the Streamable HTTP MCP spec instead of the CLI one. I know this is already a HTTP API, but making it available as an MCP server that clients like Joey[1] can consume easily over network would be nice.
The move to CLIs has been really interesting lately. They're easy for agents to use. They're composable with other shell tools. It's going to be interesting to see if mcp sticks around or if everything just moves to service specific CLIs.
This is made by Google DevRel. It's not going to break the ToS, but it could be abandoned. That happens frequently with DevRel projects, since they're not actually tasked with or graded on engineering projects.
I think calling DevRel projects 'frequently abandoned' is blunt, but in my experience they are more like samples than production-owned libraries, so you should assume limited maintenance. Before relying on the Google Workspace CLI for automation inspect commit cadence, open issues, last release tag, number of contributors, and whether a product team or SDK maintainer is listed. If you need it in production pin to a release tag and vendor the code with go modules replace or npm shrinkwrap, add a thin adapter so swapping implementations is trivial, and run a couple of integration tests in GitHub Actions to catch regressions. I once had to fork a DevRel CLI that our on-call scripts depended on and maintaining that fork cost a weekend plus a steady trickle of small fixes, so now I put third party CLIs behind a tiny internal wrapper and keep the command surface minimal.
I am a Developer Relations Engineer at Google. Currently I am on the Google Workspace DevRel team and was on the Google Maps Platform before that. Previously I worked at Descartes Labs and the US Geological Survey.
How to expose my product suite's API to AI has been a roller-coaster ride. First it was tool-calling hooks, then MCP; then folks found out AI is better at coding, so MCPs suddenly became code mode; then people realized skills are better at context; and now Google has launched a CLI approach.
Remember, this repo is not an agent. It's just a CLI tool for operating over Workspace documents that happens to ship an MCP command and a bunch of prebundled skills.
That's a new one. I guess the hope is agents are good at navigating cli and it also democratizes the ecosystem to be used by any agent as opposed to Microsoft (which only allows Copilot to work in its ecosystem)
I've been using `gog` but I'd rather have an agent-first thing. I don't want a big bad MCP that occupies all context all the time. I need my claw to be aware on how to edit things. As it so happens, right now `gog` works. But I'm eager to see how this develops.
This reminds me a bit of how the GitHub CLI evolved into a foundation for automation and tooling. Do you see the Google Workspace CLI primarily as something for humans using the terminal, or more as a stable interface that automation and AI agents can build on?
Are integration vendors like Pipedream in trouble now that every company is pushing out MCP servers and CLIs to ride the AI craze? After the Twitter and Reddit API troubles of prior years, I can't imagine any company would willingly bring down the walls of their gardens and give easy access to precious user data. I'm waiting for the rug pull
Neat. I've been running something very similar to this locally for a few months now. They moved all their documentation to Markdown recently. I still rely on the Discovery API and lenient cloud-project permissions, so maybe there are some gains there. Will compare notes later.
Correct me if I'm wrong but the UX difficulty with the Google API ecosystem isn't resolved. It's the goddamn permissioning and service accounts. Great to have a CLI that every other minute says, "you can't do this" -- the CLI really needed to solve this to check my boxes.
They already have a HTTP API, but the real reason is that CLIs are emerging as the most ergonomic way for the current wave of AI agents to do stuff. There's a few benefits over APIs:
- No need to worry about transport layer stuff at all, including auth or headers. This is baked in, so saves context.
- They are self describing with --help and then nested --help commands, way better than trying to decipher an OpenAPI spec. You usually don't even need an agent skill, just call the --help and the LLM figures it out.
CLI is probably more reliable. Also, the ergonomics for the person setting up the machine for the AI are better. They can check to see if the command is working without screwing with curl. It's also possible a human might want to use the software / service they're paying for.
Forget the Gemini extension - Gemini CLI sucks. Forget the MCP - MCP is beyond dead. But for the codex or claude CLIs this is a game changer. The next question is how programmable they've made the Sheets interface... because Gemini sucks at sheets.
Common for them. Even for projects on the main google GH org more often than not there'll be stuff like "not an official Google product" and "it is just code that happens to be owned by Google" in the README.
gcloud cli will probably also require you to make a Google Cloud project and stuff by clicking around their godforsaken webui. hopefully they streamlined that, it took me a long time to figure out when i wanted to write some JS in my spreadsheet
> requires setting up gcloud cli first, necessitates making a Google Cloud project
cmon google how come even your attempts at good ux start out with bad ux? let me just oauth with my regular google account like every other cli tool out there. gh cli, claude, codex - all are a simple “click ok” in the browser to log in. wtf.
and the slow setup - i need to make my own oauth app & keys??
EDIT: oh yeah and get my oauth app verified all so i can use it with my own account
One of the very few good things from the AI race has been everyone finally publishing more data APIs out in the open, and making their tools usable via CLIs (or extensible APIs).
They aren't doing that though. At least not yet. It's generated from the discovery tool, which amounts to the spec of the existing API. If they want a high powered CLI they need to dig into the servers behind Google Workspace like they have when they've improved the web apps.
For all people have to say about Pete the openclaw guy he's been perhaps one of the most vocal voices about CLIs > MCPs (or maybe his is just the loudest?) and he also built a GSuite CLI that probably inspired this project.
I mean it's great that we get this, hopefully it can continue to be maintained and I'd love to see a push for similar stuff for other products and at other companies.
wow this will gel very well with my current project. Main hurdle i was facing was connecting with individual services via google oauth to get the data.
This is effectively why people want agents, right? To be able to bypass all the stuff companies "optimize", ie. the ad-filled websites. The inserted extra conversations that now literally get triggered BECAUSE you're chatting to someone else. The ...
In other words: this is the FANGs worst enemy. This is adblock * 1000, as far as consumers are concerned.
For the love of god, please google, give us personal access tokens and not this Oauth madness inside GCP. It is impossible to get that in any big enterprise.
the most annoying thing with Google Workspace is that you need super admin privilege to properly audit the environment programmatically, I believe because of the cloud-identity api.
I've built a few internal tools using the Workspace APIs, and while they are powerful, the rate limits on the Drive API can be brutal if you are doing bulk operations. Does this repository handle automatic backoff and retries, or do we need to wrap it ourselves?
I have been working on extrasuite (https://github.com/think41/extrasuite). This is like terraform, but for google drive files.
It provides a git-like pull/push workflow to edit sheets/docs/slides. `pull` converts the google file into a local folder with agent-friendly files. For example, a google sheet becomes a folder with a .tsv, a formula.json and so on. The agent simply edits these files and `push`es the changes. Similarly, a google doc becomes an XML file that is pure content. The agent edits it and calls push - the tool figures out the right batchUpdate API calls to bring the document in sync.
None of the existing tools allow you to edit documents. Invoking batchUpdate directly is error prone and token inefficient. Extrasuite solves these issues.
In addition, Extrasuite also uses a unique service token that is 1:1 mapped to the user. This means that edits show up as "Alice's agent" in google drive version history. This is secure - agents can only access the specific files or folders you explicitly share with the agent.
This is still very much alpha - but we have been using this internally for our 100 member team. Google sheets, docs, forms and app scripts work great - all using the same pull/push metaphor. Google slides needs some work.
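To make the push step concrete, here is a minimal sketch of the general idea - turning a local text edit into Docs-style batchUpdate requests. This is not extrasuite's actual implementation (which works off a structured representation); the request shapes follow the public Docs API, but the difflib-based mapping is purely my own illustration.

```python
import difflib

def diff_to_requests(old: str, new: str) -> list[dict]:
    """Map a plain-text diff onto Docs-style batchUpdate requests.
    Docs body indexes are 1-based (index 0 is reserved), hence the +1."""
    requests = []
    # Emit edits back-to-front so earlier indexes stay valid as we go.
    opcodes = difflib.SequenceMatcher(a=old, b=new).get_opcodes()
    for tag, i1, i2, j1, j2 in reversed(opcodes):
        if tag in ("replace", "delete"):
            requests.append({"deleteContentRange": {
                "range": {"startIndex": i1 + 1, "endIndex": i2 + 1}}})
        if tag in ("replace", "insert"):
            requests.append({"insertText": {
                "location": {"index": i1 + 1}, "text": new[j1:j2]}})
    return requests

def apply_requests(text: str, requests: list[dict]) -> str:
    """Local re-implementation of the two request types, for sanity checks."""
    for req in requests:
        if "deleteContentRange" in req:
            r = req["deleteContentRange"]["range"]
            text = text[: r["startIndex"] - 1] + text[r["endIndex"] - 1:]
        else:
            ins = req["insertText"]
            i = ins["location"]["index"] - 1
            text = text[:i] + ins["text"] + text[i:]
    return text
```

Applying the edits back-to-front is the key trick: each request can then use indexes from the original document without later edits invalidating them.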
Excellent project! I see that the agent modifies the google docs using an interesting technique: convert doc to html, AI operates over the HTML and then diff the original html with ai-modified html, send the diff as batchUpdate to gdocs.
IMO, this is a better approach than the one used by Anthropic docx editing skill.
1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?
2. What happens if the document is a big book? How do you manage context when loading big documents?
PS: I'm working on an AI agent for Zoho Writer (a gdocs alternative) and I've landed on a similar html based approach. The difference is I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML and convert them into ops.
This works pretty well for me.
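For anyone curious what such minimal commands look like, here is a toy sketch. The op names come from the comment above; their exact semantics and the flat (node_id, html) document model are my guesses for illustration only.

```python
def apply_op(doc, op):
    """Apply one minimal edit command to a document modeled as an
    ordered list of (node_id, html) pairs."""
    kind = op["op"]
    if kind == "addnode":        # insert a new node after the anchor node
        i = next(i for i, (nid, _) in enumerate(doc) if nid == op["after"])
        return doc[: i + 1] + [(op["id"], op["html"])] + doc[i + 1:]
    if kind == "replacenode":    # swap a node's HTML in place
        return [(nid, op["html"] if nid == op["id"] else html)
                for nid, html in doc]
    if kind == "removenode":     # drop a node entirely
        return [pair for pair in doc if pair[0] != op["id"]]
    raise ValueError(f"unknown op: {kind}")

doc = [("n1", "<h1>Title</h1>"), ("n2", "<p>Old intro</p>")]
doc = apply_op(doc, {"op": "replacenode", "id": "n2",
                     "html": "<p>New intro</p>"})
doc = apply_op(doc, {"op": "addnode", "after": "n2", "id": "n3",
                     "html": "<p>More</p>"})
```

The appeal of this shape is that each op is trivially translatable into the editor's native operations, and the model never has to emit a full document.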
We have been using something similar for editing Confluence pages. Download XML, edit, upload. It is very effective, much better than direct edit commands. It’s a great pattern.
I would be very interested in this if you could share? Maintaining a Knowledge Base without a Git workflow is a pain currently.
You can use the Copilot CLI with the atlassian mcp to super easily edit/create confluence pages. After having the agent complete a meaningful amount of work, I have it go create a confluence page documenting what has been done. Super useful.
Edit the markdown using GitHub workflow. Then insert markup (pick markdown) into the confluence page.
Works wonderfully!
I'm afraid I can't easily share this, as we have embedded a lot of company-specific information in our setup, particularly for cross-linking between confluence/jira/zendesk and other systems. I can try to explain it though, and then Claude Code is great at implementing these simple CLI tools and writing the skills.
We wrote CLIs for Confluence, Jira, and Zendesk, with skills to match. We use a simple OAuth flow for users to login (e.g., they would run jira login). Then confluence/jira/zendesk each have REST APIs to query pages/issues/tickets and submit changes, which is what our CLIs would use. Claude Code was exceptional at finding the documentation for these and implementing them. Only took a couple days to set these up and Claude Code is now remarkably good at loading the skills and using the CLIs. We use the skills to embed a lot of domain-specific information about projects, organisation of pages, conventions, standard workflows, etc.
Being able to embed company-specific links between services has been remarkably useful. For example, we look for specific patterns in pages like AIT-553 or zd124132 and then can provide richer cross-links to Jira or Zendesk that help agents navigate between services. This has made agents really efficient at finding information, and it makes them much more likely to actually read from multiple systems. Before we made changes like this, they would often rabbit-hole only looking at confluence pages, or only looking at jira issues, even when there was a lot of very relevant information in other systems.
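The pattern-matching half of this is simple to sketch. The base URLs below are hypothetical, and the regexes are modeled only on the two examples in the comment (AIT-553, zd124132):

```python
import re

# Hypothetical patterns: Jira keys like AIT-553, Zendesk tickets like zd124132.
JIRA = re.compile(r"\b([A-Z]{2,10}-\d+)\b")
ZENDESK = re.compile(r"\bzd(\d+)\b")

def enrich(text: str) -> str:
    """Rewrite bare ticket references into cross-links so agents can
    navigate between systems. URLs here are placeholders."""
    text = JIRA.sub(r"[\1](https://example.atlassian.net/browse/\1)", text)
    text = ZENDESK.sub(
        r"[zd\1](https://example.zendesk.com/agent/tickets/\1)", text)
    return text
```

Running this over pulled pages before handing them to the agent is usually enough; the agent just follows the links like any other reference.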
My favourite is the confluence integration though, as I like to record a lot of worklog-style information in there that I would previously write down as markdown files. It's nicer to have these in Confluence as then they are accessible no matter what repo I am working in, what region I am working in, or what branch or feature I'm working on. I've been meaning to try to set something similar up for my personal projects using the new Obsidian CLI.
Thanks for the insights!
We have been doing something similar but it sounds like you have come further along this way of working. We (with help from Claude) have built a similar tool that you describe to interface with our task- and project management system, and use it together with the Gitlab and Github CLI tools to allow agents to read tickets, formulate a plan and create solutions and create MR/PR to the relevant repos. For most of our knowledge base we use Markdown but some of it is tied up in Confluence, that's why I have an interest in that part. And, some is even in workflows are in Google Docs which makes the OP tool interesting as well -- currently our tool output Markdown and we just "paste from markdown" into Gdocs. We might be able to revise and improve that too.
Thank you! Sounds like a fantastic setup. Are the claude code agents acting autonomously from any trigger conditions or is this all manual work with them? And how do you manage write permissions for documents amongst team members/agents, presumably multiple people have access to this system?
(Not OP, but have been looking into setting up a system for a similar use case)
I've found that usually works ok, but currently tends to timeout with the Atlassian MCP when trying to do updates on large Confluence pages: https://github.com/atlassian/atlassian-mcp-server/issues/59
Related, I often work with markdown docs (usually created via CLI agents like Claude Code) and need to collaborate with others in google docs, which is extremely markdown-unfriendly[1], so I built small quality-of-life CLI tools to convert Gdocs -> md and vice versa, called gdoc2md and md2gdoc:
https://pchalasani.github.io/claude-code-tools/integrations/...
They handle embedded images in both directions. There are similar gsheet2csv and csv2gsheet tools in the same repo.
Similar to the posted tool, there is a first time set up involving creating an app, that is documented above.
[1] in the sense there are multiple annoying clicks/steps to get a markdown doc to look good in Gdocs. You'd know the pain if you've tried it.
Paste from markdown (Chrome only) works _really_ well for me. What are the extra steps you’re running into?
Interesting, in my Arc browser, I just tried File -> open -> upload -> blah.md and it does seem to render fine. This exact thing did not work a few weeks ago, meaning the various header markers etc showed up as raw "##" etc, and I had to further select something like "open as new doc" to finally make it look good.
Right click > "Paste from Markdown" instead of just straight up pasting in
Images wouldn't work though, right? I'd be amazed if that worked. My CLI tools handle those.
Obsidian has become almost an operating system for working with markdown. Its Live View / Edit mode is excellent (WYSIWYG) and its ability to accept pasted content and handle it appropriately is good and getting better. Its plugin/extension ecosystem is robust (and has a low barrier to entry), and now that it has a CLI I expect to see an acceleration of clever workflows and integrations.
No affiliation, just a very happy ~early adopter and daily user.
BUT the main supported sync module is cloud only they wont let you self host for free which is really shitty and lame.
Wow, that's a strong opinion and harsh words that come across as really entitled, and probably unfair. From my PoV, they're a tiny, scrappy, transparent and likeable company who built and maintain a fantastic software application that radically improved ~everything about my daily workflow and PKM. I get more value out of Obsidian in a day than most other apps in their entire lifespan. The core app is free! They have to eat. I'd probably throw $ at them even if they didn't charge a few bucks / month for Sync. (Which works flawlessly.) Sure it'd be cool if you could self-host their Sync module -- but many Obsidian users use other DIY approaches for sync; in the end it's markdown files on a local disk, do with it what you will.
I’m intrigued by their recent CLI release as well. I’ll have to check out the markdown edit support too, thanks
Interesting, we have a very similar internal flow - we like working in markdown but our customers want to leave feedback in Google docs, so we also have an md -> gdoc tool. We don't do the reverse as we ask them to only leave comments/suggested changes and we apply those directly to the markdown and re-export.
I ran into similar issues as you for the image handling, and the work around I use is to use pandoc to convert to docx as a first step and then import that as a Google Doc using the API, as Google Docs seems to handle docx much better than markdown from what I've seen.
Interesting post from the main contributor about this (at least I assume it’s what he’s referencing) https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
Thanks! Looks like he submitted it here, judging by the username:
You need to rewrite your CLI for AI agents - https://news.ycombinator.com/item?id=47252459.
I think that's pretty cool so I put the post in the SCP (https://news.ycombinator.com/item?id=26998308).
TIL Second Chance Pool, great idea
This is really interesting: "Humans hate writing nested JSON in the terminal. Agents prefer it." Are others seeing the same thing? I've just moved away from json-default because agents were always using jq to convert it to what I could have been producing anyway.
In my experience agents struggle with escape sequence nesting as much as humans do. IMHO that is one well-paved road to RCE via code injection.
Looks like I am hitting some Cloudflare Block when accessing this URL
Probably because he built his site for agents not humans
lol but it’s definitely happening. Some services are solely for llm consumption and human is not a welcomed customer.
Really interesting. I was thinking about something similar regarding the shape of code. I have no qualms recommending my agents take static analysis to the extreme, though it would be cumbersome for most people.
No, we won’t be in fact doing that. Machine parsable, readable for other tools - yes.
Generating a good cli isn't all that hard for agentic coding tools. When you do it manually it's highly repetitive work. But all you are doing is low level plumbing. Given some parsed arguments, call a function, return the result (with some formatting, prettying, etc.). In the end it's just a facade for an API, library, or whatever else you want to have a cli for. Easy to write. Easy to test. But manually going through your API resource by resource, parameter by parameter, etc. takes a long time. An LLM just blazes through that in a few minutes. Generate some tests, tweak as needed, and you are good to go.
I did a few CLIs with codex in the last few weeks. I do simple ops with this stuff. I've had a few use cases for new features where previously I would have had to build some kind of quick and dirty admin UI just to use and test a new API feature before being able to integrate it into our product. With a generated cli, I can just play with it from the command line. Or make codex do that for me.
A good cli with a modern command line argument parser, well documented options, bash/zsh auto complete, pretty colors, etc. is generally nice to have. I mapped resources to commands and sub commands, made it add parameters with sensible defaults or optional ones. Then I got lazy and just asked it what else it thought it was missing, it made some suggestions and I gave it the thumbs up and it all got added. I even generated a simple interactive TUI at some point. Because why not? I also made it generate a md skill file explaining how to use the cli that you can just drop in your skills directory.
But manually going through your API resource by resource, parameter by parameter, etc. takes a long time.
This CLI dynamically generates itself at run time though
gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically
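The core mechanism is easy to sketch, because discovery documents are just JSON trees of resources and methods. The sample document below is trimmed way down (real ones carry schemas, parameters, scopes, etc.), but the field names match the actual Discovery format:

```python
# A heavily trimmed discovery document; "resources" -> "methods" is the
# core shape the CLI walks to build its command surface.
discovery = {
    "name": "sheets",
    "resources": {
        "spreadsheets": {
            "methods": {
                "get": {"httpMethod": "GET",
                        "path": "v4/spreadsheets/{spreadsheetId}"},
                "batchUpdate": {"httpMethod": "POST",
                                "path": "v4/spreadsheets/{spreadsheetId}:batchUpdate"},
            },
            "resources": {
                "values": {
                    "methods": {
                        "update": {"httpMethod": "PUT",
                                   "path": "v4/spreadsheets/{spreadsheetId}/values/{range}"},
                    }
                }
            },
        }
    },
}

def list_commands(doc, prefix=None):
    """Flatten nested resources/methods into CLI-style subcommand names."""
    prefix = prefix if prefix is not None else doc.get("name", "")
    out = []
    for rname, res in doc.get("resources", {}).items():
        for mname, m in res.get("methods", {}).items():
            out.append((f"{prefix} {rname} {mname}", m["httpMethod"], m["path"]))
        out += list_commands(res, f"{prefix} {rname}")
    return out
```

The real CLI fetches these documents from Google's Discovery Service at runtime, so new API surface shows up without a release; this sketch just shows the flattening step.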
> gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically
You're not exactly describing rocket science. This is basically how websites work, there's never been anything stopping anyone from doing dynamic UI in TUIs except the fact that TUI frameworks were dog poop until a few years ago (and there was no Windows Terminal, so no Windows support). Try doing that in ncurses instead of Rataui or whatever, it's horrendous
> Disclaimer
> This is not an officially supported Google product.
Looked like an official Google Product at first glance.
Generally, this disclaimer is required for products that are released under the "Google" name but without any kind of support guarantees for enterprise customers.
That or it's a personal project that IARC decided could live in the workspace project.
Disc: Former Googler
> but without any kind of support guarantees for enterprise customers
Also known as every single Google product
I'm still confused, @googleworkspace is not affiliated with Google?
Seems like it was made by Google employee: https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
Google operates across so many verticals that it's difficult to argue a side project is outside the scope of Google’s business and therefore Google could argue it has copyright over the work. To make it easier for engineers to keep contributing to open source, there’s a fairly straightforward path to release code through a Google-owned repository (if you look at github.com/google it is full of personal projects alongside official ones).
There is an official process where an engineer can apply to a committee to have Google waive any copyright claim. That requires additional work so if your goal is simply to publish the code as open source and you do not mind it living under the Google org, using the Google repo path is usually much faster.
Disclaimer: ex-googler, not a lawyer, not arguing whether or not the situation with copyright assignment is legally enforceable or not/good or bad/etc.
I think an official project from Google would be hosted under https://github.com/google, a GitHub Org which contains 2,800 repositories and has more than 500 Google employees as members.
googleworkspace/cli appears to be more of a hobby project developed by a single Google employee.
Most projects under the "google" org will have exactly the same disclaimer about not being official Google products.
Crazy.
And this project uses "google" in its org, so I would assume it is official, or at least lawyers are running toward the owner with lawsuits.
But at least they are under the Google organization. Thing is anyone could create an organization, name it something like "googlesomething", use Google logos, and design it in a way that some users might believe it has an official connection.
Couldn't Google do a cease and desist for that kind of impersonation?
I think so, but it could be enough for someone to create such an organization, share it on HN for malicious purposes, such as infecting devices, and have it taken down only afterward. I'm not saying that's what happened here, but it does illustrate a potential attack vector.
It's by Google, but it's open source and comes with no SLAs.
Yeah that github name made my spider senses tingle, large scale credentials harvesting?
Also the use of the google logo.
Edit: Oh, I think this actually is an official account. Very confusing
What a shame Google Photos has no decent API or CLI. Photos could have been the best SaaS, but changes in the API made it terribly unusable.
I wish I could use an API/CLI to query/geoquery my photos.
I was excited to see this but all of that went away when I realized you need to create an app in GCP to use it. Can't really expect non technical users to set this up across the company.
Can someone explain to me, why Google can't (or does not want to) implement the same auth flow that any other SaaS company uses:
# API Keys in Settings
1. Go to Settings -> API Keys Page
2. Create Token (set scope and expiration date)
# OAuth flow
1. `gws login` shows url to visit
2. Login with Google profile & select data you want to share
3. Redirect to localhost page confirms authentication
I get that I need to configure Project and OAuth screens if I want to develop an Applications for other users, that uses GCP services. This is fine. But I am trying to access my own data over a (/another) HTTP API. This should not be hard.
Can you name a service you think works like that?
Google have over a billion very non-technical users.
The friction of not having this in the account page that everyone has access to probably saves both parties lots of heartbreak.
github? I just do some click here, click there, copy paste and gh cli is ready.
For google I need a PhD to set up any kind of API access to my own data. And it frequently blocks you, because you can set it up as a test product, add test accounts (but it can't be the owner account (WTF?)) etc.
I gave up on using a google calendar cli project because of all that lack of normal UX.
UX for google APIs looks like it was designed by accountant.
gws auth setup looks promising, but it won't work yet for personal accounts.
It's an un-invite. A hollow gesture.
Google's Gemini can read Google Docs directly.
They really don't want you to use another LLM product.
So they make the setup as difficult as possible.
Same story here, I installed it and ran `gws auth setup` only to find I needed to install the `gcloud` CLI by hand. That led me to this link with install instructions: https://cloud.google.com/sdk/docs/install. Unmistakeable Google DX strikes again.
https://www.supyagent.com
We’re trying to create a single unified cli to every service on the planet, and make sure that everything can be set up with 3 clicks
Yeah, still no way around this unfortunately.
God, getting this set up is frustrating. I've spent 45 minutes trying to get this to work, just following their defaults the whole way through.
Multiple errors and issues along the way, now I'm on `gws auth login`, and trying to pick the oAuth scopes. I go ahead and trust their defaults and select `recommended`, only to get a warning that this is too many scopes and may error out (then why is this the recommended setting??), and then yeah, it errors out when trying to authenticate in the browser.
The error tells me I need to verify my app, so I go to the app settings in my cloud console and try to verify and there's no streamlined way to do this. It seems the intended approach is for me to manually add, one by one, each of the 85 scopes that are on the "recommended" list, and then go through the actual verification.
Have the people that built and released this actually tried to install and run this, just a single time, purely following their own happy path?
Similar frustrations. I was only able to auth using some Google app I created for an old project years ago that happened to have the right bits.
It's wild that this process is still so challenging. There's got to be some safe, streamlined way that sets up an app identity you own that can only be used to access your own account.
My guess is that, organizationally within Google, the developer app authorization process has many teams involved in its implementation and many other outside stakeholders. If a single unified team were responsible, there wouldn't be this confusion and complexity. I get why... it's a huge source of bad actors. But there's got to be a better way.
I’ve been really unhappy with pretty much every Google product I’ve used except their consumer productivity tools — Gmail, Calendar, and Meet. Diving into Google Cloud has been extremely unsatisfactory
I ran a project for a company on Google Cloud a few years ago and enjoyed it once I got used to everything. I’d use it more now if they had better low end pricing to start projects there.
It’s a very different experience than AWS though and takes some getting used to.
Same. I was using Gemini and firebase for a work project and I was stunned how hard it was for me to use
I find https://github.com/steipete/gogcli a bit easier (but still confusing to setup)
Google Workspace API keys and roles were always confusing to me at so many levels, and they just seem to keep topping that confusion; no one is addressing the core (honestly not sure if that is even possible at this point)
I have "Advanced Protection" turned on, so I just can't use this at all, because my newly created Google Cloud GCP app isn't trusted (even though I own it and I'm requesting read-only scopes). What a mess.
had the same frustration trying to set up Google analytics MCP server: https://github.com/googleanalytics/google-analytics-mcp
getting the authentication to work is a real pain and it's basically preventing people access to an otherwise really good and useful MCP
Imagine a marketing person trying to set it up...
There are many gotchas in this process and unfortunately there is no easy way to deal with the OAuth setup.
i had to do all that the last time i wanted to do a little js in my google sheets. when i saw their quick start required gcloud already set up, i decided not to bother trying this out. idk why google makes something that should take 15s (clicking “ok” in an oauth popup) take tens of minutes to hours of head scratching.
I used Claude in chrome and Claude Code. It did everything for me.
Tried this out today and it feels half-baked unfortunately. I can't get auth working (https://github.com/googleworkspace/cli/issues/198).
The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
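One way around the human-unfriendliness is to build the JSON in a few lines of script and let the language handle the quoting. The subcommand path below is made up for illustration; only the "--params takes a JSON string" convention comes from the linked post:

```python
import json
import shlex

# Hypothetical gws invocation: constructing the --params payload in code
# sidesteps the shell-escaping pain of hand-writing nested JSON.
params = {
    "range": "Sheet1!A1:B2",
    "valueInputOption": "USER_ENTERED",
    "body": {"values": [["hello", 1], ["world", 2]]},
}
cmd = "gws sheets values update --params " + shlex.quote(json.dumps(params))
```

`shlex.quote` wraps the whole payload in single quotes, so the embedded double quotes and `!` survive the shell untouched, which is presumably also why agents (which generate the JSON programmatically anyway) don't mind this design.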
I'm curious why `npm` is used to install a `rust` binary?
They're not doing so here, but shipping a wasm-compiled binary with npm that uses node's WASI API is a really easy way to ship a cross-platform CLI utility. Just needs ~20 lines of JS wrapping it to set up the args and file system.
Doesn’t this seem excessive over just using rust’s cross platform builds?
There's no such thing as a truly "cross-platform" build. Depending on what you use, you might have to target specific combinations of OS and processor architecture. That's actually why WASM (though they went with WASI) is a better choice; especially for libraries, since anyone can drop it into their environment without worrying about compatibility.
there are 3 OSes and 2 architectures, minus darwin-amd64, so you just need to do 5 builds to avoid the WASM performance tax.
(freebsd runs linux binaries and the openbsd people probably want to build from source anyways)
Can you link to a sample of how I can do this?
https://axodotdev.github.io/cargo-dist/
I found that strange as well. My guess is that `npm` is just the package manager people are most likely to already have installed and doing it this way makes it easy. They might think asking people to install Cargo is too much effort. Wonder if the pattern of using npm to install non-node tools will keep gaining traction.
It's still weird. Why not just use an effing install.sh script like everybody else? And don't tell me "security". Because after installation you will be running an unknown binary anyway.
Most people aren't going to have npm installed though. Nobody outside of web devs uses it.
A lot of people who are not web devs use it, that's what I see. I even saw some mainframe developers use npx to call some tool on some data dump.
Also, this is a web project anyway. Google Workspace is web based, so while there is a good chance that the users aren't web developers, it's a better chance that they have npm than anything else.
In the case that they don't, releases can be downloaded directly too: https://github.com/googleworkspace/cli/releases
If you had to pick one package manager that was most likely installed across all the different user machines in the world, I'd say npm is a pretty good bet.
Pip.
"Most people" are webdevs
Bracing for getting cancelled
Why not just downloadable binary then?
For many, installing something with npm is still easier. It chooses the right binary for your OS/architecture, puts it on your PATH, and streamlines upgrades.
Their Github releases provides the binaries, as well as a `curl ... | sh` install method and a guide to use github releases attestation which I liked.
I feel better with `curl ... | sh` than with npm.
npm suggests projects written in js, which is not something I'm comfortable with.
It is nice to see that this is not JS, but Rust.
Hmm, that's right... thanks..
They have them: https://github.com/googleworkspace/cli/releases
NPM as a cross platform package distribution system works really well.
The install script checks the OS and Arch, and pulls the right Rust binary.
Then, they get upgrade mechanism out of the box too, and an uninstall mechanism.
NPM has become the de facto standard for installing any software these days, because it is present on every OS.
To my knowledge NPM isn't shipped in _any_ major OSes. It's available to install on all, just like most package managers, but I'm not sure it's in the default distributions of macOS, Windows, or the major Linux distros?
No package manager is. But of the ones that are installed by users, npm is probably the most popular.
What about pip? It's either installed or immediately available on many OSes
pip might be but it was historically super inconsistent (at least in my experience). Is it `pip install`? `python3 -m pip install`? maybe `pip3 install`? Yeah ubuntu did a lot of damage to pip here. npm always worked because you had to install it and it didn't have a transition phase from python2 being in the OS by default.
`pip install` either doesn’t work out of the box or has the chance to clobber system files though
system pip w/ sudo usually unleashes Zalgo, i’d rather curl | bash but npm is fine too. it’s just about meeting people where they’re at, and in the ai age many devs have npm
if you build for the web, no matter what your backend is (python, go, rust, java, c#), your frontend will almost certainly have some js, so likely you need npm.
This is about eight years old. The python situation has mostly gotten worse since https://xkcd.com/1987/
python packaging / envs is solved now by uv. it's not just promising or used only by people in the know like the last 2 trendy python package managers. i was a big-time python hater since it was a pita to support as a devtools guy, but now it's trivial. uv just works, it won.
I'm not a python dev, but I see a bit of its ecosystem. How does uv compare with conda or venv? I thought JS had the monopoly on competing package managers.
What? It’s much much better now, you can just use uv. Yeah, it’s yet another package manager, but it does it well.
Or go up a rung or two on the abstraction ladder, and use mise to manage all the things (node, npm, python, etc).
> The install script checks the OS and Arch, and pulls the right Rust binary.
That's the arbitrary code execution at install time aspect of npm that developers should be extra wary of in this day and age. Saner node package managers like pnpm ignore the build script and you have to explicitly approve it on a case-by-case basis.
That said, you can execute code with build.rs with cargo too. Cargo is just not a build artifact distribution mechanism.
More of a de facto standard for supply chain attacks tbh
Yeah except you need to install NPM, whereas with a rust binary, which can easily compile cross platform, you don’t.
Honestly I’m shocked to see so many people supporting this
> NPM has become the de facto standard for installing any software these days, because it is present on every OS.
That's not remotely true. If there is a standard (which I wouldn't say there is), it's either docker or curl|bash. Nobody is out there using npm to install packages except web devs, this is absolutely ridiculous on Google's part.
I agree but this isn't a Google project, it's one Google employee.
they offer npm for the large market of cli users who have it, and curl|bash to those who don’t. ¯\_(ツ)_/¯
I think there has been an influx of people vibe coding in Rust because its "fast" but otherwise they have no idea about Rust.
Not because it's fast, but because of its compiler. It acts as a very good guardrail and feedback mechanism for LLMs.
Typescript has surpassed Python and JS as most used on Github for a similar reason
https://xcancel.com/github/status/2029277638934839605?s=20
> making strict typing an advantage, not a chore
It's crazy that people think strict typing is a chore. Says a lot about our society.
I learned TS after a few years with JS. I thought having strict types was cool. Many of my colleagues with much more (JS) experience than me thought it was a hassle. Not sure if they meant the setup or TS or what but I always thought it was weird.
"NPM has become the de facto standard for installing any software these days, because it is present on every OS."
What?!? Must not be in any OS I've ever installed.
Now tar, on the other hand, exists even in windows.
Interesting fact, because cargo builds every tool it downloads from source, you can’t actually run cargo install on Google laptops internally.
I use cargo-binstall, which supports quick install and a couple of other methods for downloading binaries for Rust packages
Why should the package's original language matter?
When I use apt-get, I have no idea what languages the packages were written in.
Because npm is not an os package manager, it's a nodejs package manager
Not everyone has or wants yet another package manager in their system.
I think we all have been working on our own bespoke CLI tools. MCPs are bloated, insecure token hogs. CLIs are easy to write and you can cut it down to only what you need, saving thousands of tokens per prompt. This is another one I'll add to my repertoire of claude CLIs to replace the MCP.
Claude Opus 4.6 couldn't figure out how to use it to write to a Google Sheet (something to do with escaping the !?) and fell back to calling the sheets API directly with gcloud auth.
Can you tell it to write to /tmp/s.py instead of trying to execute it inline?
You should use Gemini obviously </s>
Yeah, that one can't even figure out how to write a formula, or sometimes read data when it's sitting WITHIN context of sheets.
I get better experience if I just copy-paste the sheet data into Gemini web. And IIRC copy-paste is just space "delimited" by default.
> gws doesn't ship a static list of commands. It reads Google's own Discovery Service at runtime and builds its entire command surface dynamically.
What is the practical difference between a "discovery service"+API and an MCP server? Surely humans and LLMs are better off using "discovery service"+API in all cases? What would be the benefit of MCP?
Nothing. MCPs are HTTP APIs and CLI tools without the good parts. They lack the robustness of the OpenAPI spec, including security standardization, and are more complex to run than simple CLI utilities without any authentication.
I have done it many times, using the swagger.json as a "discovery service" and then having the agent utilize that API. A good OpenAPI spec was working perfectly fine for me all the way back when OpenAI introduced GPTs.
If we standardized on a discovery/ endpoint, or something like that, as a more compact description of the API to reduce token usage compared to consuming the somewhat bloated full OpenAPI spec, you would have everything you need right there.
The MCP side quest for AI has been one of the most annoying things in AI in recent years. Complete waste of time.
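The compact discovery idea above can be sketched: given a Discovery-style document, reduce it to one line per method. The toy document below only mirrors the shape of Google's real Discovery format; it is not real API data.

```python
# Collapse a Discovery-style document into a terse command surface an
# agent can read cheaply, instead of ingesting the full spec.
def compact_surface(doc):
    lines = []
    for res_name, resource in doc.get("resources", {}).items():
        for m_name, method in resource.get("methods", {}).items():
            params = ",".join(sorted(method.get("parameters", {})))
            lines.append(f"{res_name}.{m_name}({params}) {method['httpMethod']}")
    return "\n".join(lines)

# Made-up toy doc shaped like a Discovery document.
toy_doc = {
    "resources": {
        "spreadsheets": {
            "methods": {
                "get": {"httpMethod": "GET",
                        "parameters": {"spreadsheetId": {}}},
                "batchUpdate": {"httpMethod": "POST",
                                "parameters": {"spreadsheetId": {}}},
            }
        }
    }
}
print(compact_surface(toy_doc))
```

A few dozen such lines cost far fewer tokens than the equivalent OpenAPI JSON.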
Benefit of mcp is that it exists and kinda works, and a lot of tools are available on it. I guess it's all about adoption. But inherently yeah it's a discovery service thingy. Google will never embrace mcp since it's invented by anthropic
I consider it a good first attempt, but indeed hope for a sort of mcp2.0
Right, but surely swagger/openapi has been providing robust API discovery for years? I just don't get what LLMs don't like about it (apart from it possibly using slightly more tokens than MCP)
MCP is like "this is what the API is about, figure it out". You can also change the server side pretty liberally and the agent will figure it out.
Swagger/OpenAPI is "this is EXACTLY what the API does, if you don't do it like this, you will fail". If you change something, things will start falling apart.
In a lot of spheres, MCP is still the hype. And it was the hype in even more spheres a few months ago.
Because of FOMO, a lot of higher-ups decided that "we must do an MCP to show that we're also part of the cool kids" and to have an answer for their even-higher-ups about "What are you doing regarding AI?"
The project has been approved and a lot of time has been sunk into it, so nobody wants to admit that "hmm, actually it's now irrelevant: our existing API + a skill.md is enough".
I've seen that in at least 4 companies my friends work in, so I would be surprised if it's not something like that here too.
On the contrary claude code, in my experience, has been perfectly able to use `stripe` `gh` and to construct on the fly a figma cli (once instructed to do it).
There's a lot more to the MCP spec than tool calling, and people are also ignoring the fact that remote MCPs exist
Is this basically a CLI version of [1]? If so, I'm glad Google is being forward thinking about how developers actually want to use their apps.
Better this than a Google dashboard, or slopped together third party libs. I know Google says they don't support it, but they'll probably support it better than someone outside of Google can support it.
[1] https://workspaceupdates.googleblog.com/2025/12/workspace-st...
I think it is unrelated.
> gws doesn't ship a static list of commands.
Clever, but frustrating that they don’t bother to provide any docs on the actual commands this supports.
Basically Google’s take on GAM https://github.com/GAM-team/GAM
That’s the first thing I thought as well, although GAM is for admin tools only. I think this is for user APIs? But it’s not really clear…
gog too, which my openclaw agent always stubbornly wants to use instead of delegating to a subagent + custom calendar/imap proxy server I built.
https://github.com/steipete/gogcli
This - gog - yes. But I think it is different from GAM, which is an admin tool.
Yikes, I wrote that? I hate it when people write cryptic replies like that.
What I meant was 'yes', Google Workspace CLI appears to be quite similar to 'gogcli', the CLI written for OpenClaw. Both provide CLI access to a broad range of Google services for both workspace and regular gmail accounts.
GAM, on the other hand, is an admin tool, and strictly for Google Workspace accounts.
It's about time. Reminds me of how even Apple uses Jamf.
Except GAM is already heavily in training data and less likely to be called incorrectly.
The fact that humans can use this feels like a side effect. The developer says it's "built for agents first" and "AI agents would be the primary consumers of every command, every flag, and every byte of output"[1].
[1] https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
I remember reading gog setup instructions, and thinking, "Create oauth app/client? That's bonkers." And as cool+useful as this project looks, it's quite a bit harder to get going, especially if you're not familiar with Google Console and OAuth (or not a dev).
Reading lots of comments about "MCP vs CLI" -- reminds me a bit of the "agent vs. script/app/rpa" debates. It's usually not one or the other, but rather, both.. or the right tool for the job (and that can shift over time).
Biggest complaint we have about MCP is bigger context windows and token spend. Tools do exist that address this. I have just one MCP endpoint with a half dozen tools behind it, including Gmail, Google Calendar, Docs, Github, Notion, and more. Uses tool search tool (ToolIQ) with tiny context footprint. Give it a whirl. https://venn.ai
"This is not an officially supported Google product."
Probably someone's hobby project or 20% time at best.
at least it's actually a google product
They need something like this as it's hard and flaky to automate Google apps with AI. However, step 2 drops me to a fairly technical looking page where I have to configure Google Cloud. If they had a one click installer to automate Google Apps it would be an absolute killer use case for AI for me.
Would be nice if the MCP implemented the Streamable HTTP MCP spec instead of the CLI one. I know this is already a HTTP API, but making it available as an MCP server that clients like Joey[1] can consume easily over network would be nice.
[1] https://github.com/benkaiser/joey-mcp-client
I think this blog post from the author is interesting: https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
The move to CLIs has been really interesting lately. They're easy for agents to use. They're composable with other shell tools. It's going to be interesting to see if mcp sticks around or if everything just moves to service specific CLIs.
Schema Discovery Service is interesting but I have been wondering whether it is finally time to start implementing HATEOAS[0] in REST services.
[0] https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypert...
Interesting, but scary, given that this is not a google product. Who knows whether that breaks any TOS somehow.
This is made by Google Devrel. It's not going to break the TOS, but it could be abandoned. That happens frequently with devrel projects, since they're not actually tasked with or graded on engineering projects.
I think calling DevRel projects 'frequently abandoned' is blunt, but in my experience they are more like samples than production-owned libraries, so you should assume limited maintenance. Before relying on the Google Workspace CLI for automation, inspect commit cadence, open issues, last release tag, number of contributors, and whether a product team or SDK maintainer is listed.

If you need it in production, pin to a release tag and vendor the code with go modules replace or npm shrinkwrap, add a thin adapter so swapping implementations is trivial, and run a couple of integration tests in GitHub Actions to catch regressions.

I once had to fork a DevRel CLI that our on-call scripts depended on, and maintaining that fork cost a weekend plus a steady trickle of small fixes, so now I put third-party CLIs behind a tiny internal wrapper and keep the command surface minimal.
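The "tiny internal wrapper" advice above can be sketched in a few lines: route every use of a third-party CLI through one adapter so a fork or replacement later touches a single file. The `gws` binary name and JSON-on-stdout behavior here are assumptions for illustration, not verified behavior.

```python
import json
import subprocess

class WorkspaceCLI:
    """Thin adapter around an external CLI; swap the binary in one place."""

    def __init__(self, binary="gws"):
        self.binary = binary

    def run(self, *args):
        # check=True surfaces non-zero exits as exceptions instead of
        # silently returning partial output.
        result = subprocess.run(
            [self.binary, *args],
            capture_output=True, text=True, check=True,
        )
        return json.loads(result.stdout) if result.stdout.strip() else {}
```

In tests you can point `binary` at a stub (even `echo`) so nothing depends on the real tool being installed.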
This appears to be published by Google itself
> Disclaimer
> This is not an officially supported Google product.
It’s published by Google, they just don’t provide active “support” for it.
Can you show that the major contributors are employed by Google? I'm not arguing, I'm genuinely asking. Thank you.
https://github.com/jpoehnelt
jpoehnelt/README.md
About
I am a Developer Relations Engineer at Google. Currently I am on the Google Workspace DevRel team and was on the Google Maps Platform before that. Previously I worked at Descartes Labs and the US Geological Survey.
Check out my website at https://justin.poehnelt.com.
Thank you!
You can check their GitHub profile. If they are in https://github.com/googlers, then they are internally verified.
https://github.com/googleworkspace -> in the about links is also verified as part of the Alphabet enterprise - https://github.com/enterprises/alphabet
this is just insane, now my lobster can stably do everything on my behalf vs. you do it all via computer use
How to expose my product suite's API to AI has been a roller coaster ride. First it was tool-calling hooks, then MCP; then folks found out AI is better at coding, so MCPs suddenly became code-mode; then people realized skills are better at context; and now Google has launched a CLI approach.
Remember this repo is not an agent. It's just a cli tool to operate over gsuite documents that happens to have an MCP command and a bunch of skills prebundled.
That's a new one. I guess the hope is agents are good at navigating cli and it also democratizes the ecosystem to be used by any agent as opposed to Microsoft (which only allows Copilot to work in its ecosystem)
very similar to gogcli(https://github.com/steipete/gogcli), but in RUST
I've been using `gog` but I'd rather have an agent-first thing. I don't want a big bad MCP that occupies all context all the time. I need my claw to be aware on how to edit things. As it so happens, right now `gog` works. But I'm eager to see how this develops.
Me: huh, 0.4.4 version, this project must have been around for a little while.
checks https://github.com/googleworkspace/cli/tags
v0.1.1 2 days ago
v0.2.2 yesterday
v0.3.3 18 hours ago
v0.4.2 9 hours ago
v0.5.0 8 minutes ago
Interesting times we live in..
Totally thought this was built by Google. Great product!
This reminds me a bit of how the GitHub CLI evolved into a foundation for automation and tooling. Do you see the Google Workspace CLI primarily as something for humans using the terminal, or more as a stable interface that automation and AI agents can build on?
It's built for agents first, unfortunately.
https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
I'm just not going to give agents access to my workspace account!
but may they have their own workspace account while working with you? :)
I agree
Are integration vendors like Pipedream in trouble now that every company is pushing out MCP servers and CLIs to ride the AI craze? After the Twitter and Reddit API troubles of prior years, I can't imagine any company would willingly bring down the walls of their gardens and give easy access to precious user data. I'm waiting for the rug pull
Building www.cliwatch.com, so you can keep an eye on how agent-friendly your CLI is ;) feel free to request a benchmark against your CLI docs. Cheers
Neat. I've been running something very similar to this locally for a few months now. They moved all their documentation to Markdown recently. I still rely on the discovery API and lenient cloud project permissions, so maybe some gains there. Will compare notes later.
Seems weird to require another tool (gcloud) to set it up, but it does look to be tightly integrated with google cloud.
You can skip that setup if you already have the OAuth credentials.
Correct me if I'm wrong but the UX difficulty with the Google API ecosystem isn't resolved. It's the goddamn permissioning and service accounts. Great to have a CLI that every other minute says, "you can't do this" -- the CLI really needed to solve this to check my boxes.
Having the available commands change on you dynamically seems like an anti-pattern, but I suppose an AI can deal with it.
It's funny because all these cloud services are suddenly motivated to have truly useful CLIs.
Why cli instead of just HTTP API doc? The agent can use curl or write code to send requests.
They already have a HTTP API, but the real reason is that CLIs are emerging as the most ergonomic way for the current wave of AI agents to do stuff. There's a few benefits over APIs:
- No need to worry about transport layer stuff at all, including auth or headers. This is baked in, so saves context.
- They are self describing with --help and then nested --help commands, way better than trying to decipher an OpenAPI spec. You usually don't even need an agent skill, just call the --help and the LLM figures it out.
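The self-describing point above can be sketched with argparse: a CLI built with nested subparsers documents itself, so an agent can walk `--help` level by level instead of parsing a spec. The command names below are illustrative, not the real gws surface.

```python
import argparse

# Toy two-level CLI: service -> command, each layer carrying its own help.
parser = argparse.ArgumentParser(prog="demo", description="toy workspace CLI")
services = parser.add_subparsers(dest="service")

sheets = services.add_parser("sheets", help="spreadsheet operations")
commands = sheets.add_subparsers(dest="command")
get = commands.add_parser("values-get", help="read a cell range")
get.add_argument("--range", required=True, help="A1 notation, e.g. Sheet1!A1:B2")

# This is what an agent would see at the second level of --help:
print(sheets.format_help())
```

Each `--help` call returns only the slice of the surface the agent is currently exploring, which is what keeps the context cost low.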
CLI is probably more reliable. Also, the ergonomics for the person setting up the machine for the AI are better. They can check to see if the command is working without screwing with curl. It's also possible a human might want to use the software / service they're paying for.
Why is it more reliable? The human usage point is fair, but I doubt it will stay necessary for long.
Imagine the amount of boilerplate you need around a single HTTP API call, every time.
The CLI has abstracted that into one single reusable, scriptable command
A CLI runs on the client, so they can embed client-side functionality like telemetry or caching.
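The caching point above can be illustrated in a few lines: because a CLI owns the whole round trip, it can memoize read-only calls locally. `fetch` below is a stand-in for a real API request, and the command name is made up.

```python
import functools
import json

calls = []

def fetch(command, args):
    # Stand-in for a real network request; records each invocation.
    calls.append(command)
    return {"command": command, "args": args}

@functools.lru_cache(maxsize=128)
def cached_call(command, args_json):
    # Cache key is the serialized args, so identical reads hit memory.
    return fetch(command, json.loads(args_json))

first = cached_call("sheets.values.get", '{"range": "A1:B2"}')
second = cached_call("sheets.values.get", '{"range": "A1:B2"}')
print(len(calls))  # fetch ran once; the second call hit the cache
```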
if nothing else the cli gives very easy access to the HTTP api docs via `gws schema`
i’d rather not waste the context tokens re-implementing their cli from scratch, if indeed it does a good job.
Forget the Gemini extension - Gemini CLI sucks. Forget the MCP - MCP is beyond dead. But for codex or claude cli this is a game changer. Next question is how programmatic have they made the sheets interface... because Gemini sucks at sheets.
I can already see all the CTOs are getting excited to plug this in their OpenClaw instances.
Depends how it compares to gog.
Works fine as a gog replacement for my limited use cases of reading and writing Sheets.
Feel like this should be at the top of the README - not the bottom
> Disclaimer
> Caution
> This is not an officially supported Google product.
Yeah, I was surprised by that as well.
Great, i hope this becomes a trend now that agent skills want clis
GCP Next is Apr 22-24. Hope this continues to live after that.
> This is not an officially supported Google product.
Common for them. Even for projects on the main google GH org more often than not there'll be stuff like "not an official Google product" and "it is just code that happens to be owned by Google" in the README.
Damn, so it won’t have the legendary Google promise of reliability and maintenance.
While I prefer Google's productivity apps to the Microsoft world in this case Google is just catching up to the APIs and tooling that Microsoft has provided for a long time: https://learn.microsoft.com/en-us/powershell/microsoftgraph/...
Yet somehow Microsoft Copilot doesn't know how to use that tooling.
Google really know how to screw up a product experience.
npm install -g @googleworkspace/cli
gws auth setup
{ "error": { "code": 400, "message": "gcloud CLI not found. Install it from https://cloud.google.com/sdk/docs/install", "reason": "validationError" } }
Which takes you to...
https://docs.cloud.google.com/sdk/docs/install-sdk
Where you have to download a tarball, extract it and run a shell script.
I mean how hard is it to just imitate everyone else out there and make it a straight up npm install?
The readme is AI generated, so I am assuming the lack of effort and hand-off to the bots extends to the rest of this repository.
The contributors are a Google DRE, 5 bots / automating services, and a dev in Canada.
You don't need to use gcloud if you already have:
1. A GCP project (needed for OAuth) 2. Enabled APIs in said project
gcloud cli will probably also require you to make a Google Cloud project and stuff by clicking around their godforsaken webui. hopefully they streamlined that, it took me a long time to figure out when i wanted to write some JS in my spreadsheet
Seems to be built using Claude Code?
Honestly, easier with MCP straight up: https://gmail.mintmcp.com/ https://gcal.mintmcp.com/ https://gdocs.mintmcp.com/ https://gsheets.mintmcp.com/
(all pass through)
https://news.ycombinator.com/item?id=47208398
https://news.ycombinator.com/item?id=47157398
IMHO, CLI tools are better than MCP more often than not.
EDIT: and here is similar opinion from author himself: https://news.ycombinator.com/item?id=47252459
Would it help to backup all my mailboxes and be ready to ditch gmail?
I'm surprised that this didn't officially exist before.
Would be useful if it could at least show Google Drive storage in a folder structure
> quick setup
> requires setting up gcloud cli first, necessitates making a Google Cloud project
cmon google how come even your attempts at good ux start out with bad ux? let me just oauth with my regular google account like every other cli tool out there. gh cli, claude, codex - all are a simple “click ok” in the browser to log in. wtf.
and the slow setup - i need to make my own oauth app & keys??
EDIT: oh yeah, and get my oauth app verified, all so I can use it with my own account
Haha in the world of AI/MCPs, all of a sudden we have a push for companies to properly build out APIs/CLI tools.
I have always said that if we had done for developers what we are doing for agents the whole world would have been a much better place.
Perhaps we will finally emerge from this decades-long dark age of bloated, do-everything GUI development tools being the fashionable happy path.
The AWS cli tool wants to have a talk… hard to find a more bloated mess of strung together scripts held together by tape.
Even if the AI bubble bursts hard, we'll still have all of the better tooling for us actual humans.
But saying "it's for AI" is a corporate life hack for you to get permission to build said better tooling... =)
One of the very few good things from the AI race has been everyone finally publishing more data APIs out in the open, and making their tools usable via CLIs (or extensible APIs).
I feel like the CLI craze started around 2020. That predates ChatGPT.
Charm CLI (Go), Nushell (Rust), and Warp (a shell) were all from around 2020. That is also when alt shells started getting popular, probably for the same reasons they still are.
But on the other hand a (possibly dark) pattern is emerging where companies are asking you to upgrade to a higher plan to access MCP.
I noted something similar a few weeks ago. Companies are finally putting APIs in front of things that should have had APIs for years!
yeah there's way more demand, and at the same time, it's way easier for the company to build and maintain (with the help of AI). Great to see!
Took them this long to realize MCPs are just worse APIs.
About 90% of “make codebase better for LLMs” is just good old fashioned better engineering that is also good for humans.
They aren't doing that though. At least not yet. It's generated from the discovery tool, which amounts to the spec of the existing API. If they want a high powered CLI they need to dig into the servers behind Google Workspace like they have when they've improved the web apps.
For all people have to say about Pete the openclaw guy he's been perhaps one of the most vocal voices about CLIs > MCPs (or maybe his is just the loudest?) and he also built a GSuite CLI that probably inspired this project.
I mean it's great that we get this, hopefully it can continue to be maintained and I'd love to see a push for similar stuff for other products and at other companies.
My agents will follow this repo with great interest
Is Google Workspace some separate thing from well, normal Google?
I mean I have personal gmail,drive, keep, etc. Will it work there?
It should, as long as you have access to Google cloud for Auth.
https://workspace.google.com/
Google Workspace is their corporate offering (think Microsoft suite competitor)
https://github.com/googleworkspace/cli/issues/119
Looks like it is not available for @gmail.com accounts, because of that bug.
750 MB and enough IOPS to make a small VM instance explode at launch. I wish Google would use their own tools, like Go, to produce 50 MB installs.
wow this will gel very well with my current project. The main hurdle I was facing was connecting with individual services via Google OAuth to get the data.
Good for AI agents.
Hoping Apple will do the same with iCloud.
Don’t hold your breath
Lol
This is a very interesting way of building agent skills. Seems like the imperative way of orchestration/automation is making a comeback.
`npm install ...`
I wonder why they didn't do this on Python or Ruby, them being the superior languages where `==` works, blah, blah ...
This is effectively why people want agents, right? To be able to bypass all the stuff companies "optimize", i.e. the ad-filled websites. The inserted extra conversations that now literally get triggered BECAUSE you're chatting to someone else. The ...
In other words: this is the FANGs worst enemy. This is adblock * 1000, as far as consumers are concerned.
For the love of god, please google, give us personal access tokens and not this Oauth madness inside GCP. It is impossible to get that in any big enterprise.
Google will wait for businesses to become very coupled to this and then kill off the CLI just because.
What's the over/under on this being killed by Google within the next year?
AI Agents are becoming first-class citizens for SaaS
Nice, now I can use this alongside claude to auto document my research work.
Also, what I find fascinating is that the repo was initialized 3 days ago so it seems it's still a work in progress.
Archived in 3… 2… 1…
The most annoying thing with Google Workspace is that you need super admin privileges to properly audit the environment programmatically, I believe because of the Cloud Identity API.
Very uninteresting post. Why is this the number one post on Hacker News ? Honestly, absolutely disgusting and you can be ashamed, or ashamed for them.
written in Rust lol
It’s npm and a 750 MB install, meaning it won’t even install on micro instances. I wish it were a Rust or Go binary.
Yet another way Google might decide you’ve violated their ToS and cut you off from your entire digital life without warning or recourse.
Sounds handy. But use at your own risk.
I've built a few internal tools using the Workspace APIs, and while they are powerful, the rate limits on the Drive API can be brutal if you are doing bulk operations. Does this repository handle automatic backoff and retries, or do we need to wrap it ourselves?
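Whether the CLI retries internally isn't documented, so wrapping bulk operations in client-side backoff is a reasonable precaution either way. A minimal sketch, where `is_rate_limited` is a placeholder for real 403/429 detection:

```python
import random
import time

def with_backoff(call, retries=5, base=0.5, is_rate_limited=lambda exc: True):
    """Retry `call` with exponential backoff and jitter on rate-limit errors."""
    for attempt in range(retries):
        try:
            return call()
        except Exception as exc:
            if attempt == retries - 1 or not is_rate_limited(exc):
                raise
            # Exponential delay plus jitter to avoid synchronized retries
            # from parallel workers hammering the API in lockstep.
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
```

Usage would be `with_backoff(lambda: do_drive_call(...))`, with `is_rate_limited` checking the actual error reason.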