While GitHub obsesses over shoving AI into everything, the rest of the platform is genuinely crumbling, and its security flaws are being abused to cause massive damage.
Last week Aqua Security was breached and a few repositories it owns were infected. The threat actors abused widespread use of mutable references in GitHub Actions, which the community has been screaming about for years, to infect potentially thousands of CI runs. They also abused an issue GitHub has acknowledged but refused to fix that allows smuggling malicious Action references into workflows that look harmless.
GHA can’t even be called Swiss cheese anymore; it’s so much worse than that. Major overhauls are needed. The best we’ve got is Immutable Releases, which are opt-in on a per-repository basis.
You can pin action versions to their commit hash. Some might say this is the best practice for now. It looks like this, where the comment indicates which tag the hash is supposed to point at.
Old --> uses: actions/checkout@v4
New --> uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
The problem is that actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 probably doesn’t do this same pinning for its own dependencies, and the actions ecosystem is such an intertwined mess that any single compromised action can propagate to the rest.
That’s true for actions/checkout specifically, but composite actions can have other actions as dependencies, and unless the composite action pins the versions of its dependencies, it is vulnerable to this attack.
This article[0] gives a good overview of the challenges, and also has a link to a concrete attack where this was exploited.
I've always been worried about their backend changing so that, somehow, a named tag pointing at a different commit could let an attacker serve you something other than what the pinned commit hash promises.
See also pinact[1], gha-update[2], and zizmor's unpinned-uses[3].
The main desiderata for these kinds of action-pinning tools are that they (1) leave a tag comment, (2) leave that comment in a format that Dependabot and/or Renovate understands for bumping purposes, and (3) actually put the full tag in the comment (v4.x.y) rather than the cutesy short tag (v4) that GitHub encourages people to keep mutable.
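As a rough illustration of what a checker like zizmor's unpinned-uses looks for, here is a toy sketch. The regex, function name, and warning wording are all mine, not the real tool's: it flags `uses:` lines whose ref isn't a full 40-character commit SHA, and pinned lines that are missing the tag comment the pinning tools are supposed to leave.

```python
import re

# Matches "uses: owner/repo@ref" lines, with an optional "# tag" comment.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w./-]+)@(\S+)(\s*#\s*(\S+))?")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")  # a full commit SHA, nothing shorter

def check_workflow(text: str) -> list[str]:
    """Return one warning per `uses:` line that is either not pinned to a
    full commit SHA, or pinned but lacking a human-readable tag comment."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        m = USES_RE.match(line)
        if not m:
            continue  # not a remote-action reference; local actions are skipped
        action, ref, tag = m.group(1), m.group(2), m.group(4)
        if not SHA_RE.match(ref):
            warnings.append(f"line {lineno}: {action}@{ref} is a mutable ref; pin to a full commit SHA")
        elif tag is None:
            warnings.append(f"line {lineno}: {action} is pinned but missing a tag comment for Dependabot/Renovate")
    return warnings
```

Running it over a workflow file's contents gives you a quick audit, though the real tools handle far more edge cases (reusable workflows, Docker refs, local actions).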
Checkout v4 of course, released in August 2025, which already pollutes my CI status with garbage warnings about some deprecated Node version I couldn't care less about. I swear half the problems of GitHub exist because half that organization has some braindead obsession with upgrading everything everywhere all the time, delivering such great early slop experiments as "dependabot".
I worry that CI just got overcomplicated by default when providers started rocking up with templated YAML and various abstractions over it to add dynamic behaviour, dependencies, and so on.
Perhaps mixing the CI with the CD made that worse, because deployment and delivery usually have complexities of their own. Back in the day you'd probably use Jenkins for the delivery piece and the E2E nightlies, and use something more lightweight for running your tests and linters.
For that part I feel like all you need, really, is to be able to run a suite of well-structured shell scripts. Maybe, if you're in git, you follow its hooks convention and execute the scripts in a directory named after the repo event or something. Forget about creating reusable 'actions' which depend on running untrusted code.
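A minimal sketch of that idea, assuming a hypothetical `<root>/<event>/` layout (the layout and helper name are illustrative, not any existing tool's): a runner that executes each script in filename order, git-hook style, and stops at the first failure.

```python
import subprocess
from pathlib import Path

def run_event(event: str, root: str = "ci") -> int:
    """Run every script in <root>/<event>/ in filename order, git-hook style.
    Stop at the first non-zero exit status and return it; 0 means all passed."""
    for script in sorted(Path(root, event).iterdir()):
        if not script.is_file():
            continue
        print(f"--> running {script.name}")
        code = subprocess.run(["sh", str(script)]).returncode
        if code != 0:
            print(f"FAILED: {script.name} exited with status {code}")
            return code
    return 0
```

With scripts named like `01-lint.sh`, `02-test.sh` under `ci/push/`, a bare `run_event("push")` is the whole pipeline, and it runs identically on a laptop and in CI.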
Provide some baked in utilities to help with reporting status, caching, saving junit files and what have you.
The only thing that remains is setting up a base image with all your tooling in it. Docker does that, and is probably the only bit where you'd have to accept relying on untrusted third parties, unless you can scan them and store your own cached version of it.
I make it sound simpler than it is but for some reason we accepted distributed YAML-based balls of mud for the system that is critical to deploying our code, that has unsupervised access to almost everything. And people are now hooking AI agents into it.
You could use these shell script versions of pipelines in GHA though, right? There is nothing stopping you from triggering a bash script via a "run" step in YAML.
These reusable actions are nothing but a convenience feature. This discussion isn't much different than any other supply chain, dependency, or packaging system vulnerability such as NPM, etc.
One slight disclaimer here is the ability of someone to run their own updated copy of an action when making a PR, which could be used to exfil secrets. That one is NOT related to depending on unverified actions, though.
(Re-reading this, it came across as more harsh than I intended.. my bad on that. But am I missing something, or is this the same issue that every open-source user-submitted package repository runs into?)
I'm trying out SelfCI [1] for one of my projects and it's similar to what you were describing. My whole CI pipeline is just a shell script that runs the actual build and test commands, I can write a script in another language like python if I need more complexity and I can run it all locally at any time to debug.
It really feels like Firefox is not a supported browser on GitHub, I hit this and also find that much of the time the commit message is not correctly pulled from the PR description when that setting is enabled
I had something similar with PRs last year. Two PRs of mine disappeared for me; they were still counted in the total number of PRs, and everyone else could see them.
I don't want to give too much credit to GitHub, because their uptime is truly horrendous and they need to fix it. But I've felt it's a little unfair to judge the uptime of company platforms like this: by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.
I never use Github Copilot; it does go down a lot, if their status page is to be believed; I don't really care when it goes down, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on Github's uptime is to be precise and probably focus on a lot of the core stuff that tons of people care about and that's been struggling lately: Core git operations, website functionality, api access, actions, etc.
> I've felt it's a little unfair to judge the uptime of company platforms like this: by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.
This is definitely true.
At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...
> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
> If GitHub does not meet the SLA, Customer will be entitled to service credit to Customer's account ("Service Credits") based on the calculation below ("Service Credits Calculation").
The linked document in my previous comment has more detail.
It's worth adding that big (BIG!) business clients will usually negotiate the terms for going below the SLA threshold. The goal is less to be compensated if it happens, and more to incentivize the provider to never let it happen.
You're right that labelling any outage as "Github is down" is an overgeneralisation, & we should focus on bottlenecks that impact teams in a time-sensitive manner, but that isn't the case here. Their most stable service (the API) has only two 9s (99.69%).
They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.
Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.
Most people complaining about uptime aren't free users or open-source developers. It's people whose companies are enterprise GitHub customers. It's a real problem and affects productivity.
Honestly, you're right - ~27~ 87+ (correction from sibling) hours per year is absolutely fine & normal for me & anything I want to run. I personally think it should be fine for everybody.
On the other hand, the baseline minimal GitHub Enterprise plan with no features (no Copilot, GHAS, etc.) runs a medium-sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whoever is paying it wanting a couple of 9s to go with it. As a treat.
This company is part of the portfolio of a $trillion+ transnational corporation. The idea that we can't judge them, when they clearly have more resources than 99% of other companies on this planet, doesn't hold up to any scrutiny.
Why defend a company that clearly doesn't care about its customers and see them as a money spigot to suck dry?
The OP clearly never says we can't judge them. He was speaking to how the uptime is measured. I'm not saying I agree or disagree with the OP, but at least address the argument he's making.
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it; "Does it respond to the API at all, even with errors? It's up!" and so on. If you spend a lot of time analyzing browser traces you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But it's "up" even if a shard is completely down.
The five nines tech people usually talk about is a fiction; the only place where the measure really holds is networking, specifically service-provider networking. Otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs, and all the ways those SLAs are basically worthless for almost everyone in the space.
See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you a case ID. But that's not how they describe it.
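One way to keep the measurement honest is to define "up" per probe, counting server errors and unusably slow responses as downtime rather than "it answered, so it's up". A toy sketch; the class, threshold, and function names are all mine:

```python
from dataclasses import dataclass

@dataclass
class Probe:
    status: int        # HTTP status the health check returned
    latency_ms: float  # how long the response took

def is_up(p: Probe, max_latency_ms: float = 2000.0) -> bool:
    """A stricter notion of "up" than "the API answered at all":
    no server error, and a response fast enough to actually be usable."""
    return p.status < 500 and p.latency_ms <= max_latency_ms

def measured_uptime_pct(probes: list[Probe]) -> float:
    """Fraction of probes that count as up, as a percentage."""
    return 100.0 * sum(is_up(p) for p in probes) / len(probes)
```

Under this definition a shard returning 502s, or a PR page taking 30 seconds to load, drags the number down, which is exactly what the green status screens avoid admitting.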
From GitHub's CTO in 2025, when they announced they were moving everything to Azure instead of letting GitHub's infrastructure remain independent:
> For us, availability is job #1, and this migration ensures GitHub remains the fast, reliable platform developers depend on
That went about as well as everyone thought back then.
Does anyone else remember back in ~2014-2015 sometime, when half the community was screaming at GitHub to "please be faster at adding more features"? I wish we could get back to platforms (or OSes for that matter) focusing on reliability and stability. Seems those days are long gone.
I work on lots of smaller client projects, usually named by the hostname. I absolutely don't understand how at some point the GitHub search got so "great" it became unable to find my own repo by its name.
We have since switched to self hosted Forgejo instance. Unsurprisingly the search works.
They definitely have. GitHub evolved a lot faster after the Microsoft acquisition; I remember being mildly impressed after it had been stagnant for years (this is not an opinion on whether it was evolving in the right direction or whether it was a good trade-off).
This was before Actions and a whole lot of other non-git related stuff. There were years (maybe even a decade?) where GitHub was essentially unchanged besides fixes and small incremental improvements, a long time ago :)
> The improvements to PR review have been nice though
I dunno, it's probably the worst UX downgrade so far: almost no PRs are "fully available" on page load, and it requires additional clicks and scrolling to "unlock" all the context, which kind of sucks.
Used to be you loaded the PR diff and actually saw the full diff, except for really large files. You could Ctrl+F and search for stuff; you didn't need to click to expand even small files. Reviewing medium/large PRs on GH today is just borderline obnoxious.
I find it impossible to use the current diff view for most codebases, and spend tons of time clicking open all available sections...
They have somehow found the worst possible amount of context for doing review. I tend to pull everything down to VS Code if I want to have any confidence these days.
> I wish we could get back to platforms (or OSes for that matter) focusing in reliability and stability
That's only a valid sentiment if you only use the big players. Both of those have medium and smaller competitors that have shown (for decades) that they are extremely boring, and therefore stable.
Try convincing the CTO that this panoply of smaller players will be around in 5 years, or worth the effort of migrating to.
I'm at a much smaller outfit now so we have more freedom but I'd dread to think the arguments I would've had at the 4000+ employee companies I was at before.
In that same period the big players have only gotten bigger and the "Mittelstand" in tech has been practically dying. Replaced by the flood of VC startups that are far too obsessed with "growth" to care about reliability and stability.
(Note that "is this company financially viable in the long term future" is an important part of stability. Doesn't matter how rock solid the software is if the startup's bankrupt by the end of next year.)
That's about when I joined, and all I really remember thinking was that it was cool that I could now share my repo publicly without having to try and run a server from a residential IP.
GitHub is in a tough spot. From what I've heard, they've been ordered to move everything to Azure from their long-standing datacenters. That is bound to cause issues. Then, on top of that, they are (supposedly) using AI coders for infra changes, which will also add issues.
And then on top of all that, their traffic is probably skyrocketing like mad because of everyone else using AI coders. Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
That can't be easy on their servers.
I do not envy their reliability team (but having been through this myself, if you're reading this GitHub team, feel free to reach out!).
> Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
I think this is a really important point that is getting overlooked in most conversations about GitHub's reliability lately.
GitHub was not designed or architected for a world where millions of AI coding agents can trivially generate huge volumes of commits and PRs. This alone is such a huge spike and change in user behavior that it wouldn't be unreasonable to expect even a very well-architected site to struggle with reliability. For GitHub, N 9s of availability pre-AI simply does not mean the same thing as N 9s of availability post-AI. Those are two completely different levels of difficulty, even when N is the same.
Has anyone checked out the status page? It's actually way worse than I thought. I believe this is the first time I'm actually witnessing a status page with truly horrible results.
And notably, that page makes this post's title inaccurate. As of this morning, it says `90.21% uptime`, which is a _single_ 9, not 3 (though that's for the platform as a whole, no individual component appears to achieve three 9s.)
If all I want is actual git, I’m pretty sure I could get much much more than 98.98% uptime. The value of GitHub is actions, issues, PRs. To me, if actions is down GitHub is down
As someone who was impacted by GitHub's git outage in late February, which caused us to cancel a feature release, I am more sensitive to the availability of their git service than their chatbot.
Degraded performance counts as unavailable as far as I'm concerned. If GitHub has "degraded performance" where it takes 5 minutes to load a PR, then that is not good.
On that third-party status site, GH is currently noticeably worse than Claude...
They are down to one 9 of availability and very, very close to losing that too (90.2x%).
This also fits my personal experience more closely than the 99.900-99.989 range the article indicates...
Though honestly, 99.9% means 8.76h of downtime a year. If we say no more than 20min of downtime per 3 hours (sliding window), no more than 1h a day, and >50% of the downtime landing in (localized) off-working hours (e.g. night, Sat, Sun), then 99.9% is something you can work with. Sure, it would sometimes be slightly annoying, but it shouldn't cause any real issues.
On the other hand, 90.21%... that is ~35.7 days of outage a year, i.e. over 857 hours. Probably still fine if for each location the working-hour availability were 99.95% and the previous constraints held. But, uh, wtf, that just isn't right for a company of that size.
As in: your yearly outage budget is ~8.76h, but that budget shouldn't be spent all at once, and any single outage should delay work by at most 20min at a time, and not hit again directly after a previous downtime.
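For anyone double-checking these figures, the arithmetic is just the complement of the uptime percentage times the hours in a year. A quick sketch (the helper name is mine):

```python
def downtime_hours(uptime_pct: float, hours_per_year: float = 365.25 * 24) -> float:
    """Hours of allowed downtime per year at a given uptime percentage."""
    return (1 - uptime_pct / 100) * hours_per_year

# 99.9%  -> ~8.77 h/year (the "8.76h" figure above, using a 365-day year)
# 99.0%  -> ~87.7 h/year
# 90.21% -> ~858 h/year, i.e. roughly 35.7 days
```

The jump from three 9s to one 9 is two orders of magnitude of outage budget, which is why the 90.21% figure reads so differently from 99.9%.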
This is ... surprisingly honest? The one above is "missing" status page; and most status pages would legally have to be filed in the "fiction" section of the library.
Just to add a little bit of nuance to this, not because I'm trying to defend GitHub (they definitely need to up their reliability): the 90% uptime figure represents every single service GitHub offers being online 90% of the time. You don't need every single service to be online in order to use GitHub. For example, I don't use Copilot myself, and it's seen 96.47% uptime, the worst of the services that are tracked.
That’s… one 9 of reliability. You could argue the title understates the problem.
> You don't need every single service to be online in order to use GitHub.
Well, that’s how they want you to use it, so it’s an epic failure of their intended use case. Another way to put this is: ”if you use more GitHub features, your overall reliability goes down significantly and unpredictably”.
Look, I have never been obsessed with nines for most types of services. But the cloud service providers certainly were using it as major selling/bragging points until it got boring and old because of LLMs. Same with security. And GitHub is so upstream that downstream effects can propagate and cascade quite seriously.
On the other hand: it also doesn't include instances where GitHub is painfully slow but technically usable.
These days it is very common that something like opening the diff view of a trivial PR takes 15-30 seconds to load. Sure, it will eventually load after a long wait or an F5, but it is still negatively impacting my productivity.
There have been multiple outages in the past year that they didn’t even fully report very quickly. I’m talking about the types of outages that bring down normal enterprise usage: webhook delivery for CI/CD, git operations for everyone, PRs for code review. And that’s not even including GitHub Actions or Copilot, which lots of people also rely on.
To be honest, I’m not surprised that GitHub has been having issues.
If you have ever operated GitHub Enterprise Server, it’s a nightmare.
It doesn’t support active-active. It only supports passive standbys. Minor version upgrades can’t be done without downtime, and don’t support rollbacks. If you deploy an update, and it has a bug, the only thing you can do is restore from backup leading to data loss.
This is the software they sell to their highest margin customers, and it fails even basic sniff tests of availability.
Data loss for source code is a really big deal.
Downtime for source control is a really big deal.
Anyone that would release such a product with a straight face clearly doesn’t care deeply about availability.
So, the fact that their managed product is also having constant outages isn’t surprising.
Our security scanning runs on GitHub Actions — every PR gets checked before merge. When GitHub goes down, the security gate goes down with it. PRs pile up, devs get impatient, start merging without waiting for checks. That's exactly when bad code gets through. And they keep throwing engineers at Copilot while the stuff that CI/CD actually depends on keeps falling over.
Windows (including Notepad and Explorer), too. I think ~Office~ ~Office 365~ ~Microsoft 365~ Copilot 365 is still technically useful despite the insane branding and licensing and AI slop features, but I doubt it'll last much longer.
IPv6 ignorance is the canary. There's plenty of architecture ignorance below the surface. The real question is why aren't they failing annual security audits?
The real problem isn't the reliability numbers; it's that GitHub sold itself as an integrated platform, so every service you adopt raises the blast radius. Teams that treat GitHub like any other external dependency, with fallback runners and artifact mirrors, aren't sweating this.
I only use GitHub (and actions) for personal open-source projects, so I can't really complain because I'm getting everything for free¹. But even for those projects I recently had to (partially) switch actions to a paid solution² because GitHub's runners were randomly getting stuck for no discernible reason.
I’m surprised it’s even as high as three nines; at one point in 2025 it was below 90%, not even a single nine.[0] (Which, to be fair, includes Copilot, the worst of the availabilities.)
People on lobsters a month ago were congratulating Github on achieving a single nine of uptime.[1]
I make jokes about putting all our eggs in one basket under the guise of “nobody got fired for buying X, but there sure are a lot of unemployed people”. But I think there’s an insidious conversation that always used to erupt:
“Hey, take it easy on them, it’s super hard to do ops at this scale”.
Which lands hard on my ears, when the normal argument in favour of centralising everything is “you can’t hope to run things as well as they do, since there are economies of scale”.
These two things can’t be true simultaneously, and this is the evidence.
Sure they can. Perhaps a useful example of something like this would be to consider cryptography. Crypto is ridiculously complex and difficult to do correctly. Most individual developers have no hope of producing good cryptographic code on the same scale and dependability of the big crypto libraries and organizations. At the same time these central libraries and organizations have bugs, mistakes and weaknesses that can and do cause big problems for people. None of that changes the fact that for most developers “rolling your own crypto” is a bad idea.
That’s an excellent example. OpenSSL, by virtue of trying to do everything, is the buggiest TLS implementation generally available today, to the point where there have been hard forks designed to reduce its scope and limit the damage.
I’d go so far as to say that there are more crypto libraries than there are “default” options for SaaS Git VCS (GitLab and GitHub are the mainstays in companies, and maybe Azure DevOps if you hate your staff; nobody sensible is using Bitbucket), but for TLS implementations there’s Rustls, GnuTLS, BoringSSL, LibreSSL, WolfSSL, NSS, and AWS-LC that come to mind immediately.
Wait, they still have three nines? It really doesn't feel like that. But then their status center isn't really trustable anymore, and a lot of the issues I've been running into seem to be temporary, partial, localized failures: sometimes temporary slowness to the point of unusability, sometimes temporarily serving an outdated (by >30min) main/HEAD, etc.
I have a little bit of sympathy for Github because if everyone is like me then they are getting 5-6x the demand they were last year just based on sheer commits alone, not to mention Github Copilot usage.
The irony no one is talking about: AI makes code quality worse. It was bad enough already, so imagine it now. I'm expecting many more services to drop from three nines to one.
The availability expectations gap is interesting from an education standpoint. Students are taught that 99.9% sounds impressive, without contextualizing what that means in practice: roughly 8.8 hours of downtime per year. For a platform that millions of developers depend on as critical infrastructure during work hours, that math hits very differently than it does for a consumer app.
Anyone who uses the phrase "measly" in relation to three nines is inadvertently admitting their lack of experience with massive systems. 99.9 and 99.95 are the targets for some of the most common systems you use all day, and they are by no means easy to achieve. Even just relying on a couple of regional AWS services puts your CEILING at three nines. It's even more embarrassing when people post that one GH uptime tracker that combines many services into a single number, as if that means anything useful.
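The "ceiling" point follows from multiplying availabilities: if your service needs all of its dependencies up at once, the composite availability is at best the product of theirs. A back-of-envelope sketch, assuming independent failures (which is optimistic, since correlated outages make it worse):

```python
from math import prod

def serial_availability(*uptimes_pct: float) -> float:
    """Availability of a system that needs ALL listed dependencies up,
    assuming independent failures (the usual back-of-envelope model)."""
    return 100.0 * prod(u / 100.0 for u in uptimes_pct)

# Two 99.9% dependencies already drop you to ~99.8%,
# and three drop you to ~99.7%: your own code hasn't even failed yet.
```

This is why "just two regional AWS services" caps a service below three nines before any of its own bugs or deploys are counted.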
Three 9s is a perfectly reasonable bar to expect for services you depend on. Without GitHub my company cannot deploy code. There is no alternative method to patch prod. In addition many development activities are halted, wasting labor costs.
We wouldn’t couple so much if we knew reliability would be this low. It will influence future decisions.
For solo and small team projects, I've started treating GitHub as distribution rather than infrastructure. Git itself is distributed — the repo on my machine is the source of truth. Deploy scripts that can run without GitHub Actions. Local backups of anything critical.
It's a bit more work upfront, but the peace of mind when you see yet another incident on the status page is worth it.
I wonder how much of this is down to the massive amount of new repos and commits (of good or bad quality!) from the coding agents. I believe that the App Store is struggling to keep up with (mostly manual tbf) app reviews now, with sharp increases in review times.
I find it hard to believe that an Azure migration would be that detrimental to performance, especially with no doubt "unlimited credit" to play with?
You can provision Linux machines easily on Azure and... that's all you need? Or is the thinking that without bare-metal NVMe MySQL it can't cope (which is a bit of a different problem, tbf)?
I think part of the issue is that Azure has been struggling to reliably provision Linux VMs. Whether that's due to increased load, poor operational execution, or a combination of them, it's hard for anyone on the outside to know.
I worked on the React team while at GitHub, and you could easily tell which pages rendered with React vs which were still using Turbo. I wish we had taken perf more seriously as a culture there.
I'm amazed Microslop let us keep GitHub this long. Probably because they're training AI on it? To have a direct line to developers? I don't see why else they would've bothered with something that was so anti everything they stood for
A migration like this is a monumental undertaking to the level of where the only sensible way to do a migration like this is probably to not do it. I fully expect even worse reliability over the next few years before it'll get better.
I'm surprised GitHub got by acting fairly independently inside Microsoft for so long. I'm also surprised GitHub employees expected that to last
The real problem today IMO is that Microsoft waited so long to drop the charade that they then felt like they had to rip off the bandaid. From what I've heard the transition hasn't gone very smoothly at all, and they've mostly been given tight deadlines with little to no help from Microsoft counterparts.
Why is Azure DevOps on the floor? I'm having to choose between the client's existing Azure DevOps and our internal GitLab for where to host a pipeline, and I don't know which would be good at all.
It works fine, it just feels like it has been in a kind of maintenance mode for a while.
There's clearly one small team that works on it. There are pros and cons to that.
It hasn't even got an obnoxious Copilot button yet, for example; but on the other hand, it was only relatively recently that you could properly edit comments in markdown.
If the client has existing AzDo Pipelines then I'd suggest keeping them there.
This was after seeing those ridiculous PRs where Microsoft engineers patiently deconstructed the AI-slop PRs they were forced to deal with on the open-source repos they maintained.
When he was gone a few months later and GitHub was folded into Microsoft's org chart, the writing was firmly on the wall.
He was never truly independent though. The org structure was such that the GitHub CEO reported up through a Microsoft VP and Satya. He was never really a CEO after the acquisition, it was in name only.
Also of note: the Microsoft org chart always showed GitHub in that structure, while the org chart available to GitHub stopped at their CEO. It's not that they were finally rolled into Microsoft's org chart so much as they lifted the veil and stopped pretending.
I never said he was "truly independent" nor meant to imply it.
Nonetheless it looks like he was both willing and able to push back on a good deal of the AI stupidity raining down from above and then he was removed and then, well, this...
I'm somewhat surprised with Github's strategy in the AI times.
I understand how appealing it is to build an AI coding agent and all that, but shouldn't they, above everything else, make sure they remain THE platform for code distribution, collaboration, and the like? And it doesn't need to be humans; that can be agents as well.
They should serve the AI agent world first and foremost. Because if they don't pull that off, and don't pull off building one of the best coding agents (which so far they haven't), there isn't much left.
There are so many new features needed in this new world. It's really unclear why we hear so little about it, while maintainers sound the alarm that they're drowning in slop.
Microsoft’s real goal is selling Copilot seats and pushing Azure, not building a neutral playground for third-party agents. There is just no money for them in being the backend for someone else's AI.
As for the AI spam, GitHub's internal metrics have always been tied to engagement and PR volume. Blocking all that AI slop would instantly drop their growth numbers, so it is easier for them to just pass the cleanup cost onto open-source maintainers.
To me, GitHub has always seemed well positioned to be a one-stop solution for software development: code, CI/CD, documentation, ticket tracking, project management, etc. Could anyone explain where they failed? I keep hearing that GitHub is terrible.
It always starts out good enough, but the reason they pursue horizontal integration is that it ensures that you won't be able to get out even if (when) you eventually want to. You'll be as glued as a fly to flypaper.
That's the reason you hear the complaints: they're from people who no longer want to be using this product but have no choice.
Because Microsoft doesn't need to innovate or even provide good service to keep the flies glued, they do what they've been doing: focus all their resources on making the glue stickier rather than focusing on making people want to stay even if they had an option to leave.
We use GH and are investing more in the platform features.
Codespaces specifically is quite good for agent heavy teams. Launch a full stack runtime for PRs that are agent owned.
> keep hearing that Github is terrible
I do not doubt people are having issues and I'm sure there have been outages and problems, but none that have affected my work for weeks.
GH is many things to many teams and my sense is that some parts of it are currently less stable than others. But the overall package is still quite good and delivers a lot of value, IMO.
There is a bit of an echo chamber effect with GH to some degree.
ITT lots of complaining, not much building. Microsoft does not give a fuck what you think - they only care if the revenue line goes up. And the revenue line keeps going up despite this instability. Want to build the next unicorn? Build a GitHub competitor.
Recently (after the workflows had worked for months), part of my CI on Actions started failing with:
2026-02-27T10:11:51.1425380Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2026-02-27T10:11:56.2331271Z ##[error]The operation was canceled.
I had to disable the workflows.
GitHub support's response has been:
“We recommend reviewing the specific job step this occurs at to identify any areas where you can lessen parallel operations and CPU/memory consumption at one time.”
That plus other various issues makes me start to think about alternatives, and it would have never occurred to me one year back.
We've jumped ship to self-hosted Jenkins. Woodpecker CI looks cool but Jenkins seemed like a safer bet for us. It's been well worth the effort and it's simplified and sped up our CI massively.
Once we got the email that they were going to charge for self-hosted runners that was the final nail in the coffin for us. They walked it back but we've lost faith entirely in the platform and vision.
I don’t know what’s worse - in 2026 someone genuinely suggesting Jenkins as a viable GHA alternative, or me agreeing with that.
Jenkins has possibly the worst user experience of any piece of software I’ve had to use in the last few years. It’s slow, brittle, somehow both heavyweight and featureless, littered with security vulns due to its architecture, impossible to navigate, and has absolutely no standardisation of usage.
Think the world would be a better place if 70-80% uptime were more tolerated. We really don’t need everything available all the time. More time to talk to each other, to think, more “slow time”.
"Agreed on the echo chamber point. For solo indie projects the overhead of GH Actions adds up though — I moved to self-hosted deploys and cut the complexity significantly. Different tradeoffs for teams vs solo."
Ever since Microsoft's acquisition of GitHub 8 years ago, GitHub has completely enshittified and become so unreliable that even self-hosting a Git repository or your own Actions runners would give you far better uptime than GitHub.
This sounded crazy in 2020 when I said that in [0]. Now it doesn't in 2026 and many have realized how unreliable GitHub has become.
If there were a prediction market on GitHub having at least one major outage per week, you would be making a lot of money, since it appears that AI chatbots such as Tay.ai, Zoe and Copilot are somewhat in charge of wrecking the platform.
Any other platform wouldn't tolerate such outages.
The 37 minutes of downtime last week cost us a deploy window during market hours. What's underappreciated: it's not just the raw downtime, it's that every CI/CD pipeline, every webhook, every deployment gate has GitHub as a single point of failure now. The centralization risk is real.
While GitHub obsess over shoving AI into everything, the rest of the platform is genuinely crumbling and its security flaws are being abused to cause massive damage. Last week Aqua Security was breached and a few repositories it owns were infected. The threat actors abused widespread use of mutable references in GitHub Actions, which the community has been screaming about for years, to infect potentially thousands of CI runs. They also abused an issue GitHub has acknowledged but refused to fix that allows smuggling malicious Action references into workflows that look harmless.
GHA can’t even be called Swiss cheese anymore, it’s so much worse than that. Major overhauls are needed. The best we’ve got is Immutable Releases which are opt in on a per-repository basis.
Public service announcement
You can pin actions versions to their hash. Some might say this is a best practice for now. It looks like this, where the comment says where the hash is supposed to point.
There is a tool to sweep through your repo and automate this: https://github.com/mheap/pin-github-action

The problem is actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 probably doesn’t do this same pinning, and the actions ecosystem is such an intertwined mess that any single compromised action can propagate to the rest.
Yes, true, but at least the fire won't spread through this one point. Hopefully all of your upstreams can be persuaded to pin also.
Doesn't a single compromised action in the chain cause the whole to be fucked? Pinning the top level doesn't prevent any spread.
Well, it is a git commit hash of the action repo that contains the transpiled/bundled javascript.
Like: https://github.com/actions/checkout/tree/11bd71901bbe5b1630c...
So I'm pretty sure that for the same commit hash, I'll be executing the same content.
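This is easy to demonstrate locally: git tags are mutable references, while commit hashes are content-addressed. A minimal sketch in a throwaway repo (names hypothetical), showing a tag being silently repointed while the original hash stays good:

```shell
# Create a throwaway repo, tag a commit as v1, then repoint the tag.
# Anyone pinned to "@v1" now gets different code; anyone pinned to the
# original commit hash still gets the original content.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo
git -c user.email=a@example.com -c user.name=a \
  commit -q --allow-empty -m "v1 release"
git tag v1
first=$(git rev-parse v1)
git -c user.email=a@example.com -c user.name=a \
  commit -q --allow-empty -m "compromised update"
git tag -f v1   # the mutable tag now points at the new commit
second=$(git rev-parse v1)
if [ "$first" != "$second" ]; then
  echo "tag v1 moved: $first -> $second"
fi
```

Pinning to the hash in `$first` keeps resolving to the original commit no matter where the tag moves, which is exactly what hash pinning buys you.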
This is true specifically for actions/checkout, but composite actions can have other actions as dependencies, and unless the composite action pins the versions of its dependencies, it is vulnerable to this attack.
This article[0] gives a good overview of the challenges, and also has a link to a concrete attack where this was exploited.
[0]: https://nesbitt.io/2025/12/06/github-actions-package-manager...
My preferred tool to solve these issues is called 'gitlab'
CircleCI
TravisCI
Jenkins
scripts dir
Etc
yeah, github's business model is not really a git repository but a bunch of other (admittedly useful) stuff that traps people in their ecosystem.
> There is a tool to sweep through your repo and automate this: [third-party]
Dependabot, too.
This won't pin the action's dependencies, so it's a shallow approach only.
I've always been worried about their backend changing such that an attacker could somehow make a named tag, or even a previously pinned commit hash, serve you something you didn't expect for that hash.
See also pinact[1], gha-update[2], and zizmor's unpinned-uses[3].
The main desiderata with these kinds of action pinning tools is that they (1) leave a tag comment, (2) leave that comment in a format that Dependabot and/or Renovate understands for bumping purposes, and (3) actually put the full tag in the comment, rather than the cutesy short tag that GitHub encourages people to make mutable (v4.x.y instead of v4).
[1]: https://github.com/suzuki-shunsuke/pinact
[2]: https://github.com/davidism/gha-update
[3]: https://docs.zizmor.sh/audits/#unpinned-uses
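Put together, a step satisfying all three desiderata might look like this (the SHA-to-tag mapping below is illustrative; verify it against the actions/checkout repo before copying):

```yaml
steps:
  # Full commit SHA as the ref; full tag (not the mutable "v4") in a
  # trailing comment that Dependabot/Renovate can parse and bump.
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
```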
Checkout v4 of course, released in August 2025, which already now pollutes my CI status with garbage warnings about some Node version being deprecated that I couldn’t care less about. I swear half the problems of GitHub are because half that organization has some braindead obsession with upgrading everything everywhere all the time, delivering such great early slop experiments as "dependabot".
I worry that CI just got overcomplicated by default when providers started rocking up with templated YAML and various abstractions over it to add dynamic behaviour, dependencies, and so on.
Perhaps mixing the CI with the CD made that worse because usually deployment and delivery has complexities of its own. Back in the day you'd probably use Jenkins for the delivery piece, and the E2E nightlies, and use something more lightweight for running your tests and linters.
For that part I feel like all you need, really, is to be able to run a suite of well structured shell scripts. Maybe if you're in git you follow its hooks convention to execute scripts in a directory named after the repo event or something. Forget about creating reusable 'actions' which depend on running untrusted code.
Provide some baked in utilities to help with reporting status, caching, saving junit files and what have you.
The only thing that remains is setting up a base image with all your tooling in it. Docker does that, and is probably the only bit where you'd have to accept relying on untrusted third parties, unless you can scan them and store your own cached version of it.
I make it sound simpler than it is but for some reason we accepted distributed YAML-based balls of mud for the system that is critical to deploying our code, that has unsupervised access to almost everything. And people are now hooking AI agents into it.
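The "suite of well structured shell scripts" idea above can be sketched in a few lines, under assumed conventions (a `ci/<event>.d/` directory per repo event; all names here are hypothetical): run every executable script in order, stopping at the first failure.

```shell
# ci_run <event>: execute every executable script in ci/<event>.d/
# in lexical order, aborting on the first non-zero exit.
ci_run() {
  dir="ci/$1.d"
  [ -d "$dir" ] || { echo "no scripts for event '$1'"; return 0; }
  for script in "$dir"/*; do
    [ -x "$script" ] || continue
    echo "==> $script"
    "$script" || return 1
  done
  echo "all $1 checks passed"
}

# Demo: a throwaway 'push' event with two trivial checks
demo=$(mktemp -d) && cd "$demo"
mkdir -p ci/push.d
printf '#!/bin/sh\necho lint ok\n'  > ci/push.d/10-lint
printf '#!/bin/sh\necho tests ok\n' > ci/push.d/20-test
chmod +x ci/push.d/*
ci_run push
```

No YAML, no marketplace, nothing to pin: the whole pipeline is reviewable in the repo and runnable on a laptop.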
You could use these shell script versions of pipelines in GHA though, right? There is nothing stopping you from triggering a bash script via a "run" step in YAML.
These reusable actions are nothing but a convenience feature. This discussion isn't much different than any other supply chain, dependency, or packaging system vulnerability such as NPM, etc.
One slight disclaimer here is the ability of someone to run their own updated copy of an action when making a PR, which could be used to exfiltrate secrets. That one is NOT related to being dependent on unverified actions, though.
(re-reading this came across as more harsh than I intended.. my bad on that. But am I missing something or is this the same issue that every open-source user-submitted package repository runs in to?)
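The "run" step approach mentioned above can be sketched as a minimal workflow where YAML is only the trigger and all real logic lives in a repo-local script (script path and names hypothetical):

```yaml
name: ci
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # ideally pinned to a full commit SHA
      # All real logic lives in the repo, reviewable and runnable locally:
      - run: ./ci/run-checks.sh
```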
I'm trying out SelfCI [1] for one of my projects and it's similar to what you were describing. My whole CI pipeline is just a shell script that runs the actual build and test commands, I can write a script in another language like python if I need more complexity and I can run it all locally at any time to debug.
[1] https://app.radicle.xyz/nodes/radicle.dpc.pw/rad%3Az2tDzYbAX...
> GHA can’t even be called Swiss cheese anymore, it’s so much worse than that.
That's a high bar though. Few things are better than Swiss cheese.
Salers (if you can find it)
If you want more ammo for your ranting (no offense meant, I also rant): an issue as massive as https://github.com/orgs/community/discussions/142308 lingering for years should do the trick.
It really feels like Firefox is not a supported browser on GitHub. I hit this too, and also find that much of the time the commit message is not correctly pulled from the PR description when that setting is enabled.
Meanwhile, bitbucket has never given us problems with several of our team using Firefox
I had something similar with PRs last year: two PRs of mine disappeared for me. They were still counted in the total number of PRs, and everyone else could see them.
0_o
* no IPv6 support
* no fast-forward merge support
* no SHA-256 commit hash support
I don't want to give too much credit to GitHub, because their uptime is truly horrendous and they need to fix it. But I've felt it's a little unfair to judge the uptime of company platforms like this by saying "if any feature at all is down, it's all down" and then translating that into 9s for the platform.
I never use Github Copilot; it does go down a lot, if their status page is to be believed; I don't really care when it goes down, because it going down doesn't bring down the rest of Github. I care about Github's uptime ignoring Copilot. Everyone's slice of what they care about is a little different, so the only correct way to speak on Github's uptime is to be precise and probably focus on a lot of the core stuff that tons of people care about and that's been struggling lately: Core git operations, website functionality, api access, actions, etc.
> I've felt like its a little unfair to judge the uptime of company platforms like this; by saying "if any feature at all is down, its all down" and then translating that into 9s for the platform.
This is definitely true.
At the same time, none of the individual services has hit 3x9 uptime in the last 90 days [0], which is their Enterprise SLA [1] ...
> "Uptime" is the percentage of total possible minutes the applicable GitHub service was available in a given calendar quarter. GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
[0]: https://mrshu.github.io/github-statuses/
[1]: https://github.com/customer-terms/github-online-services-sla
(may have edited to add links and stuff, can't remember, one of those days)
So what happens for those enterprise customers now? Is there a meaningful fallout when these services fail to meet their SLAs?
> If GitHub does not meet the SLA, Customer will be entitled to service credit to Customer's account ("Service Credits") based on the calculation below ("Service Credits Calculation").
The linked document in my previous comment has more detail.
It's worth adding that big (BIG!) business clients will usually negotiate the terms for going below the SLA threshold. The goal is less to be compensated if it happens, and more to incentivize the provider to never let it happen.
You're right that labelling any outage as "Github is down" is an overgeneralisation, & we should focus on bottlenecks that impact teams in a time-sensitive manner, but that isn't the case here. Their most stable service (API) has only two 9s (99.69%).
They're not even struggling to get their average to three 9s, they're struggling to get ANY service to three 9s. They're struggling to get many services to two 9s.
Copilot may be the least stable at one 9, but the services I would consider most critical (Git & Actions) are also at one 9.
I love multiple 9s as much as the next guy but that's only 27 hours per year of downtime. For a mostly free (for me) service, I'm thankful.
Most people complaining about uptime aren't free users or open-source developers. It's people whose companies are enterprise GitHub customers. It's a real problem and affects productivity.
GitHub going down during office hours in a large enterprise has knock on effects for hours as well. Especially if you are in a monorepo.
I'm happy to report that my one-person sysops has successfully hit nine-fives for the 20th year in a row!
If there's only a single 9 in availability (i.e. anything below 99%, even 98.99999999999999999%), that's a minimum of 87.6 hours of downtime per year.
Honestly, you're right - 2̶7̵ 87+ (correction from sibling) hours per year is absolutely fine & normal for me & anything I want to run. I personally think it should be fine for everybody.
On the other hand the baseline minimal Github Enterprise plan with no features (no Copilot, GHAS, etc.) runs a medium sized company $1m+ per annum, not including pay-per-use extras like CI minutes. As an individual I'm not the target audience for that invoice, but I can envisage whomever is wanting a couple of 9s to go with it. As a treat.
87 hours a year is 1.5 hours a week. If that 1.5 hour window is when you need to use it it matters a hell of a lot more than if it’s 4am on a Sunday.
Nine nines is too hard; my target is eight eights.
ONLY TWO NINES! Meanwhile vital government services here have a whopping 25% availability.
Two things can be bad.
This company is part of the portfolio of a $trillion+ transnational corporation. The idea that we can't judge them, when they clearly have more resources than 99% of other companies on this planet, doesn't hold up to any scrutiny.
Why defend a company that clearly doesn't care about its customers and see them as a money spigot to suck dry?
The OP clearly never says we can't judge them. He was speaking to how the uptime is measured. I'm not saying I agree or disgree with the OP but at least address the argument he's making.
There's a completely reasonable comment by jamiemallers on this thread which is marked as 'dead' even after vouching. Not sure what's going on there.
Presumably what's going on is https://news.ycombinator.com/item?id=47340079 . It's been quite an issue lately.
Take a look at his comment history.
It doesn't help that almost all of the big tech companies talking about 5 9s are lying about it; "Does it respond to the API at all, even with errors? It's up!" and so on. If you spend a lot of time analyzing browser traces you see errors and failures constantly from everyone, even huge companies that brag a lot about their prowess. But it's "up" even if a shard is completely down.
The five nines tech people usually are talking about is a fiction; the only place where the measure is really real is in networking, specifically service provider networking, otherwise it's often just various ways of cleverly slicing the data to keep the status screen green. A dead giveaway is a gander at the SLAs and all the ways the SLAs are basically worthless for almost everyone in the space.
See also all of the "1 hour response time" SLAs from open source wrapper companies. Yes, in one hour they will create a case and give you case ID. But that's not how they describe it.
From GitHub CTO in 2025 when they announced they're moving everything to Azure instead of letting GitHub's infrastructure remain independent:
> For us, availability is job #1, and this migration ensures GitHub remains the fast, reliable platform developers depend on
That went about as well as everyone thought back then.
Does anyone else remember back in ~2014-2015 sometime, when half the community was screaming at GitHub to "please be faster at adding more features"? I wish we could get back to platforms (or OSes for that matter) focusing on reliability and stability. Seems those days are long gone.
GitHub have not really got much better at adding new features either though :(
I don't know, it's nice that they finally broke native browser in-page search. That's a great feature for people who hate finding things.
I work on lots of smaller client projects - usually named by the hostname. I absolutely don't understand how at some point the github search got so great it became unable to find my own repo by its name.
We have since switched to self hosted Forgejo instance. Unsurprisingly the search works.
Makes you actually read the code!
Intended usage is to use Edge Copilot to search the page for you.
They definitely have. Github evolved a lot faster after the microsoft acquisition, I remember being mildly impressed after it was stagnant for years (this is not an opinion on whether it was evolving in the right direction or if it was a good trade-off)
No they were slow at doing features before, and they are still slow afterwards.
This was before Actions and a whole lot of other non-git-related stuff. There were years (maybe even a decade?) when GitHub was essentially unchanged besides fixes and small incremental improvements, a long time ago :)
GH Actions was good for them as another billable feature, but I'm skeptical we actually gained much over external CI providers
The improvements to PR review have been nice though
> The improvements to PR review have been nice though
I dunno, it's probably the worst UX downgrade so far: almost no PRs are "fully available" on page load, and they require additional clicks and scrolling to "unlock" all the context, which kind of sucks.
It used to be that you loaded the PR diff and actually saw the full diff, except for really large files. You could do CTRL+F and search for stuff, and you didn't need to click to expand even small files. Reviewing medium/large PRs is borderline obnoxious today on GH.
I find it impossible to use the current diff view for most codebases, and spend tons of time clicking open all available sections...
They have somehow found the worst possible amount of context for doing review. I tend to pull everything down to VS Code if I want to have any confidence these days.
Don't forget the security implications if you host your own actions runner.
Back in the day when software could be "finished". Ahh, the good 'ol days
They added the service unavailable feature.
> I wish we could get back to platforms (or OSes for that matter) focusing on reliability and stability
That's only a valid sentiment if you only use the big players. Both of those have medium/smaller competitors that have shown (for decades) that they are extremely boring, therefore stable.
Try convincing the CTO that this panoply of smaller players will be around for 5yrs or worth the effort migrating to.
I'm at a much smaller outfit now so we have more freedom but I'd dread to think the arguments I would've had at the 4000+ employee companies I was at before.
In that same period the big players have only gotten bigger and the "Mittelstand" in tech has been practically dying. Replaced by the flood of VC startups that are far too obsessed with "growth" to care about reliability and stability.
(Note that "is this company financially viable in the long term future" is an important part of stability. Doesn't matter how rock solid the software is if the startup's bankrupt by the end of next year.)
That's about when I joined, and all I really remember thinking was that it was cool that I could now share my repo publicly without having to try and run a server from a residential IP.
I think stability and reliability have vastly improved over the last years in general (not necessarily talking about gh specifically)
It's just that everybody is using 100 tools and dependencies which themselves depend on 50 others to be working.
Perhaps when they switch over fully to Azure they'll forget to disable IPv6 access. One can dream
GitHub is in a tough spot. From what I've heard they've been ordered to move everything to Azure from their long-standing datacenters. That is bound to cause issues. Then on top of that they are using AI coders for infra changes (supposedly), which will also add issues.
And then on top of all that, their traffic is probably skyrocketing like mad because of everyone else using AI coders. Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
That can't be easy on their servers.
I do not envy their reliability team (but having been through this myself, if you're reading this GitHub team, feel free to reach out!).
> Look at popular projects -- a few minutes after an issue is filed they have sometimes 10+ patches submitted. All generating PRs and forks and all the things.
I think this is a really important point that is getting overlooked in most conversations about GitHub's reliability lately.
GitHub was not designed or architected for a world where millions of AI coding agents can trivially generate huge volumes of commits and PRs. This alone is such a huge spike and change in user behavior that it wouldn't be unreasonable to expect even a very well-architected site to struggle with reliability. For GitHub, N 9s of availability pre-AI simply does not mean the same thing as N 9s of availability post-AI. Those are two completely different levels of difficulty, even when N is the same.
Has anyone checked out the status page? It's actually way worse than I thought; I believe this is the first time I am actually witnessing a status page with truly horrible results.
https://mrshu.github.io/github-statuses
And notably, that page makes this post's title inaccurate. As of this morning, it says `90.21% uptime`, which is a _single_ 9, not 3 (though that's for the platform as a whole, no individual component appears to achieve three 9s.)
Note that it gets 90% largely off Copilot going down and Actions not working. Actual git has 98.98%, which is still just one 9 but a lot better.
If all I want is actual git, I’m pretty sure I could get much much more than 98.98% uptime. The value of GitHub is actions, issues, PRs. To me, if actions is down GitHub is down
As someone who was impacted by GitHub's git outage in late February, which caused us to cancel a feature release, I am more sensitive to the availability of their git service, than their chatbot.
> 98.98%
it's the 2 nines they aimed for
True! Technically even 9.99% would be three nines!
Too bad they didn't find an irrational number, could have got infinite nines.
Especially compared to its archive page: https://web.archive.org/web/20190510070456/https://www.githu...
1-4 incidents per month compared to about 1 daily.
It looks this bad because that includes 'degraded performance,' not just outrage.
Degraded Performance is unavailable as far as I am concerned. If Github has "degraded performance" where it takes 5 minutes to load a PR then that is not good.
freudian slip
Well then clearly you haven't taken a look at https://status.claude.com.
At that third-party site, GH is currently noticeably worse than Claude...
They are down to one 9 of availability and very, very close to losing even that (90.2x%).
This also fits my personal experience more closely than the 99.900-99.989 range the article indicates...
Though honestly, 99.9% means 8.76h of downtime a year. If we say no more than 20 min of downtime per 3 hours (sliding window), no more than 1h a day, and >50% of downtime falling in (localized) off-working hours (e.g. nights, Sat, Sun), then 99.9% is something you can work with. Sure, it would sometimes be slightly annoying, but it should not cause any real issues.
On the other hand, 90.21%... That is 35.73h of outage a year. Probably still fine if for each location the working-hour availability is 99.95% and the previous constraints hold. But, uh, wtf, that just isn't right for a company of that size.
> On the other hand 90.21%... That is 35.73h outage a year.
Days, not hours.
You may have fumbled the calculator at one point. 20 min per 3 hours is 88.8% uptime; 99.9% uptime is 11 seconds down per 3-hour window.
What I meant is having both at the same time:
- at most 20 min per 3 hours
- and 99.9% uptime on a yearly basis
As in: your yearly budget of outage is ~8.76h, but that budget shouldn't happen all at once, and if there is an outage it delays work by at most 20 min at a time, and not directly again after you've had downtime.
But I did fumble the 90.21% part, which is ~35.73 days, i.e. over 857 hours...
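For anyone checking the arithmetic in this subthread, a quick converter (365-day year assumed):

```shell
# downtime_hours <uptime-%>: yearly downtime allowed at that uptime.
downtime_hours() {
  awk -v u="$1" 'BEGIN { printf "%.2f\n", (1 - u / 100) * 365 * 24 }'
}
downtime_hours 99.9    # three nines: 8.76 hours/year
downtime_hours 90.21   # 857.60 hours/year, i.e. ~35.7 days
```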
This is ... surprisingly honest? The one above is "missing" status page; and most status pages would legally have to be filed in the "fiction" section of the library.
I get the email notifications from Anthropic’s status monitor, and I think they might be my most frequent emailer these days.
Just to add a little bit of nuance to this (not because I'm trying to defend GitHub; they definitely need to up their reliability): the 90% uptime figure represents every single service that GitHub offers being online 90% of the time. You don't need every single service to be online in order to use GitHub. For example, I don't use Copilot myself, and it's seen a 96.47% uptime, the worst of the services which are tracked.
> Copilot [has] seen a 96.47% uptime
That’s… one 9 of reliability. You could argue the title understates the problem.
> You don't need every single service to be online in order to use GitHub.
Well that’s how they want you to use it, so it’s an epic failure in their intended use story. Another way to put this is ”if you use more GitHub features, your overall reliability goes down significantly and unpredictably”.
Look, I have never been obsessed with nines for most types of services. But the cloud service providers certainly were using it as major selling/bragging points until it got boring and old because of LLMs. Same with security. And GitHub is so upstream that downstream effects can propagate and cascade quite seriously.
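That "use more features, get less reliability" point can be made concrete: if a workflow needs several services at once and outages are roughly independent, the effective availability is the product of the per-service figures. A sketch with illustrative numbers in the range cited elsewhere in the thread:

```shell
# combined_uptime <pct>...: product of per-service uptimes, as a percentage.
combined_uptime() {
  awk 'BEGIN {
    p = 1
    for (i = 1; i < ARGC; i++) p *= ARGV[i] / 100
    printf "%.2f\n", p * 100
  }' "$@"
}
# e.g. git + actions + api, each around two nines:
combined_uptime 99.0 98.98 99.69   # lower than any single service
```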
On the other hand: it also doesn't include instances where GitHub is painfully slow but technically usable.
These days it is very common that something like opening the diff view of a trivial PR takes 15-30 seconds to load. Sure, it will eventually load after a long wait or an F5, but it is still negatively impacting my productivity.
There have been multiple outages in the past year where they didn’t even fully report it very quickly. I’m talking the types of outages that brings down normal enterprise usage: we hook delivery for CI/CD, git operations for everyone, PRs for code review. And that’s not even including GitHub actions or copilot which lots of people also rely on.
Here is the same thing in 2019: https://web.archive.org/web/20190510070456/https://www.githu...
It seems that the same metric is about an order of magnitude worse than before.
96% is horrible uptime though
the basic git services are at one nine of availability
more context -- their enterprise SLA is 99.9% (3x9) uptime for individual services
https://github.com/customer-terms/github-online-services-sla
> GitHub commits to maintain at least 99.9% Uptime for the applicable GitHub service.
... and none of the individual services have hit 99.9% uptime in the last 90 days according to this site. 0_o
To be honest, I’m not surprised that GitHub has been having issues.
If you have ever operated GitHub Enterprise Server, it’s a nightmare.
It doesn’t support active-active. It only supports passive standbys. Minor version upgrades can’t be done without downtime, and don’t support rollbacks. If you deploy an update, and it has a bug, the only thing you can do is restore from backup leading to data loss.
This is the software they sell to their highest margin customers, and it fails even basic sniff tests of availability.
Data loss for source code is a really big deal.
Downtime for source control is a really big deal.
Anyone that would release such a product with a straight face, clearly doesn’t care deeply about availability.
So, the fact that their managed product is also having constant outages isn’t surprising.
I think the problem is that they just don’t care.
My $JOB ended up giving up on GHES and migrating to GHEC because of these exact issues.
Our security scanning runs on GitHub Actions — every PR gets checked before merge. When GitHub goes down, the security gate goes down with it. PRs pile up, devs get impatient, start merging without waiting for checks. That's exactly when bad code gets through. And they keep throwing engineers at Copilot while the stuff that CI/CD actually depends on keeps falling over.
Do you have one or more public examples of this?
Nothing unexpected. Microsoft has a remarkable talent for turning good products into useless ones. Skype is another good showcase of such talent.
When will they introduce GitHub for Business?
That has existed for a long time and predates the MS acquisition. It's called Github Enterprise.
My company is on GitHub Enterprise.
Windows (including Notepad and Explorer), too. I think ~Office~ ~Office 365~ ~Microsoft 365~ Copilot 365 is still technically useful despite the insane branding and licensing and AI slop features, but I doubt it'll last much longer.
IPv6 ignorance is the canary. There's plenty of architecture ignorance below the surface. The real question is why aren't they failing annual security audits?
https://docs.github.com/en/enterprise-cloud@latest/organizat...
Coz the audits have same quality as architecture lmfao
The real problem isn't the reliability numbers; it's that GitHub sold itself as an integrated platform, so every service you adopt raises the blast radius. Teams that treat GitHub like any other external dependency, with fallback runners and artifact mirrors, aren't sweating this.
I only use GitHub (and actions) for personal open-source projects, so I can't really complain because I'm getting everything for free¹. But even for those projects I recently had to (partially) switch actions to a paid solution² because GitHub's runners were randomly getting stuck for no discernible reason.
¹ Glossing over the "what they're getting in return" part. ² https://www.warpbuild.com/
I’m surprised it’s even as high as three nines; at one point in 2025 it was below 90%, not even a single nine.[0] (Which, to be fair, includes Copilot, the worst of the availabilities.)
People on lobsters a month ago were congratulating Github on achieving a single nine of uptime.[1]
I make jokes about putting all our eggs in one basket under the guise of “nobody got fired for buying x; but there are sure a lot of unemployed people”- but I think there’s an insidious conversation that always used to erupt:
“Hey, take it easy on them, it’s super hard to do ops at this scale”.
Which lands hard on my ears when the normal argument in favour of centralising everything is that “you can’t hope to run things as good as they do, since there’s economies of scale”.
These two things can’t be true simultaneously.. this is the evidence.
[0]: https://mrshu.github.io/github-statuses/
[1]: https://lobste.rs/s/00edzp/missing_github_status_page#c_3cxe...
> These two things can’t be true simultaneously
Sure they can. Perhaps a useful example of something like this would be to consider cryptography. Crypto is ridiculously complex and difficult to do correctly. Most individual developers have no hope of producing good cryptographic code on the same scale and dependability of the big crypto libraries and organizations. At the same time these central libraries and organizations have bugs, mistakes and weaknesses that can and do cause big problems for people. None of that changes the fact that for most developers “rolling your own crypto” is a bad idea.
That's an excellent example. OpenSSL, by virtue of trying to do everything, is the buggiest TLS implementation generally available today, to the point where there have been hard forks designed to reduce its scope and limit the damage.
I'd go so far as to say that there are more crypto libraries than there are "default" options for SaaS Git VCS (GitLab and GitHub are the mainstays in companies, and maybe Azure DevOps if you hate your staff; nobody sensible is using Bitbucket), but for TLS implementations alone there are rustls, GnuTLS, BoringSSL, LibreSSL, wolfSSL, NSS, and AWS-LC that come to mind immediately.
https://mrshu.github.io/github-statuses/ Even ignoring Copilot, they seem to be barely at 2 nines of uptime for any service component.
Wait, they still have three nines? It really doesn't feel like that.
But then their status page isn't really trustable anymore, and a lot of the issues I've been running into are temporary, partial, localized failures: things slowing to the point of unusability, serving an outdated (by >30 min) main/HEAD, etc.
So those won't even show up in these statistics.
When GitHub moved to React instead of server-rendered pages (ERB/Turbolinks/pjax), that was the beginning of the end.
The pages got slower; rendering became a nightmare.
Then they introduced GitHub Actions (half-baked), again very unreliable.
Then they introduced Copilot, again not very reliable.
It's easy to see why availability has gone down the drain.
Are they still on the Rails monolith? They speak about it less these days.
Embrace, extend, extinguish. Except the last one isn't quite going to plan...
Why have five nines when you can have nine fives?
“Microsoft Tentacle” - Now there’s a name for a new product line.
this comment reminded me that GitKraken was a thing. And still is, apparently
I have a little bit of sympathy for Github because if everyone is like me then they are getting 5-6x the demand they were last year just based on sheer commits alone, not to mention Github Copilot usage.
The irony no one is talking about: AI makes quality code worse. Was bad enough already so imagine it now. I am expecting many more services to drop from 3 nines to 1 nine.
The availability expectations gap is interesting from an education standpoint. Students are taught that 99.9% sounds impressive without contextualizing what that means in practice: just under nine hours of downtime per year. For a platform that millions of developers depend on as critical infrastructure during work hours, that math hits very differently than it does for a consumer app.
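For concreteness, the downtime budget implied by an availability figure is simple arithmetic; a quick shell sketch (the `nines` helper name is just for illustration):

```shell
# Downtime budget per year implied by an availability percentage.
# 365 * 24 = 8760 hours in a (non-leap) year.
nines() {
    awk -v a="$1" 'BEGIN { printf "%.1f\n", (100 - a) / 100 * 8760 }'
}

nines 99.9    # three nines: about 8.8 hours of downtime per year
nines 99.0    # two nines: about 87.6 hours
```

Run against the numbers people quote for GitHub, the gap between three nines and one nine is the gap between a bad afternoon and a bad month.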
Anyone who uses the phrase "measly" in relation to three nines is inadvertently admitting their lack of knowledge of massive systems. 99.9 and 99.95 are the targets for some of the most common systems you use all day, and they are by no means easy to achieve. Even just relying on a couple of regional AWS services will put your ceiling at three nines. It's even more embarrassing when people post that one GH uptime tracker that combines many services into a single number, as if that means anything useful.
Three 9s is a perfectly reasonable bar to expect for services you depend on. Without GitHub my company cannot deploy code. There is no alternative method to patch prod. In addition many development activities are halted, wasting labor costs.
We wouldn’t couple so much if we knew reliability would be this low. It will influence future decisions.
For solo and small team projects, I've started treating GitHub as distribution rather than infrastructure. Git itself is distributed — the repo on my machine is the source of truth. Deploy scripts that can run without GitHub Actions. Local backups of anything critical. It's a bit more work upfront, but the peace of mind when you see yet another incident on the status page is worth it.
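As a sketch of that approach, a periodic `--mirror` clone keeps full history locally even when GitHub is down. The function name and example paths below are purely illustrative:

```shell
# Keep a full local mirror of a remote repo; safe to run from cron.
# The first run creates the mirror, later runs just refresh it.
mirror_repo() {
    remote=$1
    backup=$2
    if [ -d "$backup" ]; then
        # Existing mirror: fetch all refs and drop deleted ones
        git --git-dir="$backup" remote update --prune
    else
        # First run: a bare clone carrying every branch and tag
        git clone --mirror "$remote" "$backup"
    fi
}

# e.g. mirror_repo https://github.com/example/app.git /backups/app.git
```

Because a mirror clone is a complete bare repository, it can itself serve as a push/pull remote in an outage.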
I wonder how much of this is down to the massive amount of new repos and commits (of good or bad quality!) from the coding agents. I believe that the App Store is struggling to keep up with (mostly manual tbf) app reviews now, with sharp increases in review times.
I find it hard to believe that an Azure migration would be that detrimental to performance, especially with no doubt "unlimited credit" to play with?
You can provision Linux machines easily on Azure and... that's all you need? Or is the thinking that without bare-metal NVMe MySQL it can't cope (which is a bit of a different problem, tbf).
This reminds me of when I looked up how many Actions runs the openclaw repo triggers: 700k as of now.
I think part of the issue is that Azure has been struggling to reliably provision Linux VMs. Whether that's due to increased load, poor operational execution, or a combination of them, it's hard for anyone on the outside to know.
I worked on the react team while at GitHub and you could easily tell which pages rendered with react vs which were still using turbo. I wish we took perf more seriously as a culture there
Did react render better than turbo or the opposite? I assume a well-optimized turbo page would perform better
React destroyed perf and used more resources than turbo
Was there any discussion to use something other than react?
That's what I figured and has been my experience as well.
Legitimately worse uptime than my self-hosted services. That's pretty funny.
Until paying customers start leaving en masse, they will continue to shovel out subpar service.
I'm amazed Microslop let us keep GitHub this long. Probably because they're training AI on it? To have a direct line to developers? I don't see why else they would've bothered with something that was so anti everything they stood for
What does Microsoft stand for?
Making Incrementally Crappier Repositories, Operating Systems, Office and any Future Technology
you misspelled MicroSlop
GitLab isn't much better right now either unfortunately.
It's time to look for a decentralized Non-Hub alternative.
Github without hub? I don't think that exists.
Email?
More like Git, without the Hub. Perhaps the Hub aspects can be stored in Git as well?
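For what it's worth, the minimal "Git without the Hub" already exists: any bare repository you can reach (over SSH or a shared filesystem) is a fully functional remote. A sketch with illustrative local paths:

```shell
# A bare repository is all a "hub" strictly needs to be.
demo=$(mktemp -d)
git init --bare "$demo/project.git"

# Any clone of it can fetch and push, exactly as with a hosted remote:
git clone "$demo/project.git" "$demo/work"
```

In practice you would put the bare repo on a server and use an `ssh://` URL; what GitHub adds on top (issues, PRs, CI) is the part projects like Radicle try to decentralize.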
https://radicle.xyz/
https://status.claude.com _anthropic has entered the room_
see also: https://thenewstack.io/github-will-prioritize-migrating-to-a...
A migration like this is a monumental undertaking, to the point where the only sensible way to do it is probably not to do it. I fully expect even worse reliability over the next few years before it gets better.
https://news.ycombinator.com/item?id=47315878
Pretty soon, the only 9 they’re going to have is the 9 8s…
I'm surprised GitHub got to act fairly independently inside Microsoft for so long. I'm also surprised GitHub employees expected that to last.
The real problem today IMO is that Microsoft waited so long to drop the charade that they now felt like they had to rip the bandaid. From what I've heard the transition hasn't gone very smoothly at all, and they've mostly been given tight deadlines with little to no help from Microsoft counterparts.
If this were a place for memes, then I'd share that swimming pool meme with Microsoft holding up copilot while GitHub is drowning.
Then Azure DevOps (formerly known as Visual Studio Team System) lies dead on the ocean floor.
Although given how badly GitHub seems to be doing, perhaps it's better to be ignored.
Why is Azure DevOps on the floor? I'm having to choose between the client's existing Azure DevOps and our internal GitLab for where to host a pipeline, and I don't know which would be good at all.
It works fine; it just feels like it has been in a kind of maintenance mode for a while.
There's clearly one small team that works on it. There are pros and cons to that.
It hasn't even got an obnoxious Copilot button yet for example, but on the other hand it was only relatively recently you could properly edit comments in markdown.
If the client has existing AzDo Pipelines then I'd suggest keeping them there.
It operated with an independent CEO for a long while.
When I saw his interview: https://thenewstack.io/github-ceo-on-why-well-still-need-hum... i thought "oh, there is some semblance of sanity at Microsoft".
This was after seeing those ridiculous PRs where microsoft engineers patiently deconstructed AI slop PRs they were forced to deal with on the open source repos they maintained.
When he was gone a few months later and github was folded into microsoft's org chart the writing was firmly on the wall.
He was never truly independent though. The org structure was such that the GitHub CEO reported up through a Microsoft VP and Satya. He was never really a CEO after the acquisition, it was in name only.
Also of note: the Microsoft org chart always showed GitHub in that structure, while the org chart available to GitHub stopped at their CEO. It's not that they were finally rolled into Microsoft's org chart so much as they lifted the veil and stopped pretending.
I never said he was "truly independent" nor meant to imply it.
Nonetheless it looks like he was both willing and able to push back on a good deal of the AI stupidity raining down from above and then he was removed and then, well, this...
Maybe they need to improve release strategy with Copilot AI Review =)
I'm somewhat surprised with Github's strategy in the AI times.
I understand how appealing it is to build an AI coding agent and all that, but shouldn't they, above everything else, make sure they remain THE platform for code distribution, collaboration and the like? And it doesn't need to be humans; that can be agents as well.
They should serve the AI agent world first and foremost. Because if they don't pull that off, and don't pull off building one of the best coding agents (which so far they haven't), there isn't much left.
There are so many new features needed in this new world. It's really unclear why we hear so little about it, while maintainers sound the alarm that they're drowning in slop.
Microsoft’s real goal is selling Copilot seats and pushing Azure, not building a neutral playground for third-party agents. There is just no money for them in being the backend for someone else's AI. As for the AI spam, GitHub's internal metrics have always been tied to engagement and PR volume. Blocking all that AI slop would instantly drop their growth numbers, so it is easier for them to just pass the cleanup cost onto open-source maintainers.
No disagreement here. Just very short sighted.
Cheap, fast, and good. I see which two they chose.
Two?
They didn't even choose two, only one :)
I wonder if they are still running on a single MySQL machine
They are not: https://www.youtube.com/watch?v=_Xl24s_0mZs
The article mentions some concerns related to migrating their MySQL clusters off bare metal.
To me, Github has always seemed well positioned to be a one-stop solution for software development: code, CI/CD, documentation, ticket tracking, project management etc. Could anyone explain where they failed? I keep hearing that Github is terrible
It always starts out good enough, but the reason they pursue horizontal integration is that it ensures that you won't be able to get out even if (when) you eventually want to. You'll be as glued as a fly to flypaper.
That's the reason you hear the complaints: they're from people who no longer want to be using this product but have no choice.
Because Microsoft doesn't need to innovate or even provide good service to keep the flies glued, they do what they've been doing: focus all their resources on making the glue stickier rather than focusing on making people want to stay even if they had an option to leave.
We use GH and are investing more in the platform features.
Codespaces specifically is quite good for agent heavy teams. Launch a full stack runtime for PRs that are agent owned.
I do not doubt people are having issues, and I'm sure there have been outages and problems, but none that have affected my work for weeks. GH is many things to many teams, and my sense is that some parts of it are currently less stable than others. But the overall package is still quite good and delivers a lot of value, IMO.
There is a bit of an echo chamber effect with GH to some degree.
We use GitHub actions and we have more build failures from actions than we do any other source.
They got acquired by Microsoft.
The OP is a Feb 10th post;
More recently:
Addressing GitHub's recent availability issues
https://github.blog/news-insights/company-news/addressing-gi...
(with a smattering of submissions here the last few weeks but no discussion)
ITT lots of complaining, not much building. Microsoft does not give a fuck what you think - they only care if the revenue line goes up. And the revenue line keeps going up despite this instability. Want to build the next unicorn? Build a GitHub competitor.
As of recently (the workflows had worked for months), part of my CI on Actions even fails with [0]:
2026-02-27T10:11:51.1425380Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2026-02-27T10:11:56.2331271Z ##[error]The operation was canceled.
I had to disable the workflows.
GitHub support response has been
“We recommend reviewing the specific job step this occurs at to identify any areas where you can lessen parallel operations and CPU/memory consumption at one time.”
That plus other various issues makes me start to think about alternatives, and it would have never occurred to me one year back.
[0] https://github.com/Barre/ZeroFS/actions/runs/22480743922/job...
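A stopgap some teams use for flaky runs like the one above (it won't fix the underlying runner shutdowns) is retrying a step before failing the job. A sketch; the `retry` helper is purely illustrative:

```shell
# Retry a command up to N times with growing backoff before giving up.
retry() {
    attempts=$1; shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "failed after $attempts attempts: $*" >&2
            return 1
        fi
        sleep "$n"        # back off a little longer each attempt
        n=$((n + 1))
    done
}

# e.g. in a workflow step: retry 3 cargo test
```

This papers over transient failures at the cost of masking real ones, so it belongs on known-flaky steps only.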
We've jumped ship to self-hosted Jenkins. Woodpecker CI looks cool but Jenkins seemed like a safer bet for us. It's been well worth the effort and it's simplified and sped up our CI massively.
Once we got the email that they were going to charge for self-hosted runners that was the final nail in the coffin for us. They walked it back but we've lost faith entirely in the platform and vision.
I don’t know what’s worse - in 2026 someone genuinely suggesting Jenkins as a viable GHA alternative, or me agreeing with that.
Jenkins has possibly the worst user experience of any piece of software I’ve had to use in the last few years. It’s slow, brittle, somehow both heavyweight and has no features, littered with security vulns due to architecture, is impossible to navigate, has absolutely no standardisation for usage.
And yet it’s still more reliable than GHA.
Charging for self-hosted runners is like a corkage fee but you still need to open the bottle yourself.
Just use git, problem solved.
Three nines is more than enough
In the future, no one will need more than one and a half nines.
Think the world would be a better place if 70-80% uptime were more tolerated. We really don’t need everything available all the time. More time to talk to each other, to think, more “slow time”.
Just don’t like the slop that’s getting us there.
"Agreed on the echo chamber point. For solo indie projects the overhead of GH Actions adds up though — I moved to self-hosted deploys and cut the complexity significantly. Different tradeoffs for teams vs solo."
Gitlab > Github you techbro msft bay area betas
Why dont they just vibecode their way into stability? /s
Ever since Microsoft's acquisition of GitHub 8 years ago, GitHub has completely enshittified and become so unreliable that even self-hosting a Git repository or your own Actions runners would give you far better uptime than GitHub.
This sounded crazy in 2020 when I said that in [0]. Now it doesn't in 2026 and many have realized how unreliable GitHub has become.
If there were a prediction market on the next time GitHub would have at least one major outage per week, you would be making a lot of money, since it appears that AI chatbots such as Tay.ai, Zoe and Copilot are somewhat in charge of wrecking the platform.
Any other platform wouldn't tolerate such outages.
[0] https://news.ycombinator.com/item?id=22867803
Not owned by companies that help the US Federal Government illegally spy on their own citizens and murder children overseas:
Gitlab
Bitbucket
Sourceforge
Forgejo
Codeberg
Radicle
Launchpad
Owned by companies that help the US Federal Government illegally spy on their own citizens and murder children overseas:
Github
Neither Forgejo nor Codeberg are owned by _any_ company. Very important distinction.
Codeberg is nice.
The 37 minutes of downtime last week cost us a deploy window during market hours. What's underappreciated: it's not just the raw downtime, it's that every CI/CD pipeline, every webhook, every deployment gate has GitHub as a single point of failure now. The centralization risk is real.