> The OpenSSH project is careful about not taking on unnecessary dependencies, but Debian was not as careful. That distribution patched sshd to link against libsystemd, which in turn linked against a variety of compression packages, including xz's liblzma. Debian's relaxing of sshd's dependency posture was a key enabler for the attack, as well as the reason its impact was limited to Debian-based systems such as Debian, Ubuntu, and Fedora, avoiding other distributions such as Arch, Gentoo, and NixOS.
Does Fedora use Debian's patch set for sshd, or a similar patch set that adds libsystemd?
Edit: It looks like Fedora wasn't affected because the backdoor triggered a valgrind test failure, so they shipped it with a flag that disabled the functionality that was backdoored. Seems like they lucked out. https://lists.fedoraproject.org/archives/list/devel@lists.fe...
I'm not sure how Fedora is derived from Debian...
If I recall correctly, the backdoor was set up to only activate on rpm and deb based systems, so it wouldn't have been triggered on Arch, Gentoo or NixOS, even if they linked systemd to ssh.
A very well-written piece. The section on funding open source is as relevant as it's ever been, and I don't think we've learnt much since last year.
As the proportion of younger engineers contributing to open-source decreases (a reasonable choice, given the state of the economy), I see only two future possibilities:
1. Big corporations take ownership of key open-source libraries in an effort to continue their development.
2. Said key open-source libraries die, and corporations develop proprietary replacements for their own use. The open source scene remains alive, but with a much smaller influence.
Unfortunately I have no clue how to get a company to put money into the open source we use. Not just my current company, but any company. I've sometimes been able to get permission to contribute something I build on company time, but often what I really want is someone on the project to spend a year or two maintaining it. Do the boring effort of creating a release. Write that complex feature everyone (including me) wants.
In decades past, companies used to pay for my license for Visual Studio (I think as part of an MSDN subscription), ClearCase, and a dozen different issue/work trackers. However, as soon as an open source alternative is used, I don't know how to direct the money that would have been spent toward the project.
Come to think of it, I'm the maintainer of a couple of open source projects that I don't use anymore, and I don't normally bother even looking at them either. Either someone needs to pay me to continue maintaining them (remember, I don't find them useful myself, so I'm not doing it to scratch an itch), or someone needs to take them over from me - but given the xz attack I'm no longer sure how to hand maintenance over.
In my prior career I talked to many companies about open source usage. If you tell them they are running an unsupported database or operating system in production, they will often see the value of buying support. But it is much harder to get them to pay for non-production stuff, especially development tools. And even if you find an enlightened manager, getting budget to pay a maintainer for a project is very difficult to even explain.
“We’re paying for contract development? But it’s not one of our products and we’ll have no rights to the software? They’ll fix all the bugs we find, right? Right?” This is a hard conversation at most companies, even tech companies.
"They’ll fix all the bugs we find, right?" -- that sounds to me like a reasonable requirement on the maintainer, if they are going to be paid a non negligible amount.
At companies where I've worked, all of the money we've put into open source has been in contracting the developer(s) to add a feature we needed to the upstream version. Of course, this means that we didn't really fund ongoing maintenance on anything we used that had all the features we needed.
Most open source projects don't really need an income stream. It only becomes an issue when the project is large enough that there is desire for someone to work on it half time or more. Smaller projects can still be done as a hobbyist thing. (the project I "maintain" only needs a few hours of my time per year, but since I no longer use it I can't be bothered - there is a problem for those who still use it). Of course it is hard to say - curl seems like it should be a small project but in fact it is large enough to support someone full time.
Sadly, as an OS dev, I see a third way: development behind closed doors.
With AI and CV reference hunting, the number of contributions is higher than ever. Open-source projects are basically spammed with low-quality contributions.
A public page is just a liability. I am considering closing the public bugzilla, git repo, and discussions. I would just take bug reports and patches from a very small circle of customers and power users. Everything except the release source tarball and a short changelog would be private!
Open source means you get the source code, not free customer and dev support!
My company, Distrust, exists to produce, support, and fund our open source security tools.
So far our core full time team of 3 gets to spend about half our time consulting/auditing and half our time contributing to our open projects that most of our clients use and depend on.
The key is for companies to have visibility into the current funding status of the software they depend on, and relationships with maintainers, so they can offer to fund features or fixes they need instead of being blocked.
I think big corporations will take ownership - well, not directly, but via paying foundations, and it already is the case.
The second thing is there are a bunch of things corporations need to use but don't want to develop on their own, like SSH.
There is already too much internal tooling inside of big corporations that is rotting there, and a lot of the time it would be much better if they gave it out to a foundation - like the Apache Foundation, where projects go to die or limp along.
From Linux Security Summit 2019, a retrospective on mandatory access control and bounding "damage that can be caused by flawed or malicious applications" in Android, iOS, macOS, Linux, FreeBSD and Zephyr, https://static.sched.com/hosted_files/lssna19/e5/LSS2019-Ret... (video: https://www.youtube.com/watch?v=AKWFbxbsU3o)
For the past 26 years, the speaker has been engaged in the design, implementation, technology transfer, and application of flexible Mandatory Access Control (MAC). In this talk, he describes the history and lessons learned from this body of work. The background and motivation for MAC is first presented, followed by a discussion of how a flexible MAC architecture was created and matured through a series of research systems. The work to bring this architecture to mainstream systems is then described, along with how the architecture and implementation evolved. The experience with applying this architecture to mobile platforms is examined. The role of MAC in a larger system architecture is reviewed in the context of a secure virtualization system. The state of MAC in mainstream systems is compared before and after our work. Work to bring MAC to emerging operating systems is discussed.
One of my struggles is to get Docker to lock down which images it loads. I'd like to only pull from my own blessed registry, and it seems Docker always wants to go back to theirs.
For other "package" managers (eg: CPAN, Debian) I can point to my own archive and be sure everything I manage downstream gets the blessed bits.
I basically have a huge archive/mirror for the supply chain for my perl, PHP, JavaScript, etc.
If anyone has pro tips on how to "lock" docker to one registry that would be cool.
Don't use Docker, use podman (which has a registries.conf for this, with many settings). You can then use podman-docker to have command line Docker compatibility. Podman is more secure than Docker too, by default it runs as a user, rather than as root.
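A minimal sketch of what that can look like in /etc/containers/registries.conf (the registry names here are hypothetical; see containers-registries.conf(5) for the full syntax):

    # only this registry is searched for unqualified image names
    unqualified-search-registries = ["registry.internal.example"]

    # pulls that explicitly name docker.io get redirected to an internal mirror
    [[registry]]
    prefix = "docker.io"
    location = "registry.internal.example/dockerhub-mirror"

    # a public registry can also be blocked outright
    [[registry]]
    prefix = "quay.io"
    blocked = true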
A lovely article, but one section definitely needs a [citation needed]
> (OpenSSL is written in C, so this mistake was incredibly easy to make and miss; in a memory-safe language with proper bounds checking, it would have been nearly impossible.)
    package main

    import "fmt"

    type CmdType int

    const (
        WriteMsg CmdType = iota
        ReadMsg
    )

    type Cmd struct {
        t CmdType
        d []byte
        l int // caller-supplied length; nothing checks it against what was last written
    }

    // one shared buffer reused for every message, Heartbleed-style
    var buffer [256]byte

    var cmds = []Cmd{
        {WriteMsg, []byte("Rain. And a little ice. It's a damn good thing he doesn't know how much I hate his guts."), 88},
        {WriteMsg, []byte("Rain. And a little ice."), 23},
        {ReadMsg, nil, 23},
        {ReadMsg, nil, 88}, // oops! asks for 88 bytes even though the last write was only 23
    }

    func main() {
        for c := range cmds {
            if cmds[c].t == WriteMsg {
                copy(buffer[:], cmds[c].d[:cmds[c].l])
            } else if cmds[c].t == ReadMsg {
                // the final read prints the stale tail of the first message:
                // an over-read with no memory-safety violation anywhere
                fmt.Println(string(buffer[:cmds[c].l]))
            }
        }
    }
The heartbleed problem was that user-controlled input could say how long it was, separate from how long it actually was. OpenSSL then copied the (short) thing into a buffer, but returned the (long) thing, thus revealing all sorts of other data it was keeping in the same buffer.
It wasn't caught because OpenSSL had built its own buffer/memory management routines on top of the actual ones provided by the language (malloc, memcpy, realloc, free), and all sorts of unsafe manipulations were happening inside one big buffer. That buffer could be in a language with perfect memory safety; the same flaw would still be there.
Good article for what it covers, but sadly it does not cover isolation/sandboxing/least privilege.
Yes. The crucial issue to me is the increasing frequency of attacks where some piece of open source gets an update - leading to endless hidden supply chain attacks.
I don't see anything that is going to block this from getting worse and worse. It became a pretty common issue that I first heard about with npm or node.js and their variants, maybe because people update software so much there and have lots of dependencies. I don't see a solution. A single program can have huge numbers of dependencies, even c++ or java programs now.
Indeed. In the 2020s, if you're not sandboxing each thing, and then sandboxing each library the thing depends on, you're running with way too many opportunities for vulnerability.
I have some ideas about operating system design (and stuff relating to the CPU design, too) to help with this and other issues (e.g. network transparency, resisting fingerprinting, better user programmability and interoperability, etc). This means that it is fully deterministic except I/O, and all I/O uses capabilities which may be proxied etc. Libraries may run in separate processes if desired (but this is not always required). However, other differences compared with existing systems is also necessary for improved security (and other improvements); merely doing other things like existing systems do has some problems. For example, USB will not be used, and Unicode also will not be used. Atomic locking/transactions of multiple objects at once will be necessary, too (this can avoid many kind of race conditions with existing systems, as well as other problems). File access is not done by names (files do not have names). And then, a specific implementation and distribution may have requirements and checking for the packages provided in the package manager and in the default installation (and the specification will include recommendations). These things alone still will not solve everything, but it is a start.
I don't really know because I haven't put work in to investigate, but some things in that direction seem to be, possibly in order of some combination of maturity and comprehensiveness.
I have no freaking idea. Needless to say I don't think our current operating systems are up to the task of actually being secure. You have to be able to somehow dynamic-link in a library whilst only giving calls into that library certain permissions/capabilities... which I don't think even Windows can do.
Forget OS support, is that something that modern CPUs can support efficiently? As far as I can tell, enforcing a security boundary across libraries would require changing the page table twice for every library call, which seems like a big performance hit.
Then maybe your notion of security is useless in the real world and needs a rethink.
Security, when practiced, is a fundamentally practical discipline that needs to work with the world as is, not with dreams of putting people in basements in chains.
Great coverage; however, it failed to mention code review, artifact signing, and full source bootstrapping, which are fundamental defenses most distros skip.
In our distro, Stagex, our threat model assumes at least one maintainer, sysadmin, or computer is compromised at all times.
This has resulted in some specific design choices and practices:
- 100% deterministic, hermetic, reproducible
- full source bootstrapped from 180 bytes of human-auditable machine code
- all commits signed by authors
- all reviews signed by reviewers
- all released artifacts are multi-party reproduced and signed
- fully OCI (container) native all the way down "FROM scratch"
- All packages easily hash-locked to give downstream software easy determinism as well
This all goes well beyond the tactics used in Nix and Guix.
As far as we know, Stagex is the only distro designed to strictly distrust maintainers.
It doesn't distrust the developers of the software though, so does not fix the biggest hole. Multiparty reproduction does not fix it either, that only distrusts the build system.
The bigger the project, the higher the chance something slips through, even if only an exploitable bug. Maybe it's the developer themselves being compromised, or their maintainer.
Reviews are done on what, you have someone reviewing clang code? Binutils?
The code review problem is something solvable by something like CREV, where the developer community at large publishes the reviews they have done, and eventually there is good coverage of most things.
As the other (dead, but correct) commenter pointed out, job one is proving the released binary artifacts even match source code, as that is the spot that is most opaque to the public where vulns can most easily be injected (and have been in the past over and over and over).
Only with this problem solved can we prove that the code humans review (and ideally spend a lot more time reviewing - we are working on that) is actually the code that is shipped in compiled artifacts.
>can most easily be injected (and have been in the past over and over and over).
In practice this is much more rare than a user downloading and running malware or visiting a site that exploits their browser. Compare the number of 0days Chrome has had over the years versus the number of times bad actors have hacked Google and replaced download links with links to malware.
Nothing can stop users from being tricked, but normalizing the expectation of signing is our best defense. For instance, we trained users to start to expect the green lock, and started normalizing passkeys and fido2 which prove you are on the correct domain, taking phishing off the table.
Non-web software distribution, particularly for developers, has failed to mature significantly here. Most developers today use brew, nix, alpine, dockerhub, etc. None are signed in a way that allows end users to automatically prove they got artifacts that were faithfully and deterministically built from the expected source code. Could be malware, could be anything. The typical blind trust contract from developers to CDNs that host final compiled artifacts baffles me. Of course you will get malware this way.
Stagex by contrast uses OCI standard signing, meaning you can optionally set a containers/policy.json file in docker or whatever container runtime you use that will cause it to refuse to run any stagex images without reproduction signatures by two or more maintainers.
If you choose to, you can automatically rule out any single developer or system in the stagex chain from injecting malware into your projects.
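For anyone who hasn't seen it, a minimal sketch of what that can look like - this is the containers-policy.json(5) format honored by podman, skopeo, and CRI-O; the image path and key files below are hypothetical, not the real stagex values:

    {
        "default": [{ "type": "reject" }],
        "transports": {
            "docker": {
                "registry.example.com/stagex": [
                    { "type": "signedBy", "keyType": "GPGKeys", "keyPath": "/etc/containers/keys/maintainer-a.gpg" },
                    { "type": "signedBy", "keyType": "GPGKeys", "keyPath": "/etc/containers/keys/maintainer-b.gpg" }
                ]
            }
        }
    }

Everything is rejected by default, and images under that prefix are accepted only if every listed requirement is satisfied, which is roughly how a "two or more maintainers" rule can be expressed.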
You can't have a secure sandbox on your workstation without a secure supply chain. Who builds your qemu or Xen binary or enclave image?
Maybe you mean sandboxes like secure enclaves. Almost every solution there builds non-deterministically with unsigned containers any of many maintainers can modify at any time, with minimal chance of detection. Maybe you have super great network monitoring, but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any session keys or secrets you can. Game over.
Qubes has the best sandboxing solution of any workstation OS, but that relies on Fedora which is not fully reproducible, and only signed via centralized single-party-controlled infrastructure. Threaten the right person and you can backdoor qubes and everyone that uses it.
I say this as a qubes user, because it is the least bad workstation sandboxing option we have. We must fix the supply chain to have server or workstation sandboxes we can trust.
By contrast, I help maintain airgapos, repros, and enclaveos which are each special purpose immutable appliance operating systems that function as sandboxes for cold key management, secure software builds, and remotely attestable isolated software respectively. All are built with stagex and deterministic so you should get the same hash from a local build any other maintainer has, proving your artifacts faithfully came from the easily reviewable sources.
>You can't have a secure sandbox on your workstation without a secure supply chain.
Yes, you can as they are independent things.
>Maybe you mean sandboxes like secure enclaves.
No I mean sandbox as in applications are sandboxed from the rest of the system. If you just run an application it shouldn't be able to encrypt all of your files. The OS should protect the rest of the system from potentially badly behaving applications.
>but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any sessions keys or secrets you can
In practice this is a much rarer kind of an attack. Investing a ton in strengthening the front door is meaningless when the backdoor is completely open. Attackers will attack the weakest link.
>Qubes has the best sandboxing solution of any workstation OS
Qubes only offers sandboxing between qubes. There isn't sandboxing within a qube.
>proving your artifacts faithfully came from the easily reviewable sources.
Okay, but as mentioned previously those sources could have vulnerabilities or be malicious. Or users could run other software they have downloaded separately or via a curl | sh.
> Yes, you can as they are independent things.
I sandbox everything in hypervisors, I get it, but you cannot trust a sandbox some internet rando built for you is actually sandboxing. You have to full source bootstrap your sandbox to be guaranteed that the compromise of any of hundreds of dev machines in the usual supply chains did not backdoor your hypervisor.
You need both.
> Attackers will attack the weakest link.
Agreed, and today that is supply chain attacks. I have done them myself in the wild, multiple times. Often it is as easy as buying the expired email domain of an AWOL maintainer and doing a password reset for github, dockerhub, godaddy, etc until you control a package in piles of supply chains. Or in the case of most Linux distros, just go submit a couple of bugfixes and apply to be a maintainer, and you have official god access to push any code into major Linux distro supply chains with little to no oversight.
Cheap and effective attacks.
> Qubes only offers sandboxing between qubes. There isn't sandboxing within a qube.
You are expected to run a distinct kernel and VM for each security context. The linux kernel is pretty shit at isolating trusted code from untrusted code on its own. Hypervisors are the only reliable sandbox we have so spin up tiny VMs for every workload.
> Okay, but as mentioned previously those sources could have vulnerabilities or be malicious.
Yes of course, and we need a community-wide push to review all this code (working on it), but most of the time supply chain attacks are not even in the repos where someone might notice. They are introduced covertly in the release process of the source code tarballs, or in the final artifact generation flows, or in the CDNs that host those final artifacts. Then people review code, and assume that code is what generated the final artifacts.
> Or users could run other software they have downloaded separately or via a curl | sh
Some users will always shoot themselves in the foot if they are uneducated on security, so that is a separate education problem. Supply chain attacks however will hit even users doing everything right, and often burn thousands of people at once. Those of us that maintain and distribute software are obligated to give users safe methods to prove software artifacts are faithfully generated from publicly accountable source code, and to teach them not to trust any maintainers, including us.
Education is the biggest problem on all sides here. For my part, every "curl | sh" I have ever encouraged users to run in the wild is a troll to teach users to never run those.
> Reviews are done on what, you have someone reviewing clang code? Binutils?
There aren't random developers pushing commits to these codebases: these are used by virtually every Linux distro out there (OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name).
It seems obvious to me that GP is talking about protection against rogue distro maintainers, not fundamental packages being backdoored.
You're basically saying: "GP's work is pointless because Linus could insert a backdoor in the Linux kernel".
In addition to that, determinism and 100% reproducibility bring another gigantic benefit: should a backdoor ever be found in clang or one of the binutils tools, it's going to be 100% reproducible. And that is a big thing: being able to reproduce a backdoor is a godsend for security.
> full source bootstrapped from 180 bytes of human-auditable machine code
What does this mean? You have a C-like compiler in 180 bytes of assembler that can compile a C compiler that can then compile GCC?
That’s normally what this means, yes, with a few more intermediate steps. There’s only one bootstrap chain like this that I know of[1,2,3], maintained by Jeremiah Orians and the Guix project; judging from the reference to 180 bytes, that’s what the distro GP describes is using as well.
> This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.
[1] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...
[2] https://savannah.nongnu.org/projects/stage0/
[3] https://github.com/oriansj/bootstrap-seeds
100% reproducible? That's amazing. I'll be honest, I don't really believe you (which I suppose is the point, right?).
Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.
What are you all using to verify commits? Are you guys verifying signatures against a public PKI?
Super interested as I manage the reproducibility program for a large software company.
Indeed you do not have to believe me.
> git clone https://codeberg.org/stagex/stagex
> cd stagex
> make
Several hours later your "out" directory will contain locally built OCI images for every package in the tree, and the index.json for each should contain the exact same digests we commit in the "digests" folder, and the same ones multiple maintainers sign in the OCI standard "signatures" folder.
We build with only a light make wrapper around docker today, though it assumes you have it configured to use the containerd image store backend, which allows for getting deterministic local digests without uploading to a registry.
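For reference, if I am remembering the toggle right, switching Docker to the containerd image store is a daemon.json feature flag, roughly:

    {
        "features": {
            "containerd-snapshotter": true
        }
    }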
No reason you cannot build with podman or kaniko etc with some tweaks (which we hope to support officially)
> Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.
We try to keep our package definitions to "FROM scratch" in "linux from scratch" style with no magic, so they are self-documenting and easy to audit or reference. By all means crib any of our tactics. We use no global env, so each package has only the determinism tweaks needed (if any). We heavily referenced Alpine, Arch, Mirage, Guix, Nix, and Debian to arrive at our current patterns.
> What are you all using to verify commits? Are you guys verifying signatures against a public PKI?
We all sign commits, reviews, and releases with well-published PGP keys maintained on smartcards, with the expected public keys in the MAINTAINERS file. Most of us have Keyoxide profiles as well, making it easy to prove all our online presences agree with the expected fingerprints for us.
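As a rough sketch of how a downstream user might check that (the keyring filename here is hypothetical; the commands are standard git/gnupg):

    # import the maintainer keys published in the repo (hypothetical file name)
    gpg --import MAINTAINERS.asc

    # verify the signature on a specific commit
    git verify-commit HEAD

    # or show signature status inline while reviewing history
    git log --show-signature -1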
> Super interested as I manage the reproducibility program for a large software company.
By all means drop in our matrix room, #stagex:matrix.org . Not many people working on these problems. The more we can all collaborate to unblock each other the better!
Very suspicious article. Sounds like the "nothing to see here folks, move along" school of security.
Reproducibility is more like a security smell; a symptom you’re doing things right. Determinism is the correct target and subtly different.
The focus on supply chain is a distraction. A variant of the "trusting trust" attack Ken Thompson described in 1984 is still among the most elegant and devastating. Infected development toolchains can spread horizontally to "secure" builds.
Just because it’s open doesn’t mean anyone’s been watching closely. "50 years of security"? Important pillars of OSS have been touched by thousands of contributors with varying levels of oversight. Many commits predate strong code-signing or provenance tracking. If a compiler was compromised at any point, everything it compiled—including future versions of itself—could carry that compromise forward invisibly. This includes even "cleanroom" rebuilds.
The amount of software depended on is always going to be massive; it's not like every developer is going to write a BIOS, kernel, drivers, networking stack, compilers, interpreters, and so on for every project. So there will always be a massive iceberg of other people's code underneath what each developer writes.
Sure, but all of those you mentioned are part of a base OS.
I'm not sure what the fallacy is called, but you say we have an excess of X and then the fallacy is "we can't live without X".
Modern projects especially in the javascript realm have like 10K dependencies. Having one dependency in an Operating System(even though it may itself have their own dependencies) is a huuuuuuuuuge difference.
You can pay cash money to Microsoft or Red Hat and have either a company that owns all of the deps, or a company that vets all of the dependencies, distributes some cash through donations, and provides a sensible base package.
It may sound extreme, but you don't need much more than a Base OS. If you reaaallly want something else, you can check the OS official package repository. Downloading some third party code is what's extreme to me.
No you do not. If you have not actually validated each and every source package your trust is only related to the generated binaries corresponding to the sources you had. The trusting trust attack was deployed against the source code of the compiler, poisoning specific binaries.
Do you know if GCC 6.99 or 7.0 doesn't put a backdoor in some specific condition?
There's no static or dynamic analysis deployed to enhance this level of trust.
The initial attempts are simulated execution like in valgrind, all the sanitizer work, perhaps difference on the functional level beyond the text of the source code where it's too easy to smuggle things through...
(Like on an abstracted conditional graph.)
We cannot even compare binaries or executables right given differing compiler revisions.
So for example, Google uses a goobuntu/bazel based toolchain to get their go compiler binaries.
The full source bootstrapped go compiler binaries in stagex exactly match the hashes of the ones Google releases, giving us as much confidence as we can get in the source->binary chain, which until very recently had no solution at all.
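To make that concrete, the check being described is just a digest comparison; something like the following, with a hypothetical version and paths:

    # digest of the go binary from the official release tarball (hypothetical version)
    tar -xzf go1.22.1.linux-amd64.tar.gz
    sha256sum go/bin/go

    # digest of the go binary from the full-source-bootstrapped build
    sha256sum out/go/bin/go

    # a reproducible toolchain means the two digests are identical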
Go has unique compiler design choices that make it very self-contained, which makes this possible, though we can also deterministically build Rust or any other language from any OCI-compatible toolchain.
You are talking about one layer down from that, the source code itself, which is our next goal as well.
Our plan is this:
1. Be able to prove all released artifacts came from hash locked source code (done)
2. Develop a universal normalized identifier for all source code regardless of origin (treehash of all source regardless of git, tar file etc, ignoring/removing generated files, docs, examples, or anything not needed to build) (in progress)
3. Build a distributed code review system to coordinate the work of getting multiple signed reviews by reputable security researchers for every source package, keyed by its universal identifier (planning stages)
We are the first distro to reach step 1, and have a reasonably clear path to steps 2 and 3.
We feel step 2 would be a big leap forward on its own, as it would have fully eliminated the xz attack where the attack hid in the tar archive, but not the actual git tree.
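A minimal sketch of what such a normalized identifier could look like (this is my own illustration, not the actual stagex design; the skip list is hypothetical): walk the tree, ignore VCS metadata and non-build files, and hash path names plus contents in sorted order, so a git checkout and a tarball of the same source produce the same digest.

    package main

    import (
        "crypto/sha256"
        "fmt"
        "io"
        "io/fs"
        "os"
        "path/filepath"
        "sort"
    )

    // treeHash digests relative path names and file contents in sorted order,
    // skipping directories unrelated to the build (hypothetical skip list).
    func treeHash(root string) (string, error) {
        skip := map[string]bool{".git": true, "docs": true, "examples": true}
        var files []string
        err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
            if err != nil {
                return err
            }
            if d.IsDir() {
                if skip[d.Name()] {
                    return filepath.SkipDir
                }
                return nil
            }
            files = append(files, p)
            return nil
        })
        if err != nil {
            return "", err
        }
        sort.Strings(files) // identical order regardless of git vs tarball layout
        h := sha256.New()
        for _, p := range files {
            rel, err := filepath.Rel(root, p)
            if err != nil {
                return "", err
            }
            // hash the normalized relative path, then the content
            io.WriteString(h, filepath.ToSlash(rel)+"\n")
            f, err := os.Open(p)
            if err != nil {
                return "", err
            }
            if _, err := io.Copy(h, f); err != nil {
                f.Close()
                return "", err
            }
            f.Close()
        }
        return fmt.Sprintf("%x", h.Sum(nil)), nil
    }

    func main() {
        digest, err := treeHash(os.Args[1])
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println(digest)
    }

Deciding what counts as "generated files, docs, examples, or anything not needed to build" is of course where the real design work is; the hashing itself is the easy part.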
Pointing out these classes of problem is easy. I know, did it for years. Actually dramatically removing attack surface is a lot more rewarding.
Besides full source bootstrapping, which could adopt progressive verification of hardware features and an assumption of untrusted hardware, integration of formal verification into the lowest levels of bootstrapping is a must. Bootstrap security with the compiler.
This won't protect against more complex attacks like ROP or unverified state. For that we need to implement simple artifacts that are verifiable and mapped. Return to more simple return states (pass/error). Do error handling external to the compiled binaries. Automate state mapping and combine with targeted fuzzing. Systemd is a perfect example of this kind of thing, of what not to do: internal logs and error states being handled by a web of interdependent systems.
RoP and unverified state would at least be highlighted by such an analysis. Generally it's a lot of work and we cannot quite trust fully automated systems to keyword it to us... Especially when some optimizer changes between versions of the compiler. Even a single compile flag can throw the abstract language upside down, much less the execution graph...
Fuzzing is good but probabilistic. It is unlikely to hit on a deliberate backdoor. Solid for finding bugs though.
Full source bootstrapping meaning you build with 100% human auditable source code or machine code. The only path to do this today I am aware of is via hex0 building up to Mes and tinycc on up to a modern c compiler: https://github.com/fosslinux/live-bootstrap/blob/master/part...
As far as I know Gentoo, even from their "stage0" still assumes you bring your own bootstrap compiler toolchain, and thus is not self bootstrapping.
The fosslinux/live-bootstrap project is more about bootstrapping from minimal binary seed than auditability, for the latter case I'd argue that having a readable C cross-compiler is clearer than going through multiple steps involving several programming or scripting languages.
To be able to do this, you must already have both the source for the compiler and what someone has told you is a binary compiled from it. But what if that someone was lying?
Not a programmer, are you? Programmers can fully investigate the compiled binary without anyone even having a chance to lie to them. If a team doesn't have the ability to audit the decompilation of a 10k LOC C compiler at least once, I doubt their chances against a backdoor hidden in the 100s of steps of https://github.com/fosslinux/live-bootstrap/blob/master/part...
Not everyone that programs is versed in decompiling, digital forensics, reverse engineering, etc.
Anyway, so your means of forming trust in a compiler faithfully compiling code is to trust a decompiler to faithfully generate human-readable source code, followed by a lot of manual review labor repeated by every user that wishes to distrust the maintainers.
Okay, but a decompiler could be backdoored as easily as a compiler, to hide malicious code rather than inject it.
How do you get a decompiler you trust more than the compiler you are reviewing? Do you decompile the decompiler with itself? Back at the trusting trust problem.
Decompilers are way more complex than anything in the hex0->tinycc bootstrap path.
> Anyway, so your means of forming trust in a compiler faithfully compiling code, is to trust a decompiler to faithfully generate human readable source code
No, it is to fully audit the binary of the compiler itself; if you don't trust a decompiler, learn to read machine code. The output from a simple C compiler tends to be pretty predictable.
> manual review labor repeated by every user that wishes to distrust the maintainers.
Yes? What's wrong with that? Anyone wishes to distrust, you give them the tools and knowledge to verify the process, the more people able to do this the better.
It is going to be a heroic shared win of the entire community if we get people to even do basic review of dependencies in languages where we have the actual source code. Trying to get people to ignore the source code and actually decompile and review every binary they use on every computer they use, including the decompiler somehow, is a lost cause.
We should expect only a few people will review code, if it is drive-by easy to do. That means proving the binaries for sure came from the published commented formatted code, and then go review that code.
As another commenter observed, having to trust a decompiler doesn't reduce the amount of trust you need to provide, it increases it. Reducing the amount of trust is our high-level goal, remember?
But let's not focus too hard on the logic side of your argument. The part that really convinced everyone that you're right was your opening statement, "Not a programmer, are you?". From that moment it was clear that you were taking the discussion to a higher plane, far above boring everyday logic.
Like a superhero, really. At least, that's how I picture you.
That was the response to your "what someone has told you is a binary" argument. If you learn the basics of programming, you will know it's just a hexdump away to verify a binary; there's no one else in the room to tell you anything. You hit compile and see the result yourself - it's simple, direct, and intimate. Yeah, you could say it feels like a superpower, and it's a skill everyone can learn.
So you just dump hex and know exactly what a program does, and can quickly understand whether it uses good entropy sources, good cryptography choices, etc, in the same amount of time or less than you could read the published source code to verify the same?
If you can do that, you are the only one alive that can.
> Infected development toolchains can spread horizontally to “secure” builds.
Nowadays there are so many microcontrollers in your PC that a hardware vendor could simply infect your SSD, HDD, motherboard, or part of the processor. Good luck bootstrapping from hand-rolled NAND.
> The OpenSSH project is careful about not taking on unnecessary dependencies, but Debian was not as careful. That distribution patched sshd to link against libsystemd, which in turn linked against a variety of compression packages, including xz's liblzma. Debian's relaxing of sshd's dependency posture was a key enabler for the attack, as well as the reason its impact was limited to Debian-based systems such as Debian, Ubuntu, and Fedora, avoiding other distributions such as Arch, Gentoo, and NixOS.
Does Fedora use Debian's patch set for sshd, or a similar patch set that adds libsystemd?
Edit: It looks like Fedora wasn't affected because the backdoor triggered a valgrind test failure, so they shipped it with a flag that disabled the functionality that was backdoored. Seems like they lucked out. https://lists.fedoraproject.org/archives/list/devel@lists.fe...
I'm not sure show Fedora is derived from Debian...
If I recall correctly, the backdoor was set up to only activate on rpm and deb based systems, so it wouldn't have been trigged on Arch, Gentoo or NixOS, even if they linked systemd to ssh.
A very well-written piece. The section on funding open source is as relevant as it's ever been, and I don't think we've learnt much since last year.
As the proportion of younger engineers contributing to open-source decreases (a reasonable choice, given the state of the economy), I see only two future possibilities:
1. Big corporations take ownership of key open-source libraries in an effort to continue their development.
2. Said key open-source libraries die, and corporations develop proprietary replacements for their own use. The open source scene remains alive, but with a much smaller influence.
Unfortunately I have no clue how to get a company to put money into the open source we use. Not just my current company, but any company. I've sometimes been able to get permission to contribute something I build on company time, but often what I really want is someone on the project to spend a year or two maintaining it. Do the boring effort of creating a release. Write that complex feature everyone (including me) wants.
In decades past companies you to pay for my license for Visual Studio (I think of a MSDN subscription), clear case, a dozen different issue/work trackers. However as soon as an open source alternative is used I don't know how to get the money that would have been spent to them.
Come to think of it I'm maintainer of a couple open source projects that I don't use anymore and I don't normally bother even looking at the project either. Either someone needs to pay me to continue maintaining it (remember I don't find them useful myself so I'm not doing it to scratch an itch), or someone needs to take them over from me - but given xz attacks I'm no longer sure how to hand maintenance over.
In my prior career I talked to many companies about open source usage. If you tell them they are running an unsupported database or operating system in production, they will often see the value of buying support. But it is much harder to get them to pay for non-production stuff, especially development tools. And even if you find an enlightened manager, getting budget to pay a maintainer for a project is very difficult to even explain.
“We’re paying for contract development? But it’s not one of our products and we’ll have no rights to the software? They’ll fix all the bugs we find, right? Right?” This a hard conversation at most companies, even tech companies.
"They’ll fix all the bugs we find, right?" -- that sounds to me like a reasonable requirement on the maintainer, if they are going to be paid a non negligible amount.
Development tools was almost always a tough standalone business even before open source became so prevalent.
At companies where I've worked, all of the money we've put into open source has been in contracting the developer(s) to add a feature we needed to the upstream version. Of course, this means that we didn't really fund ongoing maintenance on anything we used that had all the features we needed.
As an independent maintainer I don't really know where to start trying to organise an ongoing income stream from users to support maintenance.
I thought that the idea of a funding manifest to advertise funding requests was a good idea: https://floss.fund/funding-manifest/ No idea if it works.
Most open source projects don't really need an income stream. It only becomes an issue when the project is large enough that there is desire for someone to work on it half time or more. Smaller projects can still be done as a hobbyist thing. (the project I "maintain" only needs a few hours of my time per year, but since I no longer use it I can't be bothered - there is a problem for those who still use it). Of course it is hard to say - curl seems like it should be a small project but in fact it is large enough to support someone full time.
Sadly, as OS dev, I see third way: development behind closed doors.
With AI and CV reference hunting, number of contributions is higher than ever. Open-source projects are basically spammed, with low quality contributions.
Public page is just a liability. I am considering to close public bugzilla, git repo and discussions. I would just take bug reports and patches from very small circle of customers and power users. Everything except release source tarball, and short changelog would be private!
Open-source means you get a source code, not free customer and dev support!
The FOSSjobs wiki has a bunch of resources on this topic:
https://github.com/fossjobs/fossjobs/wiki/resources
My company, Distrust, exists to produce, support, and fund our open source security tools.
So far our core full time team of 3 gets to spend about half our time consulting/auditing and half our time contributing to our open projects that most of our clients use and depend on.
The key is for companies to have visibility into the current funding status of the software they depend on, and relationships with maintainers, so they can offer to fund features or fixes they need instead of being blocked.
https://distrust.co
I think big corporations will take ownership - well not directly but via paying to foundations and it already is the case.
Second thing is there are bunch of things corporations need to use but don't want to develop on their own like SSH.
There is already too much internal tooling inside of big corporations that is rotting there and a lot of times it would be much better if they give it out to a foundation - like Apache foundation where projects go to die or limp through.
From Linux Security Summit 2019, a retrospective on mandatory access control and bounding "damage that can be caused by flawed or malicious applications" in Android, iOS, macOS, Linux, FreeBSD and Zephyr, https://static.sched.com/hosted_files/lssna19/e5/LSS2019-Ret...
video: https://www.youtube.com/watch?v=AKWFbxbsU3oOne of my struggles is to get docker to lockdown which images it loads. I'd like to only pull from my own blessed registry and it seems Docker wants to always go back to theirs.
For other "package" managers (eg: CPAN, Debian) I can point to my own archive and be sure everything I manage down stream gets the blessed bits.
I basically have a huge archive/mirror for the supply chain for my perl, PHP, JavaScript, etc.
If anyone has pro tips on how to "lock" docker to one registry that would be cool.
Don't use Docker, use podman (which has a registries.conf for this, with many settings). You can then use podman-docker to have command line Docker compatibility. Podman is more secure than Docker too, by default it runs as a user, rather than as root.
Thanks, podman has moved up on my "to eval" list.
A lovely article, but one section definitely needs a [citation needed]
> (OpenSSL is written in C, so this mistake was incredibly easy to make and miss; in a memory-safe language with proper bounds checking, it would have been nearly impossible.)
The heartbleed problem was that user-controlled input could say how long it was, separate from how long it actually was. OpenSSL then copied the (short) thing into a buffer, but returned the (long) thing, thus revealing all sorts of other data it was keeping in the same buffer.It wasn't caught because OpenSSL had built its own buffer/memory management routines on top of the actual ones provided by the language (malloc, memcpy, realloc, free), and all sorts of unsafe manipulations were happening inside one big buffer. That buffer could be in a language with perfect memory safety, the same flaw would still be there.
Related; https://news.ycombinator.com/item?id=43617352 North Korean IT workers have infiltrated the Fortune 500
Good article for what it covers, but sadly does not cover isolation/sandboxing/least privilege.
Yes. The crucial issue to me is the increasing frequency of attacks where some piece of open source gets an update - leading to endless hidden supply chain attacks.
I don't see anything that is going to block this from getting worse and worse. It became a pretty common issue that I first heard about with npm or node.js and their variants, maybe because people update software so much there and have lots of dependencies. I don't see a solution. A single program can have huge numbers of dependencies, even c++ or java programs now.
It's not new, here's one from 6 years ago on c++ - https://www.trendmicro.com/en_us/research/19/d/analyzing-c-c...
Don't forget log4j - https://www.infoworld.com/article/3850718/developers-apply-t..., points to this recent paper https://arxiv.org/pdf/2503.12192
Indeed. In 2020s, if you're not sandboxing each thing, and then sandboxing each library the thing depends on, you're running with way too many opportunities for vulnerability.
Well said! How?
I have some ideas about operating system design (and stuff relating to the CPU design, too) to help with this and other issues (e.g. network transparency, resisting fingerprinting, better user programmability and interoperability, etc). This means that it is fully deterministic except I/O, and all I/O uses capabilities which may be proxied etc. Libraries may run in separate processes if desired (but this is not always required). However, other differences compared with existing systems is also necessary for improved security (and other improvements); merely doing other things like existing systems do has some problems. For example, USB will not be used, and Unicode also will not be used. Atomic locking/transactions of multiple objects at once will be necessary, too (this can avoid many kind of race conditions with existing systems, as well as other problems). File access is not done by names (files do not have names). And then, a specific implementation and distribution may have requirements and checking for the packages provided in the package manager and in the default installation (and the specification will include recommendations). These things alone still will not solve everything, but it is a start.
I don't really know because I haven't put work in to investigate, but some things in that direction seem to be, possibly in order of some combination of maturity and comprehensiveness.
I haven't really understood how lavamoat works (if it works).
I have no freaking idea. Needless to say I don't think our current operating systems are up to the task of actually being secure. You have to be able to somehow dynamic-link in a library whilst only giving calls into that library certain permissions/capabilities... which I don't think even Windows can do.
Forget OS support, is that something that modern CPUs can support efficiently? As far as I can tell, enforcing a security boundary across libraries would require changing the page table twice for every library call, which seems like a big performance hit.
Then maybe your notion of security is useless in the real world and needs a rethink.
Security, when practiced, is a fundamentally practical discipline that needs to work with the world as is, not with dreams of putting people in basements in chains.
Ignorant reply here, but would openbsd's `pledge` and `unveil` sorta cover what you're talking about?
At the library level? Not as far as I know…
Didn’t Jess Frazelle have most of her dependencies running inside lots of Docker containers for a while? She went pretty far and also kept it up for a long time. E.g., https://blog.jessfraz.com/post/docker-containers-on-the-desk...
How would that protect you from a library?
Great coverage, however it failed to mention code review and artifact signing as well as full source bootstrapping which are fundamental defenses most distros skip.
In our distro, Stagex, our threat model assumes at least one maintainer, sysadmin, or computer is compromised at all times.
This has resulted in some specific design choices and practices:
- 100% deterministic, hermetic, reproducible
- full source bootstrapped from 180 bytes of human-auditable machine code
- all commits signed by authors
- all reviews signed by reviewers
- all released artifacts are multi-party reproduced and signed
- fully OCI (container) native all the way down "FROM scratch"
- All packages easily hash-locked to give downstream software easy determinism as well
This all goes well beyond the tactics used in Nix and Guix.
As far as we know, Stagex is the only distro designed to strictly distrust maintainers.
https://stagex.tools
Good step.
It doesn't distrust the developers of the software though, so does not fix the biggest hole. Multiparty reproduction does not fix it either, that only distrusts the build system.
The bigger the project, the higher the chance something slips through, if even an exploitable bug. Maybe it's the developer themselves being compromised, or their maintainer.
Reviews are done on what, you have someone reviewing clang code? Binutils?
The code review problem is something solvable by something like CREV, where the developer community at large publishes the reviews they have done, and eventually there is good coverage of most things.
https://github.com/crev-dev/
As the other (dead, but correct) commenter pointed out, job one is proving the released binary artifacts even match source code, as that is the spot that is most opaque to the public where vulns can most easily be injected (and have been in the past over and over and over).
Only with this problem solved, can we prove the code humans ideally start spending a lot more time reviewing (working on it) is actually the code that is shipped in compiled artifacts.
>can most easily be injected (and have been in the past over and over and over).
In practice this is much more rare then a user downloading and running malware or visiting a site that exploits their browser. Compare the number of 0days chrome has had over the years versus the number of times bad actors have hacked Google and replaced download links with links to malware.
Nothing can stop users from being tricked, but normalizing the expectation of signing is our best defense. For instance, we trained users to start to expect the green lock, and started normalizing passkeys and fido2 which prove you are on the correct domain, taking phishing off the table.
Non-web software distribution, particularly for developers, has failed to mature significantly here. Most developers today use brew, nix, alpine, dockerhub, etc. None are signed in a way that allows end users to automatically prove they got artifacts that were faithfully and deterministically built from the expected source code. Could be malware, could be anything. The typical blind trust contract from developers to CDNs that host final compiled artifacts baffles me. Of course you will get malware this way.
Stagex by contrast uses OCI standard signing, meaning you can optionally set a containers/policy.json file in docker or whatever container runtime you use that will cause it to refuse to run any stagex images without reproduction signatures by two or more maintainers.
If you choose to, you can automatically rule out any single developer or system in the stagex chain from injecting malware into your projects.
>Nothing can stop users from being tricked
But an operating system can limit the blast radius. Proper sandboxing is much more important than securing the supply chain.
You can't have a secure sandbox on your workstation without a secure supply chain. Who builds your qemu or Xen binary or enclave image?
Maybe you mean sandboxes like secure enclaves. Almost every solution there builds non-deterministically with unsigned containers any of many maintainers can modify at any time, with minimal chance of detection. Maybe you have super great network monitoring, but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any sessions keys or secrets you can. Game over.
Qubes has the best sandboxing solution of any workstation OS, but that relies on Fedora which is not fully reproducible, and only signed via centralized single-party-controlled infrastructure. Threaten the right person and you can backdoor qubes and everyone that uses it.
I say this as a qubes user, because it is the least bad workstation sandboxing option we have. We must fix the supply chain to have server or workstation sandboxes we can trust.
By contrast, I help maintain airgapos, repros, and enclaveos which are each special purpose immutable appliance operating systems that function as sandboxes for cold key management, secure software builds, and remotely attestable isolated software respectively. All are built with stagex and deterministic so you should get the same hash from a local build any other maintainer has, proving your artifacts faithfully came from the easily reviewable sources.
>You can't have a secure sandbox on your workstation without a secure supply chain.
Yes, you can as they are independent things.
>Maybe you mean sandboxes like secure enclaves.
No I mean sandbox as in applications are sandboxed from the rest of the system. If you just run an application it shouldn't be able to encrypt all of your files. The OS should protect the rest of the system from potentially badly behaving applications.
>but if I compromise the CI/CD system to compile all binaries with a non-random RNG, then I can undermine any cryptography you use, and can re-create any sessions keys or secrets you can
In practice this is a much rarer kind of an attack. Investing a ton in strengthening the front door is meaningless when the backdoor is completely open. Attackers will attack the weakest link.
>Qubes has the best sandboxing solution of any workstation OS
Qubes only offers sandboxing between qubes.questions. There isn't sandboxing within a qube.
>proving your artifacts faithfully came from the easily reviewable sources.
Okay, but as mentioned previously those sources could have vulnerabilities or be malicous. Or users could run other software they have downloaded separately or via a curl | sh.
> Yes, you can as they are independent things.
I sandbox everything in hypervisors, I get it, but you cannot trust a sandbox some internet rando built for you is actually sandboxing. You have to full source bootstrap your sandbox to be guaranteed that the compromise of any of hundreds of dev machines in the usual supply chains did not backdoor your hypervisor.
You need both.
> Attackers will attack the weakest link.
Agreed, and today that is supply chain attacks. I have done them myself in the wild, multiple times. Often as easy as buying an expired email domain of an awal maintainer and doing a password reset for github, dockerhub, godaddy, etc until you control a package in piles of supply chains. Or in the case of most Linux distros just go submit a couple bugfixes and apply to be a maintainer and you have official god access to push any code to major Linux distro supply chains with little to no oversight.
Cheap and effective attacks.
> Qubes only offers sandboxing between qubes.questions. There isn't sandboxing within a qube.
You are expected to run a distinct kernel and VM for each security context. The linux kernel is pretty shit at isolating trusted code from untrusted code on its own. Hypervisors are the only reliable sandbox we have so spin up tiny VMs for every workload.
> Okay, but as mentioned previously those sources could have vulnerabilities or be malicous.
Yes of course, and we need a community wide push to review all this code (working on it) but most of the time supply chain attacks are not even in the repos where someone might notice. They are introduced covertly in the release process of the source code tarballs, or in the final artifact generation flows, or in the CDNs that host those final artifacts. Then people review code, and assume that code is what generated final artifacts.
> Or users could run other software they have downloaded separately or via a curl | sh
Some users will always shoot themselves in the foot if they are uneducated on security, so that is a separate education problem. Supply chain attacks however will hit even users doing everything right, and often burn thousands of people at once. Those of us that maintain and distribute software are obligated to give users safe methods to prove software artifacts are faithfully generated from publicly accountable source code, teach them to not to trust any maintainers including us.
Education is the biggest problem on all sides here. For my part, every "curl | sh" I have ever encouraged users to run in the wild is a troll to teach users to never run those.
> Reviews are done on what, you have someone reviewing clang code? Binutils?
There aren't random developers pushing commits to these codebases: these are used by virtually every Linux distro out there (OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name).
It seems obvious to me that GP is talking about protection against rogue distro maintainers, not fundamental packages being backdoored.
You're basically saying: "GP's work is pointless because Linus could insert a backdoor in the Linux kernel".
In addition to that, determinism and 100% reproducibility bring another gigantic benefit: should a backdoor ever be found in clang or one of the binutils tools, it's going to be 100% reproducible. And that is a big thing: being able to reproduce a backdoor is a godsend for security.
> OK, maybe not the Kubernetes one that ships only 12 binaries, forgot its name
You are likely thinking of Talos Linux, which incidentally also builds itself with stagex.
>full source bootstrapped from 180 bytes of human-auditable machine code
What does this mean? You have a C-like compiler in 180 bytes of assembler that can compile a C compiler that can then compile GCC?
That’s normally what this means, yes, with a few more intermediate steps. There’s only one bootstrap chain like this that I know of[1,2,3], maintained by Jeremiah Orians and the Guix project; judging from the reference to 180 bytes, that’s what the distro GP describes is using as well.
> This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.
[1] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...
[2] https://savannah.nongnu.org/projects/stage0/
[3] https://github.com/oriansj/bootstrap-seeds
That's pretty awesome
Yep, Guix and stagex are the only two distros that full source bootstrap, to my knowledge.
We use an abbreviated and explicit stage0 chain here for easy auditing: https://codeberg.org/stagex/stagex/src/branch/main/packages/...
IIRC the FreeDesktop flatpak runtimes are also built from the Bootstrappable Builds folks full source bootstrap.
As per their landing page, yes.
> stage0: < 190 byte x86 assembly seed is reproduced on multiple distros
> stage1: seed builds up to a tiny c compiler, and ultimately x86 gcc
> stage2: x86 gcc bootstraps target architecture cross toolchains
very impressive, I want to try this out now.
The LWN article is a good place to start:
https://lwn.net/Articles/985739/
100% reproducible? That's amazing. I'll be honest, I don't really believe you (which I suppose is the point, right?).
Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.
What are you all using to verify commits? Are you guys verifying signatures against a public PKI?
Super interested as I manage the reproducibility program for a large software company.
Indeed you do not have to believe me.
> git clone https://codeberg.org/stagex/stagex
> cd stagex
> make
Several hours later your "out" directory will contain locally built OCI images for every package in the tree, and the index.json for each should contain the exact same digests we commit in the "digests" folder, and the same ones multiple maintainers sign in the OCI standard "signatures" folder.
We build with only a light make wrapper around docker today, though it assumes you have it configured to use the containerd image store backend, which allows for getting deterministic local digests without uploading to a registry.
No reason you cannot build with podman or kaniko etc. with some tweaks (which we hope to support officially).
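If it helps, the verification step boils down to a digest comparison. A minimal sketch, assuming a layout where each locally built package has an out/<pkg>/index.json and the committed digest lives in digests/<pkg> (the real repo layout and digest derivation may differ):

  # Hedged sketch: flag any package whose locally built OCI index does
  # not hash to the digest committed by maintainers. Paths and the exact
  # digest derivation are assumptions, not the repo's actual layout.
  import hashlib, pathlib, sys

  mismatches = []
  for committed in pathlib.Path("digests").iterdir():
      pkg = committed.name
      index = pathlib.Path("out") / pkg / "index.json"
      if not index.exists():
          continue
      local = "sha256:" + hashlib.sha256(index.read_bytes()).hexdigest()
      expected = committed.read_text().strip()
      if local != expected:
          mismatches.append((pkg, local, expected))

  for pkg, local, expected in mismatches:
      print(f"MISMATCH {pkg}: built {local} != committed {expected}")
  sys.exit(1 if mismatches else 0)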
> Do you all document how you got around system level sources of non-determinism? Filesystems, metadata, timestamps, tempfiles, etc? This would be a great thing to document for people aiming for the same thing.
We try to keep our package definitions "FROM scratch", in "Linux From Scratch" style, with no magic, so they are self-documenting and easy to audit or reference. By all means crib any of our tactics. We use no global env, so each package has only the determinism tweaks needed (if any). We heavily referenced Alpine, Arch, Mirage, Guix, Nix, and Debian to arrive at our current patterns.
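To make the flavour of those tweaks concrete (an illustrative sketch, not stagex code): the usual answer to timestamp and metadata nondeterminism is to normalize everything at packing time, roughly like this:

  # Minimal sketch of archive normalization for reproducible output:
  # fixed mtime from SOURCE_DATE_EPOCH, fixed ownership, sorted entries.
  # (Compression would also need a pinned gzip mtime, omitted here.)
  import os, tarfile

  EPOCH = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

  def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
      info.mtime = EPOCH
      info.uid = info.gid = 0
      info.uname = info.gname = "root"
      return info

  def pack(src_dir: str, out_path: str) -> None:
      with tarfile.open(out_path, "w") as tar:
          for root, dirs, files in os.walk(src_dir):
              dirs.sort()                    # deterministic traversal
              for name in sorted(files):
                  path = os.path.join(root, name)
                  tar.add(path, arcname=os.path.relpath(path, src_dir),
                          filter=normalize)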
> What are you all using to verify commits? Are you guys verifying signatures against a public PKI?
We all sign commits, reviews, and releases with well-published PGP keys maintained in smartcards, with the expected public keys in the MAINTAINERS file. Most of us have keyoxide profiles as well, making it easy to prove all our online presences agree with the expected fingerprints.
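For a rough idea of the consumer-side check (the MAINTAINERS parsing here is invented, the real file format may differ, and the maintainer public keys have to be in your local keyring already):

  # Sketch: refuse a checkout unless HEAD carries a good signature from
  # a fingerprint listed in MAINTAINERS. Fingerprint extraction is a
  # guess at the file format, not the project's actual tooling.
  import re, subprocess

  allowed = set(re.findall(r"\b[0-9A-F]{40}\b", open("MAINTAINERS").read()))

  # %G? = signature status (G means good), %GF = signing key fingerprint
  out = subprocess.run(
      ["git", "log", "-1", "--pretty=%G? %GF", "HEAD"],
      capture_output=True, text=True, check=True).stdout.split()
  status = out[0] if out else "N"
  fingerprint = out[1] if len(out) > 1 else ""

  if status != "G" or fingerprint not in allowed:
      raise SystemExit(f"untrusted or unsigned commit: {status} {fingerprint}")
  print(f"HEAD signed by trusted maintainer key {fingerprint}")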
> Super interested as I manage the reproducibility program for a large software company.
By all means drop in our matrix room, #stagex:matrix.org . Not many people working on these problems. The more we can all collaborate to unblock each other the better!
Read through these websites and LWN articles:
https://reproducible-builds.org/
https://bootstrappable.org/
https://bootstrapping.miraheze.org/
https://lwn.net/Articles/983340/
https://lwn.net/Articles/985739/
Very suspicious article. Sounds like the "nothing to see here folks, move along" school of security.
Reproducibility is more like a security smell; a symptom you’re doing things right. Determinism is the correct target and subtly different.
The focus on supply chain is a distraction: a variant of the "trusting trust" attack Ken Thompson described in 1984 is still among the most elegant and devastating. Infected development toolchains can spread horizontally to "secure" builds.
Just because it’s open doesn’t mean anyone’s been watching closely. "50 years of security"? Important pillars of OSS have been touched by thousands of contributors with varying levels of oversight. Many commits predate strong code-signing or provenance tracking. If a compiler was compromised at any point, everything it compiled—including future versions of itself—could carry that compromise forward invisibly. This includes even "cleanroom" rebuilds.
I agree that it's handwavy. My take on supply chain vulns is that the only way to fight them is to reduce dependencies, massively.
Additionally the few dependencies you have should be well compensated to avoid 'alternative monetization'.
You can't have your cake (massive amounts of gratis software) and eat it too (security and quality guarantees).
The 100 layers of signing and layer-4 package managers are a huge coping mechanism for those who are not ready to accept that tradeoff.
The amount of software depended on is always going to be massive; it's not like every developer is going to write a BIOS, kernel, drivers, networking stack, compilers, interpreters, and so on for every project. So there will always be a massive iceberg of other people's code underneath what each developer writes.
Sure, but all of those you mentioned are part of a base OS.
I'm not sure what the fallacy is called, but you say we have an excess of X and then the fallacy is "we can't live without X".
Modern projects, especially in the JavaScript realm, have like 10K dependencies. Having one dependency on an operating system (even though it may itself have its own dependencies) is a huuuuuuuuuge difference.
You can pay cash money to Windows or Red Hat and have either a company that owns all of the deps, or a company that vets all of the dependencies, distributes some cash through donations, and provides a sensible base package.
It may sound extreme, but you don't need much more than a Base OS. If you reaaallly want something else, you can check the OS official package repository. Downloading some third party code is what's extreme to me.
The best defense we have against the Trusting Trust attack is full source bootstrapping, now done by two distros: Guix and Stagex.
No, you do not. If you have not actually validated each and every source package, your trust only extends to the generated binaries corresponding to the sources you had. The trusting trust attack was deployed against the source code of the compiler, poisoning specific binaries. Do you know whether GCC 6.99 or 7.0 doesn't insert a backdoor under some specific condition?
There's no static or dynamic analysis deployed to enhance this level of trust.
The initial attempts are simulated execution like in valgrind, all the sanitizer work, perhaps diffing at the functional level beyond the text of the source code, where it's too easy to smuggle things through... (Like on an abstracted conditional graph.)
We cannot even compare binaries or executables right given differing compiler revisions.
So for example, Google uses a goobuntu/bazel based toolchain to get their go compiler binaries.
The full source bootstrapped go compiler binaries in stagex exactly match the hashes of the ones Google releases, giving us as much confidence as we can get in the source->binary chain, which until very recently had no solution at all.
Go has unique compiler design choices that make it very self-contained, which is what makes this possible, though we can also deterministically build Rust, or any other language, from any OCI-compatible toolchain.
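To illustrate the comparison against Google's releases (go.dev really does publish sha256 sums for its downloads; the local artifact path and version below are assumptions, and in practice you may be comparing individual binaries rather than whole archives):

  # Sketch: compare the sha256 of a locally bootstrapped Go toolchain
  # archive against the checksum published on go.dev for that release.
  import hashlib, json, urllib.request

  VERSION = "go1.22.1"                           # example version
  LOCAL = f"out/{VERSION}.linux-amd64.tar.gz"    # assumed local artifact

  with urllib.request.urlopen("https://go.dev/dl/?mode=json&include=all") as resp:
      releases = json.load(resp)

  published = {
      f["filename"]: f["sha256"]
      for rel in releases if rel["version"] == VERSION
      for f in rel["files"]
  }

  local_sum = hashlib.sha256(open(LOCAL, "rb").read()).hexdigest()
  expected = published.get(f"{VERSION}.linux-amd64.tar.gz")
  print("match" if local_sum == expected else f"MISMATCH: {local_sum} != {expected}")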
You are talking about one layer down from that, the source code itself, which is our next goal as well.
Our plan is this:
1. Be able to prove all released artifacts came from hash locked source code (done)
2. Develop a universal normalized identifier for all source code regardless of origin (treehash of all source regardless of git, tar file etc, ignoring/removing generated files, docs, examples, or anything not needed to build) (in progress)
3. Build distributed code review system to coordinate the work to multiple signed reviews by reputable security researchers for every source package by its universal identifier (planning stages)
We are the first distro to reach step 1, and have a reasonably clear path to steps 2 and 3.
We feel step 2 would be a big leap forward on its own, as it would have fully eliminated the xz attack where the attack hid in the tar archive, but not the actual git tree.
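A toy sketch of what such an identifier could look like, just to make step 2 concrete (the exclusion list and hashing scheme here are placeholders, not the actual design):

  # Sketch of a normalized source identifier: hash relative paths and
  # file contents, skipping VCS metadata and anything not needed to
  # build. The same tree from a git clone or an unpacked tarball then
  # hashes to the same identifier. Exclusions are illustrative only.
  import hashlib, os

  EXCLUDE_DIRS = {".git", "docs", "examples"}    # placeholder list

  def source_id(root: str) -> str:
      h = hashlib.sha256()
      for dirpath, dirs, files in os.walk(root):
          dirs[:] = sorted(d for d in dirs if d not in EXCLUDE_DIRS)
          for name in sorted(files):
              path = os.path.join(dirpath, name)
              rel = os.path.relpath(path, root).replace(os.sep, "/")
              h.update(rel.encode() + b"\0")
              h.update(hashlib.sha256(open(path, "rb").read()).digest())
      return h.hexdigest()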
Pointing out these classes of problem is easy. I know, did it for years. Actually dramatically removing attack surface is a lot more rewarding.
Help welcome!
That's a different problem. The threat in Trusting Trust is that the backdoor may not ever appear in public source code.
Besides full source bootstrapping, which could adopt progressive verification of hardware features and an assumption of untrusted hardware, integration of formal verification into the lowest levels of bootstrapping is a must. Bootstrap security with the compiler.
This won't protect against more complex attacks like ROP or unverified state. For that we need to implement simple artifacts that are verifiable and mapped. Return to simpler return states (pass/error). Do error handling external to the compiled binaries. Automate state mapping and combine it with targeted fuzzing. Systemd is a perfect example of what not to do: internal logs and error states handled by a web of interdependent systems.
ROP and unverified state would at least be highlighted by such an analysis. Generally it's a lot of work, and we cannot quite trust fully automated systems to flag it for us... Especially when some optimizer changes between versions of the compiler. Even a single compile flag can throw the abstract language upside down, much less the execution graph...
Fuzzing is good but probabilistic. It is unlikely to hit on a deliberate backdoor. Solid for finding bugs though.
I agree here. Use automated tools to find low hanging fruit, or mistakes.
There is unfortunately no substitute for a coordinated effort to get documented review by capable security researchers of our toolchain sources.
Code review systems like CREV are the solution to backdoors being present in public source code.
https://github.com/crev-dev/
Gentoo is full source bootstrapping if you include the build of GRUB2 and creating the initramfs as well as the kernel.
Full source bootstrapping meaning you build with 100% human auditable source code or machine code. The only path to do this today I am aware of is via hex0 building up to Mes and tinycc on up to a modern c compiler: https://github.com/fosslinux/live-bootstrap/blob/master/part...
As far as I know Gentoo, even from their "stage0" still assumes you bring your own bootstrap compiler toolchain, and thus is not self bootstrapping.
The fosslinux/live-bootstrap project is more about bootstrapping from minimal binary seed than auditability, for the latter case I'd argue that having a readable C cross-compiler is clearer than going through multiple steps involving several programming or scripting languages.
But how do you build that readable c cross compiler?
Full source bootstrapping is our only way out of the trusting trust problem
You bootstrap the compiler with itself and audit whether the compiler binary has exactly the same semantics as its source.
>Full source bootstrapping is our only way out of the trusting trust problem
No, that is just deferring the trust to all the tools and scripts that fosslinux/live-bootstrap project provides.
> You bootstrap the compiler with itself
To be able to do this, you must already have both the source for the compiler and what someone has told you is a binary compiled from it. But what if that someone was lying?
Not a programmer, are you? Programmers can fully investigate the compiled binary without anyone even having a chance to lie to them. If a team doesn't have the ability to audit the decompilation of a 10k LOC C compiler at least once, I doubt their chances against a backdoor hidden in the 100s of steps of https://github.com/fosslinux/live-bootstrap/blob/master/part...
Not everyone that programs is versed in decompiling, digital forensics, reverse engineering, etc.
Anyway, so your means of forming trust in a compiler faithfully compiling code is to trust a decompiler to faithfully generate human-readable source code, followed by a lot of manual review labor repeated by every user who wishes to distrust the maintainers.
Okay, but a decompiler could be backdoored as easily as a compiler, to hide malicious code rather than inject it.
How do you get a decompiler you trust more than the compiler you are reviewing? Do you decompile the decompiler with itself? Back at the trusting trust problem.
Decompilers are way more complex than anything in the hex0->tinycc bootstrap path.
> Anyway, so your means of forming trust in a compiler faithfully compiling code, is to trust a decompiler to faithfully generate human readable source code
No, it is to fully audit the binary of the compiler itself. If you don't trust a decompiler, learn to read machine code; the output from a simple C compiler tends to be pretty predictable.
> manual review labor repeated by every user that wishes to distrust the maintainers.
Yes? What's wrong with that? Anyone who wishes to distrust, you give them the tools and knowledge to verify the process; the more people able to do this, the better.
It is going to be a heroic shared win of the entire community if we get people to even do basic review of dependencies in languages where we have the actual source code. Trying to get people to ignore the source code and actually decompile and review every binary they use on every computer they use, including the decompiler somehow, is a lost cause.
We should expect that only a few people will review code, and only if it is drive-by easy to do. That means proving the binaries for sure came from the published, commented, formatted code, and then going to review that code.
As another commenter observed, having to trust a decompiler doesn't reduce the amount of trust you need to provide, it increases it. Reducing the amount of trust is our high-level goal, remember?
But let's not focus too hard on the logic side of your argument. The part that really convinced everyone that you're right was your opening statement, "Not a programmer, are you?". From that moment it was clear that you were taking the discussion to a higher plane, far above boring everyday logic.
Like a superhero, really. At least, that's how I picture you.
That was the response to your "what someone has told you is a binary" argument. If you learn the basics of programming, you will know that verifying a binary is just a hexdump away; there's no one else in the room to tell you anything. You hit compile and see the result yourself. It's simple, direct, and intimate. Yeah, you could say it feels like a superpower, and it's a skill everyone can learn.
So you just dump hex and know exactly what a program does, and can quickly understand whether it uses good entropy sources, makes good cryptography choices, etc., in the same amount of time or less than it would take to read the published source code to verify the same?
If you can do that, you are the only one alive that can.
> Infected development toolchains can spread horizontally to “secure” builds.
Nowadays there are so many microcontrollers in your PC that a hardware vendor could simply infect your SSD, HDD, motherboard, or part of the processor. Good luck bootstrapping from hand-rolled NAND.
[flagged]
Please don't use an LLM to write comments.