I was under the impression that most supply chain attacks target source code, not binaries, especially for large projects like OpenBSD.
Does StageX audit source code to the same extent that OpenBSD does? If not, then how would you compare the downgrade in security due to less code auditing vs the reassurance of reproducible builds?
Or, how would you compare StageX with Gentoo, in which the entire system is installed from source? Sure, you have to trust your initial installer, but how could I get a StageX system set up without first having access to a computer with some software installed? If we're at the point where we're worried that every Haskell program that has ever been compiled is owned, then I wonder why I should trust any software that might install StageX onto my computer, or the underlying hardware for that matter?
The Haskell compiler creates a slightly different output every time you compile a program[1]. This makes it difficult to ensure that the binary that is free to download is actually malware-free. If it were easy to check, then you could rest easy, assuming that someone out there is doing the check for you (and it would be big news if malware were found).
If you're a hardened security person, then the conversation continues, and the term "bootstrap" becomes relevant.
Since you do not trust compiled binaries, you can compile programs yourself from the source code (where malware would be noticed). However, in order to compile the Haskell compiler, you must have access to a (recent) version of the Haskell compiler. So version 10 of the compiler was built using version 9, which was built using version 8, etc. "Bootstrapping" refers (basically) to building version 1. Version 1 was built, approximately, with smart people, duct tape, and magic. There is no way to rebuild version 1 today; you must simply download it.
So if you have high security requirements, then you might fear that years ago, someone slipped malware into Haskell compiler version 1, which "self-replicates" into every compiler that it builds.
Until a few years ago, this was a bit of a silly concern (most software wasn't reproducible), but with the rise of Nix and Guix we've gotten a lot closer to reproducible-everything, and so Haskell is the odd one out.
[1]
The term is "deterministic builds" or "reproducible builds". Progress is being made to fix this in Haskell.
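If you want to see what non-determinism does to a binary's hash, gzip makes a handy stand-in for a compiler, since it embeds the input's timestamp by default (filenames here are arbitrary):

```shell
# gzip stores the input file's mtime in its header by default, so the
# "same" build differs bit-for-bit when only the timestamp changes.
# The -n flag omits the name and timestamp, making output deterministic.
echo 'hello' > input.txt

touch -d '2020-01-01' input.txt
gzip -c input.txt > build1.gz
touch -d '2021-01-01' input.txt
gzip -c input.txt > build2.gz
sha256sum build1.gz build2.gz   # hashes differ: not reproducible

touch -d '2020-01-01' input.txt
gzip -cn input.txt > build3.gz
touch -d '2021-01-01' input.txt
gzip -cn input.txt > build4.gz
sha256sum build3.gz build4.gz   # hashes match: reproducible
```

A reproducible compiler is the same idea at scale: same source in, bit-identical binary out, so anyone can independently verify the published artifact.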
Unlike Nix and Guix, Stagex goes much further in that it has a 100% mandate on supply chain integrity. It trusts no single maintainer or computer and disallows any binary blobs. It is thus not possible to package any software that cannot be bootstrapped, reproduced, and signed by at least two maintainers.
Haskell and Ada are the only languages it is not possible for us to support, along with any software built with them.
Everything else is just fine though.
I do hope both languages address this though, as it is blocking a lot of important open source software like pandoc or coreboot from being used in security critical environments.
I understood this as a tool to fight bot-net scraping. I imagined that this would add accountability to clients for how many requests they make.
I know that phrasing it like "large company cloudflare wants to increase internet accountability" will make many people uncomfortable. I think caution is good here. However, I also think that the internet has a real accountability problem that deserves attention. I think that the accountability problem is so bad, that some solution is going to end up getting implemented. That might mean that the most pro-freedom approach is to help design the solution, rather than avoiding the conversation.
Bad ideas:
You're getting lots of bot requests, so you start demanding clients login to view your blog. It's anti-user, anti-privacy, very annoying, readership drops, everyone is sad.
Instead, what if your browser included your government id in every request automatically? Anti-user, anti-privacy, no browser would implement it.
This idea:
But ARC is a middle ground. Subsets of the internet band together (in this case, via cloudflare) and strike a compromise with users. Individual users need to register with cloudflare, and then cloudflare gives you a million tokens per month to request websites. Or some scheme like this. I assume that it would be sufficiently pro-social that the IETF and browsers all agree to it and it's transparent & completely privacy-respecting to normal users.
We already sort of have some accountability: it's "proof of bandwidth" and "proof of multiple unique ip addresses", but that's not well tuned. In fact, IP addresses destroy privacy for most people, while doing very little to stop bot-nets.
> Individual users need to register with cloudflare, and then cloudflare gives you a million tokens per month to request websites. Or some scheme like this.
This seems like it would just cause the tokens to become a commodity.
The premise is that you're giving out enough for the usage of the large majority of people, but how many do you give out? If you give out enough for the 95th percentile of usage then 5% of people -- i.e. hundreds of millions of people in the world -- won't have enough for their normal usage. Which is the first problem.
Meanwhile 95% of people would then have more tokens than they need, and the tokens would be scarce, so then they would sell the ones they're not using. Which is the second problem. The people who are the most strapped for cash sell all their tokens for a couple bucks but then get locked out of the internet.
The third problem is that the AI companies would be the ones buying them, and since the large majority of people would have more than they need, they wouldn't be that expensive, and then that wouldn't prevent scraping. Unless you turn the scarcity way up and make the first and second problems really bad.
I think the idea would be that you ask your credit card to convert $10 into 10 untraceable tokens, and then spend them one at a time. You do a handshake dance with the credit card company so you walk away with tokens that only you know, and you have assurance that the tokens are in the same pool as everyone else who asked for untraceable tokens from that credit card company.
Then you can go and spend them freely. The credit card company (and maybe even third parties?) can verify that the tokens are valid, but they can't associate them with a user. Assuming that the credit card company keeps a log, they can also verify that a token has never been used before.
In some sense, it's a lightweight and anonymous blockchain.
1. You convince a server that you deserve to have 100 tokens (probably by presenting some non-anonymous credentials)
2. You handshake with the server and walk away with 100 untraceable tokens
3. At any time, you can present the server with a token. The server only knows:
a. The token is valid
b. The token has not been previously used
Other details (disclaimer, I am not a cryptographer):
- The server has a public + private key pair for ARC, which is how it knows that it was the one to issue the tokens. It's also how you know that your tokens are in the same pool as everyone else's tokens.
- It seems like there's an option for your 100 tokens to all be 'branded' with some public information. I assume this would be information like "Expires June 2026" or "Token Value: 1 USD", not "User ID 209385"
- The client actually ends up with a key which will generate the 100 tokens in sequence.
- Obviously the number 100 is configurable.
- It seems like there were already schemes to do this, but they provide only one token at a time (RFC 9497, RFC 9474); I'm not sure how popular those were.
I recently noticed an issue where tailscale was placing its iptables rules above my system firewall (`ts-input all -- anywhere anywhere` before `nixos-fw`). The tailscale input chain looked like this:
Chain ts-input (1 references)
target prot opt source destination
ACCEPT all -- pan anywhere
RETURN all -- 100.115.92.0/23 anywhere
DROP all -- 100.64.0.0/10 anywhere
ACCEPT all -- anywhere anywhere
That seemed bad, but I'm not super familiar with iptables, so I asked Claude 4.5 Sonnet through Kagi. It gave me:
No, nixos-fw IS run on most packets. Let me explain the iptables chain flow:
How the Chain Processing Works
In iptables, when a user-defined chain (like ts-input) is called:
- ACCEPT: Accepts the packet within that chain and returns to the calling chain (INPUT), where processing continues with the next rule
- RETURN: Simply returns to the calling chain and continues processing
- DROP: Drops the packet entirely (no further processing anywhere)
Which is very very wrong. I highlighted the similarity between ACCEPT and RETURN in its explanation, not even with that much conviction or forcefulness, and it turned on a dime.
You're absolutely right to question that - I made an error! Let me correct it:
...
Most packets DON'T reach nixos-fw because:
...
I also had another incident where it looked over a backup script, and gave it an ok, but failed to alert me that `gpg -as --encrypt-to <ID>` requires `-e` to actually do the encryption. By that point I was wiser, and had o3-pro look over the script, which caught it right away.
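For anyone curious, here's roughly how to see that failure mode for yourself. This is a sketch with a throwaway key and made-up filenames, not my actual backup script:

```shell
# Throwaway keyring so this doesn't touch your real one
export GNUPGHOME="$(mktemp -d)"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-gen-key demo@example.com default default never
echo 'secret data' > msg.txt

# What my script did: -a (armor) -s (sign). --encrypt-to is silently
# ignored because no encryption was requested, so this only signs.
gpg --batch --pinentry-mode loopback --passphrase '' \
    -as --encrypt-to demo@example.com -o signed.asc msg.txt

# What it should have been: -e actually turns on encryption.
gpg --batch --pinentry-mode loopback --passphrase '' \
    -eas -r demo@example.com -o encrypted.asc msg.txt

# Compare packet structure: only the second contains an encryption packet.
gpg --list-packets signed.asc | grep -c 'pubkey enc packet'     # 0
gpg --list-packets encrypted.asc | grep -c 'pubkey enc packet'  # 1
```

The nasty part is that both outputs look like opaque `-----BEGIN PGP MESSAGE-----` blobs, so nothing jumps out unless you inspect the packets.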
I'm not sure why AI is so completely trash at security. In fairness, the average software dev is also worse at security than at writing code, and the answer to many Stack Overflow questions is "add --insecure --no-check --bypass-tls", but I'm still a little shocked at how bad AI is.
I would say most technical people are by now aware that this software (LLMs) makes stuff up. If someone wasn't sure, and to find the answer asked the LLM in a manner analogous to yours and just ran with it, then the problem here is the people.
It was intended. Since my original comment, I have learned that the output of `iptables -L` is incomplete when using `iptables-nft`. Specifically, it hides that the rule
ACCEPT all -- anywhere anywhere
is configured to only match on the interface `tailscale0`.
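Adding this for anyone else who hits it: the interface match was there all along, `iptables -L` just doesn't print it. The session below is illustrative (columns abridged), not the exact output from my machine:

```
$ iptables -L ts-input          # interface match silently omitted
ACCEPT     all  --  anywhere             anywhere

$ iptables -v -L ts-input       # -v adds the in/out interface columns
ACCEPT     all  --  tailscale0 any       anywhere             anywhere

$ iptables -S ts-input          # rule-spec form shows every match
-A ts-input -i tailscale0 -j ACCEPT
```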
The folks at security@tailscale.com promptly set me straight when I reported it, and I greatly appreciate that.
(I know that you understand this, but just highlighting it)
In fairness, the original Bitcoin white paper referenced both (1) distributed compute and (2) the self-defeating nature of a Byzantine attack as the means of protection. It's not as though (2) is just lucky happenstance.
Out of curiosity, why don't/didn't you start a new version of nixpkgs with hardened source? You could forgo the build server, forcing users to build from scratch (at least to start). You could leverage the plentiful, albeit less secure, packaging code in nixpkgs to quickly build out your hardened versions.
Effectively, you'd be building out an audited copy of nixpkgs on the same build engine, but with hardened configs. Write wrappers to validate git signatures when users update, and you've got yourself a chain of trust on the source code distribution for your hardened nixpkgs.
I'm sure you had reasons, I'm just interested to know your thought process.
I ultimately weighed what would be easier: a decade-long political fight to make massive changes to nix, a fork of it written solo to improve auditability and security, or starting over from the top with a design that checks every dream box I wanted from a linux distro.
I had many RFCs that would have followed this rejected one if there was any change tolerance... so my fastest path to prove out my ideas for a distro with decentralized trust was to start one with that explicit goal.
If I wanted to make things maximally auditable and portable to different build engines, a published dead simple spec with multiple competing implementations that most software engineers already know how to write would be ideal. People could review an engine they use, or ensure all existing implementations on any operating system get identical results and are thus trusted that way. If it natively supports a ton of features to make deterministic builds wildly simpler, major bonus.
OCI/Containerfile was a check on all fronts, and some early maintainers and I riffed on design patterns and realized the OCI ecosystem already had specified multi party signing and verification, artifact integrity, smart layer by layer caching etc etc. This fit our dev experience and threat model perfectly and we could just skip implementing the package build and distribution layer and just start writing packages, like that day. None of us needed to learn or invent a new language or ask auditors to do so or fork nix ecosystem to have proper signing support and write a sane spec... that could be years of wheel spinning.
The time saved by choosing an existing widely used and implemented spec meant we were able to put all energy into full source bootstrapping, universal multi party hardware signing on every build, change, review, and reproduction. Just full source bootstrapped linux from scratch in containerfiles with OCI native multi party signing if all parties independently get the same oci hashes from local builds. Oh and we are going LLVM native like Chimera next week. Big sweeping changes like that are easy with our ultralight setup.
I would note that the features we need for deterministic builds in docker, the most popular OCI implementation, only landed a couple of months before we started stagex, and the full source bootstrapping work by the bootstrappable builds team only got a complete bootstrap for the first time a few months before that and Guix shortly after. Tons of reference material.
If stagex had started before 2022 I imagine we might have used a heavily trimmed down nix clone or tried to convince guix to adopt our threat model, which is much further along in supply chain security than nix but scheme would have been a very isolating choice. I think stagex got lucky starting at exactly the right time when two huge pieces of the puzzle were done for us.
> in nixpkgs that would have allowed us to pwn pretty much the entire nix ecosystem and inject malicious code into nixpkg
Isn't that what happens when a build server or source code is compromised? I'm not sure if the existence of this exploit was egregious, but the blast radius seems normal for a build server exploit.
> how the nix project otherwise solves this problem
You can go into `/etc/nix/nix.conf` and remove `trusted-public-keys` so that you don't trust the output of the build servers. Then you just need to audit a particular commit from nixpkgs (and the source code of the packages that you build) and pin your config to that specific commit.
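To make that concrete, a minimal sketch (the commit hash is a placeholder for whichever revision you audited):

```
# /etc/nix/nix.conf -- trust no prebuilt outputs; everything builds locally
substituters =
trusted-public-keys =
```

Then pin your config to the audited revision, e.g. a flake input like `github:NixOS/nixpkgs/<audited-commit>`, so only the nixpkgs tree you reviewed is ever used.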
Otherwise, it seems like the solution is to harden the build system and source code control so that you can freely trust the source code without auditing it yourself. I'm not sure what else can be done.
If your threat model is that the 10+ nixpkg contributors are trustworthy but the github repo is untrustworthy, then git signing would make you safe.
Personally, I worry that a carelessly approved merge in nixpkgs or an upstream supply chain attack is a bigger threat than a github repo exploit (as described here), but I imagine that reasonable minds could disagree.
Regardless, I'm very excited to see that nix builds are almost fully reproducible. That seems great! It seems like this could potentially be the foundation on which a very secure distro is built.
You absolutely should never trust a centralized build server. Any security critical software distribution process should have all packages independently built, verified to have identical hashes, and signed by systems controlled by as many different trusted maintainers or third parties as possible.
Then any user can prove the binary they got was built faithfully from source due to those redundant build system signatures. We designed ReprOS for this purpose.
stagex has also been 100% deterministic, full source bootstrapped, and independently reproduced/signed by multiple maintainers since our first release with a small team of 10ish regular contributors, so it can be done.