people understand they're different, but if bcachefs is out, then that leaves btrfs as the only modern in-tree filesystem, but apparently you can't trust it with important data either.
I've been using btrfs on my NAS for years and have not had any problems. I suspect there are a hell of a lot of people like me you will not hear about because people don't generally get as vocal when things just work.
The Venn diagram of "people who want a modern copy-on-write filesystem with snapshots to manage large quantities of data" and "people who want a massive pool of fault-tolerant storage" (e.g. building a NAS) has some pretty significant overlap.
The latter is where BTRFS is still hobbled: while the RAID-0, RAID-1, & RAID-10 modes work absolutely fine, the RAID-5 & RAID-6 modes are still broken, with an explicit warning at mkfs time (and in the manpages) that the feature is experimental and should not be used to hold data that you care about retaining. This has bitten people, and continues to bite them, with terabytes of data loss (backups are important, people!). That then sours them on every other aspect of ever using BTRFS again.
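For the curious, here's roughly what that looks like at mkfs time; device names are placeholders for a hypothetical multi-disk setup, and the exact warning text varies by btrfs-progs version:

```shell
# Data as raid5 (experimental -- mkfs.btrfs prints a warning about it),
# metadata as raid1, a commonly suggested mitigation for the metadata side:
mkfs.btrfs -d raid5 -m raid1 /dev/sdb /dev/sdc /dev/sdd

# The profiles considered stable need no special acknowledgement:
mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
```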
> If you ignore explicit warnings at mkfs time and then get upset the warning was accurate, you can't really fully blame the file system for it.
Oh, no doubt. I agree.
> Just raid on a lower layer and btrfs on top.
That has its own set of problems. The conventional RAID solution on Linux (MD) also has some pretty terrifying corruption edge cases with RAID-5 and RAID-6 (as I explained in [1]) which will bite you if you're not aware of them and how to work around them.
A robust filesystem purpose-built for the task can only really be found in ZFS.
Won't silent corruption on the raid level be detected by the integrity checks in btrfs? It won't be able to automatically repair it, but it should give errors at least, right?
Yeah, that would be the "error detection at a higher level" (than MD) part. It'd still be on you to pull one drive at a time from the array until the errors go away (then you know which drive holds the corrupted block in that stripe, and can remove the mdadm metadata from it and re-add it to the array so that the kernel forces a clean resync, reconstructing the good block from the parity).

Doing the "repair" action in MD would instead rewrite your good parity to match the now-corrupted data, and you would have no means of recovering. MD can't know whether the data is bad or the parity is bad because it doesn't know what the data is supposed to look like; even if btrfs does have a checksum for it, that's on a higher, disconnected layer.

All filesystems on top of a parity MD suffer from this same vulnerability; some of them won't even be able to tell you when a file has become corrupted (e.g. FAT32), leading to this corruption being persisted into backups.
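A rough sketch of that drive-at-a-time procedure, assuming an array at /dev/md0 with a btrfs filesystem mounted at /mnt (names are placeholders, and this all presumes you have verified backups):

```shell
# Fail and remove one member, then re-run the filesystem's own checks
# with the array degraded; if the errors disappear, this member held
# the corrupted block in the affected stripe:
mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX
btrfs scrub start -B /mnt

# Wipe the member's mdadm metadata so the kernel treats it as a fresh
# disk and forces a full resync, rebuilding its blocks from parity:
mdadm --zero-superblock /dev/sdX
mdadm /dev/md0 --add /dev/sdX
```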
If it were only one data block in one stripe I'd be confident re-adding the same drive (and have done so); this is overwhelmingly likely to be a transient error (e.g. bit rot on the drive or a RAM bit flip while writing; either in the drive itself or the machine's main memory) that won't recur.
The MD "check" action can confirm this (it will iterate every stripe and report all parity/data mismatches, so if it only reports one ...) and some distributions ship a cronjob that automatically does this on a monthly basis.
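The check action is driven through sysfs; this is essentially what those distribution cronjobs do (array name is a placeholder):

```shell
# Iterate every stripe and compare data against parity without
# modifying anything:
echo check > /sys/block/md0/md/sync_action

# Watch progress, then read the count of mismatched sectors;
# zero means every stripe's parity matched its data:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt
```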
If it were a corrupt parity block in a stripe (i.e. a filesystem with strong error detection reports no errors but the MD check action still reports a data/parity mismatch), this is usually more indicative of a lost write during a re-write operation (e.g. the machine was powered off in the middle of updating the contents of a stripe), as the parity is written last -- i.e. the parity would be for the old data in that stripe, not the data as it is now.
The MD "repair" action (if you are ABSOLUTELY CERTAIN that it is the parity that is bad) will automatically correct this problem, which you should do: otherwise, the failure of a disk holding a data block within that stripe would reconstruct incorrect data from the stale parity, which will then start showing up as filesystem errors (if you're fortunate enough to be using a filesystem that reports them).
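The repair action is triggered the same way as the check (again, only after you've satisfied yourself that the parity, not the data, is stale, e.g. the filesystem's own scrub is clean but mismatch_cnt is non-zero):

```shell
# Rewrite parity to match the current data in every mismatched stripe:
echo repair > /sys/block/md0/md/sync_action
```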
Of course all of the usual caveats about checking SMART statistics apply in determining whether a drive is still suitable for continued use. If the same drive kept showing up with the same problems, I'd retire it; if the drive starts reporting an increase in reallocated sector count, I'd retire it; and so on.
A lot of open source volunteers can't really be replaced because there is no one willing to volunteer to maintain that thing. This is complicated by the fact that people mostly get credit for creating new projects and no credit for maintenance. Anyone who could take over bcachefs would probably be better off creating their own new filesystem.
Whether or not you agree with Kent on this, you have to commend that he tends to be very active in discussing issues with the community in a fairly open, calm, and well-thought-out way (at least from what I've seen).
Comparatively, I find subtweeting him from the sanctity of Mastodon, with a few insults and backhanded compliments thrown in for good measure, a bit low.
Ehh. I don't think Kent is an arsehole. The problem with terms like "arsehole" is that they conflate a bunch of different issues, so the label doesn't really have much explanatory power. Someone who is difficult to work with can be that way for loads of different reasons: ego, tunnel vision, stress, neurodivergence (of various kinds), commercial pressures, greed, etc.
There is always a point where you have to say "no, I can't work with this person any more", but while you are still trying, it's worth figuring out why someone is behaving as they do.
> The problem with terms like "arsehole" is that they conflate a bunch of different issues.
Agree, plus I’d add: if we are going to criticise other people’s communication style/abilities or attitude, then using a vague, vulgar, and hurtful slang term like “arsehole”/“asshole” (and similar slang such as “dick”, “prick”, etc.) is an example of exhibiting the very thing one is complaining about, which is fundamentally hypocritical. One can state the same concerns in a more professional way, focusing on the details of the specific behaviour pattern rather than a vague term which can refer to lots of distinct behaviours (e.g. people with ASD traits who hurt the feelings of others because they honestly have trouble thinking about them, versus people with antisocial or narcissistic personality disorder traits who knowingly hurt the feelings of others because they enjoy doing so). That means labelling the behaviour pattern, not the person, and acknowledging that it is entirely possibly due to an unintentional skills gap, (sub)culture clash, differences in life experience, neurodiversity/neurodivergence/mental health/trauma, etc.
I also think it is helpful, when criticising the flaws of others, to try to relate them to one’s own whenever possible - e.g. “sometimes in the past I did X, and from my perspective it looks like you are doing something similar”. Hurtful labels do not encourage that kind of self-reflection at all; they promote the idea that “I’m one of the good ones but you are one of the bad ones”.
People who go on holier-than-thou rants like that are usually extremely unpleasant to work with and will cancel you (as directly admitted in that post) if you contradict them on anything.
The person who downvoted this doesn't seem to know that error handling in VB6 works like that. Most functions don't return an error code you can check, nor are exceptions available.