There are a lot of reasons why just copying the files you need to another FS is not sufficient as a backup; clearly this is one of them. We need more checks to ensure integrity and robustness.

BorgBackup is clearly quite a good option.



> BorgBackup is clearly quite a good option.

Once one enables checksums with rsync, doesn't Borg have the same issue? I believe Borg would now need to do the same rolling checksum over all the data as well?

ZFS sounds like the better option -- just take the last local snapshot transaction, then compare to the transaction of the last sent snapshot, and send everything in between.
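
A minimal sketch of that flow (pool, dataset, and host names here are hypothetical):

    # take a new local snapshot
    zfs snapshot tank/home@2024-06-02
    # send only the blocks that changed since the previously sent snapshot
    zfs send -i tank/home@2024-06-01 tank/home@2024-06-02 | \
        ssh backuphost zfs receive backup/home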

And the problem re: Borg and rsync isn't just the cost of reading back and checksumming the data -- for 100,000s of small files (1000s of home directories on spinning rust), it's the sheer number of metadata ops too.
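
Concretely, forcing full content checks looks something like this (a hedged sketch: paths are hypothetical, and Borg's --files-cache=disabled mode is my reading of the borg create docs):

    # rsync: compare by checksum instead of mtime/size, so every file is re-read
    rsync -a --checksum /home/ backuphost:/srv/backup/home/
    # borg: disable the files cache so every file is re-read and re-chunked
    borg create --files-cache=disabled /srv/borg-repo::home-{now} /home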


As with rsync, Borg does not read files whose timestamp/length have not changed since the last backup. And for a million files on a modern SSD, it takes just a few seconds to read their metadata.


> As with rsync, Borg does not read files whose timestamp/length have not changed since the last backup.

...but isn't that exactly the problem described in the article? If that is the case, Borg would seem to be the worst of all possible worlds, because now one can't count on its checksums?


If one worries about bitrot, backup tools are not a good place to detect that. Using a filesystem with native checksums is the way to go.
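
For example, on ZFS a periodic scrub reads and verifies every block's checksum (pool name hypothetical):

    zpool scrub tank        # verify all checksums, repairing from redundancy where possible
    zpool status -v tank    # report any checksum errors and affected files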

If one worries about silent file modifications that alter content but keep the timestamp and length, then that sounds like malware and, as such, backup tools are not the right tool to deal with it.


> If one worries about bitrot, backup tools are not a good place to detect that. Using a filesystem with native checksums is the way to go.

Agreed. But I think that elides the point of the article, which was "I worry about backing up all my data with my userspace tool."

As noted above, Borg and rsync seem to fail here, because it's wild how much the metadata can screw with you.

> If one worries about silent file modifications that alter content but keep the timestamp and length, then that sounds like malware and, as such, backup tools are not the right tool to deal with it.

I've seen this happen all the time in non-malware situations, in what we might call broken-software situations, where your packaging software or your update app tinkers with mtimes.
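
The failure mode is easy to reproduce by hand. A hypothetical demo (file name made up; GNU coreutils assumed):

    old_mtime=$(stat -c %y ./testfile)                           # remember the current mtime
    printf 'Z' | dd of=./testfile bs=1 conv=notrunc 2>/dev/null  # overwrite the first byte in place; size unchanged
    touch -d "$old_mtime" ./testfile                             # put the old mtime back
    # a plain mtime/size comparison (e.g. default rsync -a) now sees "unchanged" and skips the new content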

I develop an app, httm, which prints the size, date, and corresponding locations of the available unique versions of files residing on snapshots. And this makes it quite effective at showing how often this can happen on Ubuntu/Debian:

    > httm -n --dedup-by=contents /usr/bin/ounce | wc -l
    3
    > httm -n --dedup-by=metadata /usr/bin/ounce | wc -l
    30


The latter type of case is what the article is talking about, though. At the same time, as the article also discusses, it's unlikely to have actually been caused by malware rather than something like a poorly packaged update.

Backup tools should deal with file changes that lack corresponding metadata changes, even though it's more convenient to say the system should just always work ideally. At the end of the day, the goal of a backup tool is to back up the data, not to skip some of it because that's faster.
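
One way to do that without trusting metadata at all is to diff a content manifest between runs (paths hypothetical):

    find /home -type f -print0 | xargs -0 sha256sum | sort -k 2 > manifest.new
    diff manifest.old manifest.new    # any changed hash is changed data, whatever the mtime says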


Amen!



