There is an interesting related problem - how do you efficiently test if a buffer contains only zeroes? We use this for automatically sparsifying disk images. There's no standard C function for this. My colleague came up with the following nice trick. It reuses the (presumably already maximally optimized) memcmp function from libc:
Assuming that vector instructions are available, shouldn't it be much faster to actually compare the buffer contents against a vector register initialized to all-zeros rather than comparing against some other memory? Or would memcmp automatically optimize that away because of the precondition that the first 16 bytes are already known to be 0?
It is probably faster, yes (half the number of reads) – but the point of this truck is that you can re-use the (hopefully) vectorized memcmp on every platform with portable code rather than getting on the SIMD ISA treadmill yourself.
Is the choice of 16 as the "limit" value based on benchmarking? As opposed to just doing something like "!buffer[0] && !memcmp(buffer, buffer + 1, size - 1)" which uses the same principle.
I wonder if it would be meaningfully faster if you checked the first 16 bytes as uint64_t or uint128_t instead of byte by byte. It would save you 14 or 15 comparisons per function call.
Yeah, dealing with the smaller buffers is annoying. You could put buffers < 16 bytes in a separate codepath, but that's trading elegance for pretty minor gains.
I have the same issue. I want to use SIMD to do ANDs between large buffers and also be able to detect if a buffer is empty (all zeroes) after an AND. It doesn't seem possible to do this without iterating over the entire buffer again because vpand() doesn't affect the eflags register.
use the popcnt instruction or the popcnt intrinsic function.
It counts how many bits are set to 1, you’re looking for 0.
You can also cast it into a uint64_t integer and do an equality test. There might be a way to use fused multiply add.
Also are you mmaping the file so you can just read it directly as a single buffer? You should be able to madvise to free pages after they’ve been checked.
It would be interesting if there was a way to measure the voltage difference between two memory addresses and if it was equal, the bits would be all one or zero and then you just need to read one byte to see which it is. I don't know how practical that is, but it would be a constant time check.
> There's no standard C function for this. My colleague came up with the following nice trick.
One of the big things about C is that there is no standard library function for anything remotely nontrivial. So successfully coding in C relies on "tricks", and snippets and lore that have been passed on over the years.
Rust, meanwhile, has a check_for_all_zeroes crate or something.
https://gitlab.com/nbdkit/nbdkit/-/blob/b31859402d1404ba0433...
Example usage for sparsifying while copying disk images: https://gitlab.com/nbdkit/libnbd/-/blob/46fa6ecc7422e830f10d...