The main reason why BLAKE3 is much faster than most other hash algorithms is that it can be computed in parallel on many cores.
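To make the "many cores" point concrete: BLAKE3 splits its input into chunks and combines the chunk hashes in a binary tree, so the chunks can be hashed independently. Below is a minimal sketch of that idea only. It is not BLAKE3: it uses BLAKE2b from Python's standard library as the node function, and the chunk size, the prefix bytes, and the way an odd node is carried up are all assumptions made for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative chunk size (an assumption); BLAKE3 itself uses 1 KiB chunks.
CHUNK_SIZE = 1 << 20


def leaf_hash(chunk: bytes) -> bytes:
    # A one-byte prefix separates leaf inputs from parent inputs.
    return hashlib.blake2b(b"\x00" + chunk).digest()


def parent_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.blake2b(b"\x01" + left + right).digest()


def tree_hash(data: bytes, workers: int = 1) -> bytes:
    chunks = [data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    # Leaves do not depend on each other, so any number of threads can hash them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        level = list(pool.map(leaf_hash, chunks))
    # Combine pairwise until one root remains; an odd node is carried up unchanged.
    while len(level) > 1:
        nxt = [parent_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

The result is the same for any value of `workers`, which is the property that lets a tree hash scale across cores. A conventional sequential hash has to process its blocks in order and cannot be split this way.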
Inside the kernel, and especially in something that runs continuously like the RNG, you do not want to keep all the cores busy for no good reason, so the hash computation should be restricted to a single thread.
In that case, BLAKE3 does not have any important advantages.
When restricted to a single thread, BLAKE3 is slower than hash algorithms that have hardware support on the CPU, e.g. SHA-256 on most recent CPUs. (But the Linux kernel must support many older CPUs without hardware support for hash algorithms, so BLAKE2 is a better choice.)
BLAKE3 is faster because it allows for very efficient SIMD implementations. That alone blows all competition out of the water, even with a single thread. The ability to further parallelize it on multiple cores is just a cherry on top.
True, but the use of SIMD is also restricted in the kernel, so that would not matter for the Linux kernel.
In comparison with the hardware instructions for SHA-2 or SHA-3, on those CPUs which have them, BLAKE3 is not faster, because those instructions also use the same SIMD registers, processing the same number of bits per operation.
I use BLAKE3 every day for file checksumming. For this application, on my Zen 3 with hardware SHA-2, BLAKE3 greatly outperforms everything else, but only due to multithreading.
On older CPUs, without SIMD SHA instructions, you are right that BLAKE3 can be faster than other algorithms even on a single thread, by exploiting the parallelism of the BLAKE3 algorithm with a SIMD implementation.
For other hashes, SIMD may not accelerate the computation of a single hash, but when you need to compute multiple hashes you can interleave their computations and obtain similar speedups with SIMD instructions.
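The interleaving idea can be sketched with a toy example. The code below advances several independent hashes in lock-step, one byte per message per step, which is the shape a multi-buffer SIMD implementation takes (one vector lane per message). It uses 64-bit FNV-1a purely because it is short; FNV-1a is not a cryptographic hash, plain Python gets no actual SIMD speedup, and the function names and the equal-length restriction are assumptions of this sketch.

```python
from typing import List, Sequence

FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime
MASK64 = (1 << 64) - 1


def fnv1a(msg: bytes) -> int:
    # Ordinary sequential hash of one message.
    h = FNV_OFFSET
    for b in msg:
        h = ((h ^ b) * FNV_PRIME) & MASK64
    return h


def fnv1a_interleaved(msgs: Sequence[bytes]) -> List[int]:
    # Hash several messages at once, one "lane" of state per message.
    if len({len(m) for m in msgs}) > 1:
        raise ValueError("this sketch needs equal-length messages")
    lanes = [FNV_OFFSET] * len(msgs)
    # Each iteration applies the same operation to every lane, which is
    # what a single SIMD instruction would do across vector elements.
    for column in zip(*msgs):
        lanes = [((h ^ b) * FNV_PRIME) & MASK64
                 for h, b in zip(lanes, column)]
    return lanes
```

Each lane ends with the same value as hashing that message on its own, so interleaving changes only the scheduling of the work and never the results.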
Isn't explicit hardware support for SHA-3 rather limited? In particular, there's none on Intel and only A13 and A14 on Apple. It can still be vectorized to a degree on other CPUs, but in that case it'll be slower than BLAKE3.
For now, only a few extremely recent ARM cores have SHA-3 instructions.
On the other hand, support for SHA-256 and for SHA-1 (still useful for non-secure applications) is widespread: in almost all 64-bit ARM CPUs, in all AMD Zen CPUs and in some Intel CPUs, e.g. Apollo Lake, Gemini Lake, Jasper Lake/Elkhart Lake, Alder Lake, Tiger Lake and Ice Lake.
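To check whether a particular Linux machine reports these instructions, you can look at the feature flags in /proc/cpuinfo. The sketch below assumes the usual flag names: `sha_ni` on x86, and `sha1`, `sha2`, `sha3` and `sha512` on 64-bit ARM. The helper name `sha_flags` is made up for this example.

```python
def sha_flags(cpuinfo_text: str) -> set:
    # Return the SHA-related CPU feature flags found in /proc/cpuinfo text.
    wanted = {"sha_ni", "sha1", "sha2", "sha3", "sha512"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 lists features under "flags", arm64 under "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            found |= wanted & set(value.split())
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sorted(sha_flags(f.read())) or "no SHA flags reported")
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

An empty result only means the kernel did not report a flag. It says nothing about how fast a software implementation would be on that CPU.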
BLAKE3 was published in 2020. I don't think anyone would seriously use a hash function, cipher, or RNG that has been studied for only two years, let alone in the Linux kernel.
I'm glad we're on BLAKE2 and seeing the benefits, but why not go straight to BLAKE3?
What are the differences and why do they matter in this instance?