The article mentions that the bit depth can be 16.
You may need more bits for HDR, plus some extra bits of headroom for precision. For example, screen pixels are stored with a non-linear (gamma-encoded) intensity curve, but image processing is best done in linear light.
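For concreteness, here is a minimal sketch of the standard sRGB transfer function (a piecewise power curve, commonly just called "gamma"); the function names are mine, not from any particular library:

```c
#include <math.h>
#include <stdio.h>

/* Standard sRGB transfer function for one channel value in [0, 1]:
 * a short linear toe plus a power segment with exponent 2.4. */
static double srgb_to_linear(double s) {
    return (s <= 0.04045) ? s / 12.92 : pow((s + 0.055) / 1.055, 2.4);
}

static double linear_to_srgb(double l) {
    return (l <= 0.0031308) ? l * 12.92 : 1.055 * pow(l, 1.0 / 2.4) - 0.055;
}

int main(void) {
    /* sRGB "mid grey" 0.5 is only about 21% linear light, which is why
     * averaging or blurring directly in sRGB space looks wrong. */
    double lin = srgb_to_linear(0.5);
    printf("sRGB 0.5 -> linear %.4f -> back to sRGB %.4f\n",
           lin, linear_to_srgb(lin));
    return 0;
}
```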
However, I wonder whether floating point is necessary, or even the best choice compared to 32-bit fixed point.
The floating-point format includes subnormal numbers very close to zero, and I'd think that gives far more precision near zero than is needed.
Processing subnormal numbers is especially slow on some processors, and it can't always be turned off.
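As a sketch of both points, assuming an x86 target with SSE3 intrinsics (the flush-to-zero controls are architecture-specific, not portable C, and some targets have no such switch at all):

```c
#include <stdio.h>
#include <float.h>
#include <pmmintrin.h>  /* x86 SSE3: flush-to-zero / denormals-are-zero controls */

int main(void) {
    /* Subnormals extend the float range below FLT_MIN (~1.18e-38),
     * trading away precision; hardware often handles them slowly. */
    float smallest_normal = FLT_MIN;
    float subnormal = FLT_MIN / 4.0f;   /* ~2.9e-39: a subnormal value */
    printf("normal: %e  subnormal: %e\n", smallest_normal, subnormal);

    /* Ask the FPU to flush subnormal results to zero and treat subnormal
     * inputs as zero. This is an x86/SSE-specific control. */
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    volatile float x = FLT_MIN;         /* volatile blocks constant folding */
    /* With FTZ enabled and SSE math, this division typically prints 0. */
    printf("after FTZ/DAZ: FLT_MIN / 4 = %e\n", x / 4.0f);
    return 0;
}
```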
Did you miss the point of the article? JPEG-XL encoding doesn't rely on quantisation to achieve its performance goals. It's a bit like how GPU shaders use floating-point arithmetic internally but output quantised values at the bit depth of the screen.
Which is completely wrong, by the way: JPEG-XL quantizes its coefficients after the DCT transform like every other lossy codec. Most codecs also have some amount of range expansion in their DCT, so the values being quantized may have a greater bit depth than the input data.
The cliff-notes version is that JPEG and JPEG XL don't encode pixel values; they encode the discrete cosine transform (like a Fourier transform) of the 2-D pixel grid. So what's really stored is more like the frequency and amplitude of change across pixels than individual pixel values, and the compression comes from the insight that some combinations of frequency and amplitude of color change are much more perceptible than others.
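To make that concrete, here is a toy 1-D sketch of the transform-then-quantize step. The 8-point DCT-II is real, but the quantization step sizes are made up for illustration and are not the tables any actual codec uses:

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 8

/* Toy 1-D DCT-II: what gets quantized (rounded) are these frequency
 * coefficients, not the pixel values themselves. */
static void dct_1d(const double in[N], double out[N]) {
    for (int k = 0; k < N; k++) {
        double sum = 0.0;
        for (int n = 0; n < N; n++)
            sum += in[n] * cos(M_PI / N * (n + 0.5) * k);
        out[k] = ((k == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N)) * sum;
    }
}

int main(void) {
    double pixels[N] = {52, 55, 61, 66, 70, 61, 64, 73};
    double coeffs[N];
    /* Made-up step sizes, coarser at high frequencies: detail the eye
     * barely notices is rounded away harder. Real codecs use tuned tables. */
    const double step[N] = {4, 4, 6, 8, 12, 16, 24, 32};

    dct_1d(pixels, coeffs);
    for (int k = 0; k < N; k++)
        printf("coeff %d: %8.2f / step %2.0f -> %ld\n",
               k, coeffs[k], step[k], lround(coeffs[k] / step[k]));
    return 0;
}
```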
In addition to the other comments: the internal in-memory representation of the data can be Float32, but on disk it is encoded through some form of entropy coding. Typically, some of the earlier steps are preparation for the entropy coder: you make the data more amenable to entropy coding through a rearrangement that is either fully reversible (lossless) or near-reversible (lossy).
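A toy illustration of that "prepare, then entropy-code" idea: quantized coefficients tend to contain long runs of zeros, so even a simple (run, value) rewrite makes the stream much easier for an entropy coder to compress. Real codecs use far more elaborate context modelling; this is only the shape of the idea, with made-up data.

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical quantized coefficients: mostly zeros after quantization. */
    int coeffs[] = {13, -2, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0};
    int n = sizeof coeffs / sizeof coeffs[0];

    /* Rewrite as (zero-run-length, nonzero value) pairs. */
    int zero_run = 0;
    for (int i = 0; i < n; i++) {
        if (coeffs[i] == 0) {
            zero_run++;
        } else {
            printf("(run=%d, value=%d) ", zero_run, coeffs[i]);
            zero_run = 0;
        }
    }
    if (zero_run > 0)
        printf("(end-of-block after %d trailing zeros)", zero_run);
    printf("\n");
    return 0;
}
```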
So a bit depth of 32 (2^32 levels)? 4 bytes seems like overkill.