Yeah, they mention a “perplexity drop” relative to naive quantization, but on its own that number doesn’t tell me much.
> We reduce the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.
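If I’m reading that right, “reducing the perplexity drop by 54%” means the gap between the fp16 perplexity and the Q4_0 perplexity shrinks by about half. A rough sketch of that arithmetic (the perplexity values below are made up for illustration; the real ones would come from llama.cpp’s perplexity tool run on the fp16 and Q4_0 GGUFs):

```python
# Hypothetical perplexities, just to illustrate how the 54% figure is computed.
ppl_fp16 = 6.00        # fp16 baseline
ppl_q4_naive = 6.50    # naive Q4_0 quantization
ppl_q4_new = 6.23      # their quantization-aware Q4_0

drop_naive = ppl_q4_naive - ppl_fp16   # 0.50
drop_new = ppl_q4_new - ppl_fp16       # 0.23

reduction = 1.0 - drop_new / drop_naive
print(f"perplexity drop reduced by {reduction:.0%}")  # ~54%
```

So it’s a relative improvement over their own naive-quantized baseline, not an absolute quality number, which is why it’s hard to interpret without real benchmarks.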
Wish they showed benchmarks / added quantized versions to the arena! :>