
> We reduce the perplexity drop by 54% (using llama.cpp perplexity evaluation) when quantizing down to Q4_0.

Yeah, they mention a "perplexity drop" relative to naive quantization, but on its own that's meaningless to me.

Wish they showed benchmarks / added quantized versions to the arena! :>
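
For what it's worth, here's a minimal sketch of what "reduce the perplexity drop by 54%" would mean arithmetically. The FP16 and Q4_0 perplexity values below are made up for illustration; they are not from the announcement.

    # Hypothetical numbers, only to illustrate the metric.
    fp16_ppl = 6.00       # baseline perplexity of the unquantized model
    naive_q4_ppl = 6.50   # perplexity after naive Q4_0 quantization

    naive_drop = naive_q4_ppl - fp16_ppl        # 0.50 gap vs. baseline
    improved_drop = naive_drop * (1 - 0.54)     # gap shrunk by 54% -> 0.23
    improved_q4_ppl = fp16_ppl + improved_drop  # ~6.23

    print(f"naive Q4_0: {naive_q4_ppl:.2f}, improved Q4_0: {improved_q4_ppl:.2f}")

So the claim only tells you the gap to the unquantized model shrank, not what the absolute perplexities (or downstream benchmark scores) actually are.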


