Hacker News

Like 4. Definitely single digit. The P40s are slow af


The P40 has a memory bandwidth of 346 GB/s, which means it should be able to do around 14+ t/s running a 24 GB model+context.
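That estimate comes from the standard bandwidth-bound ceiling: each generated token has to stream the full set of weights (plus KV cache) from VRAM, so tokens/s ≈ bandwidth / bytes read per token. A minimal sketch of that arithmetic, using the figures from the comment:

```python
# Rough upper bound on single-stream decode speed for a memory-bandwidth-bound
# LLM: every generated token streams the whole model (+ context) from VRAM.
def max_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Bandwidth-bound ceiling: tokens/s = bandwidth / bytes read per token."""
    return bandwidth_gb_s / model_gb

# P40 figures from the comment: 346 GB/s bandwidth, 24 GB of weights+context.
print(round(max_tokens_per_sec(346, 24), 1))  # -> 14.4
```

This is an upper bound; real throughput is lower once compute, kernel launch overhead, and sampling are included.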


Not sure why I got downvoted - literally the first result (for me) says the best result[0] is 11 t/s at Q3. Everything else is single digits, like 2-8 t/s. Also, considering that it's not supported anymore[1] (its compute capability is 6.1, no longer supported by CUDA) and its power draw, I'd highly recommend anyone interested in ML stay far away from it - even if it's all you can afford.

While the memory bandwidth is decent, you do actually need to do matmuls and other compute operations for LLMs, which, again, it's pretty slow at.
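One way to see why compute still matters: per forward pass, time is roughly max(compute time, memory time). At batch 1 (decoding) memory dominates, but prompt processing pushes many tokens through each weight read, so a weak matmul engine becomes the bottleneck. A sketch under assumed numbers (~12 TFLOPS FP32 is the commonly cited P40 figure, and a hypothetical 13B model in 24 GB):

```python
# Roofline-style estimate: per forward pass, time ~ max(compute, memory).
# Compute scales with tokens in the batch; weights are streamed once per pass.
# All figures are illustrative assumptions, not benchmarks.
def time_per_step_s(params_b: float, model_gb: float,
                    flops_t: float, bandwidth_gb_s: float,
                    batch_tokens: int) -> float:
    compute = (2 * params_b * 1e9 * batch_tokens) / (flops_t * 1e12)  # ~2 FLOPs/weight/token
    memory = model_gb / bandwidth_gb_s                                # stream weights once
    return max(compute, memory)

# Assumed: 13B params, 24 GB resident, 12 TFLOPS, 346 GB/s.
for n in (1, 32, 512):  # tokens processed per forward pass
    t = time_per_step_s(13, 24, 12, 346, n)
    print(f"{n} tokens/pass -> {n / t:.0f} t/s ceiling")
```

With these assumptions the crossover sits around a few dozen tokens per pass: below it you're bandwidth-bound, above it the slow compute caps throughput, which is why long prompts feel especially sluggish on this card.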

[0]: https://old.reddit.com/r/LocalLLaMA/comments/1dcdit2/p40_ben...

[1]: https://developer.nvidia.com/cuda-gpus




