Apple Silicon Macs have special matrix multiplication units (AMX) that do matrix multiplication fast and with low energy use [1]. These AMX units can often beat matrix multiplication on AMD/Intel CPUs (especially those without very many cores). Since a lot of linear algebra code boils down to matrix multiplication, and using the AMX units is just a matter of linking against Accelerate (for its BLAS interface), a lot of software that uses BLAS is faster on Apple Silicon Macs.
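To make the "just link against a BLAS" point concrete, here is a minimal Python sketch. NumPy delegates matrix products to whatever BLAS library it was built against, so on an Apple Silicon Mac a NumPy build linked against Accelerate routes the same `@` call through the AMX-backed BLAS with no code changes (the build configuration on your machine is the only assumption here):

```python
# Sketch: NumPy hands matrix products to the BLAS it was built against.
# On Apple Silicon, a NumPy linked against Accelerate runs this on the
# AMX units; the Python code is identical either way.
import numpy as np

def gemm(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Matrix product; dispatched to the linked BLAS (sgemm/dgemm)."""
    return a @ b

# Inspect which BLAS this NumPy build uses (look for Accelerate on macOS):
np.show_config()

a = np.random.rand(1024, 1024).astype(np.float32)
b = np.random.rand(1024, 1024).astype(np.float32)
c = gemm(a, b)
print(c.shape)  # (1024, 1024)
```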

That said, the GPUs in your M1 Mac are faster than the AMX units, and any reasonably modern NVIDIA GPU will wipe the floor with the AMX units or Apple Silicon GPUs in raw compute. However, a lot of software does not use CUDA by default, and for small problem sizes AMX units, or even CPUs with just AVX, can be faster because they don't incur the cost of transferring data from main memory to GPU memory and back.
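The transfer-cost tradeoff is easy to see with a back-of-envelope model: GEMM does O(n^3) work on O(n^2) data, so transfers dominate for small n. All hardware numbers in this sketch are illustrative assumptions, not measurements of any particular machine:

```python
# Back-of-envelope: when does a GPU matmul pay for its host<->device copies?
# The bandwidth and throughput figures below are assumed for illustration.

def gpu_worth_it(n: int,
                 link_gb_s: float = 25.0,    # assumed host<->device bandwidth
                 gpu_tflops: float = 20.0,   # assumed GPU FP32 throughput
                 cpu_tflops: float = 1.5) -> bool:  # assumed CPU/AMX throughput
    """For an n x n FP32 GEMM, compare CPU-only time against GPU
    compute time plus copying A and B over and C back."""
    flops = 2.0 * n ** 3                  # multiply-adds in an n x n GEMM
    bytes_moved = 3 * n * n * 4           # A, B over; C back (float32)
    t_cpu = flops / (cpu_tflops * 1e12)
    t_gpu = flops / (gpu_tflops * 1e12) + bytes_moved / (link_gb_s * 1e9)
    return t_gpu < t_cpu

print(gpu_worth_it(256))    # False: transfers dominate the small GEMM
print(gpu_worth_it(8192))   # True: compute dominates, GPU wins
```

With these (assumed) numbers the crossover sits somewhere in the hundreds; the exact break-even point depends entirely on the real bandwidth and throughput of the hardware in question.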

[1] Benchmarks:

https://github.com/danieldk/gemm-benchmark#example-results

https://explosion.ai/blog/metal-performance-shaders (scroll down a bit for AMX and MPS numbers)



> That said, the GPUs in your M1 Mac are faster than the AMX units

Not for double, which is what R mostly uses IIRC.


Ah, thanks for the correction! I never use R, so I assumed that it uses/supports single-precision floating point.



