
> You could see that by how little RAM ended up being used in llama.cpp when they moved to mmaping the model.

From what I've read, that was just an error in how memory consumption was measured after switching to the mmap version; it wasn't actually more memory efficient in the end.



Not exactly. It's that the model loads less of the mmap'ed weights than you would expect.

The author of the mmap patch chimes in here:

https://news.ycombinator.com/item?id=35393615
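The underlying mechanism is standard mmap behavior: mapping a file reserves address space but reads no data until pages are actually faulted in, so resident memory (RSS) only grows for the pages that get touched. Here's a minimal sketch (not llama.cpp's actual code, just a stand-in file to illustrate lazy page loading):

```python
import mmap
import os
import tempfile

# Create a 64 MiB "model file" on disk as a stand-in for weights.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"\x00" * (64 * 1024 * 1024))

# mmap the file: this maps address space but reads no pages yet.
# RSS only grows as pages are actually touched, which is why memory
# reporting tools can show far less than the file's total size.
with open(path, "rb") as f:
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching one byte faults in a single page (~4 KiB), not 64 MiB.
    first_byte = weights[0]
    weights.close()

os.remove(path)
print(first_byte)
```

So whether mmap "saves" memory depends on how much of the weights inference actually touches, and on whether the measurement counts mapped-but-untouched pages.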



