
> You could see that by how little RAM ended up being used in llama.cpp when they moved to mmaping the model.

From what I've read, that was just an error in how memory consumption was measured after switching to the mmap version; it wasn't actually more memory efficient in the end.



Not exactly. It's that the model loads less of the mmap'ed weights than you would expect.

The author of the mmap patch chimes in here:

https://news.ycombinator.com/item?id=35393615
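The underlying mechanism is standard mmap behavior: mapping a file reserves address space but reads no data until pages are actually faulted in, so resident memory (RSS) only grows for the pages that get touched. Here's a minimal sketch (not llama.cpp's actual code, just a stand-in file to illustrate lazy page loading):

```python
import mmap
import os
import tempfile

# Create a 64 MiB "model file" on disk as a stand-in for weights.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"\x00" * (64 * 1024 * 1024))

# mmap the file: this maps address space but reads no pages yet.
# RSS only grows as pages are actually touched, which is why memory
# reporting tools can show far less than the file's total size.
with open(path, "rb") as f:
    weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Touching one byte faults in a single page (~4 KiB), not 64 MiB.
    first_byte = weights[0]
    weights.close()

os.remove(path)
print(first_byte)
```

So whether mmap "saves" memory depends on how much of the weights inference actually touches, and on whether the measurement counts mapped-but-untouched pages.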



