> You could see that by how little RAM ended up being used in llama.cpp when they moved to mmaping the model.
From what I've read, that was just an error in how memory consumption was measured after the switch to mmap; the mmap version wasn't actually more memory-efficient in the end.
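The measurement confusion is easy to reproduce: file-backed pages mapped with mmap are charged to the page cache and only show up in a process's RSS once they are actually touched, so a freshly mmapped model looks nearly "free" to naive RSS readings. A minimal Linux-specific sketch (the 64 MiB dummy file and the `rss_kib` helper are illustrative, not from llama.cpp):

```python
import mmap
import os
import tempfile

def rss_kib():
    # Linux-specific: read this process's resident set size (in KiB)
    # from /proc/self/status.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

# Create a 64 MiB zero-filled file standing in for a model file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (64 * 1024 * 1024))
    path = f.name

with open(path, "rb") as f:
    before = rss_kib()
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    mapped = rss_kib()  # mapping alone faults in almost no pages
    # Touch one byte per 4 KiB page to force every page into memory.
    total = sum(mm[i] for i in range(0, len(mm), 4096))
    after = rss_kib()   # now the pages count toward RSS

print(f"before={before} KiB, mapped={mapped} KiB, touched={after} KiB")
mm.close()
os.unlink(path)
```

Running this shows RSS barely moves at `mmap` time but jumps by roughly the file size once the pages are read, which matches the "lower RAM use" being an artifact of when and how the measurement was taken.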