Llama.cpp is great, but I have moved to mostly using Ollama because it is both good on the command line and `ollama serve` runs a very convenient REST server.
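As a quick illustration of why that REST server is convenient, here is a minimal Python sketch that talks to a local `ollama serve` instance. It assumes the default port 11434 and the `/api/generate` endpoint; the model name `llama2` is just an example of a model you might have pulled.

```python
import json
from urllib import request

def build_generate_request(model, prompt):
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt, host="http://localhost:11434"):
    """POST a prompt to a local `ollama serve` instance and return the text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires `ollama serve` running locally with the model already pulled.
    print(generate("llama2", "Why is the sky blue?"))
```

With `stream` set to `False` the server returns one JSON object whose `response` field holds the full completion, which keeps scripting against it very simple.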
In any case, I had fun with MLX today, and I hope it implements 4-bit quantization soon.