Yes, CPU inference. For llama.cpp on Apple M1/M2, GPU inference (via Metal) is about 5x faster than CPU for text generation and about the same speed for prompt processing. Not insignificant, but not giant either.
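If you want to reproduce the comparison yourself, here's a minimal sketch using the llama-cpp-python bindings (my choice, not something from the thread); the model path is hypothetical. It runs the same model once CPU-only and once fully offloaded to Metal and prints rough generation times:

```python
import time
from llama_cpp import Llama

MODEL = "models/llama-2-7b.Q4_K_M.gguf"  # hypothetical path, swap in your own GGUF

# n_gpu_layers=0 keeps all layers on the CPU;
# n_gpu_layers=-1 offloads every layer to the Metal GPU on Apple Silicon.
for label, n_gpu_layers in [("cpu", 0), ("metal", -1)]:
    llm = Llama(model_path=MODEL, n_gpu_layers=n_gpu_layers, verbose=False)
    start = time.perf_counter()
    out = llm("Write one sentence about llamas.", max_tokens=64)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s -> {out['choices'][0]['text'].strip()}")
```

llama.cpp's own CLI exposes the same knob, so the equivalent experiment there is just running with GPU layer offload disabled vs. enabled and comparing the reported tokens/sec.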
You generally can't hook up large storage drives over NVMe; that's all comparatively small flash storage. I'm not sure why you brought it up.