The thread suggests it doesn't even quantize the model (running it in FP16, so tons of RAM usage), and that it's slower than the llama.cpp Metal backend anyway?
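(Back-of-envelope, assuming a 7B-parameter model: FP16 is 2 bytes per weight, so roughly 14 GB, versus around 4 GB at 4-bit quantization.)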
And MLC-LLM was faster than llama.cpp, last I checked. It's hard to keep up with developments.
I think llama.cpp is the sweet spot right now, due to its grammar capability and many other features (e.g., multimodal). MLC-LLM is nice but they don't offer uncensored models.
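For example, a grammar file like this (a minimal GBNF sketch; llama.cpp's main can load it with --grammar-file) constrains the model to a plain yes/no answer:

```
root   ::= answer
answer ::= ("yes" | "no")
```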
- A: You can convert models to MLC yourself, just like GGUF models, with relative ease.
- B: Yeah, llama.cpp has a killer feature set. And killer integration with other frameworks. MLC is way behind, but is getting more fleshed out every time I take a peek at it.
- C: This is a pet peeve of mine, but I've never run into a local model that was really censored. For some, if you give them a GPT-4-style prompt... of course you get a GPT-4-style response. But you can just give them an unspeakable system prompt or completion, and they will go right ahead and complete it. I don't really get why people fixate on the "default personality" of models trained on GPT-4 data.
Llama.cpp is great, but I have moved to mostly using Ollama because it is both good on the command line and 'ollama serve' runs a very convenient REST server.
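Here's a minimal sketch of calling that REST server from Python (assuming the default port 11434, the /api/generate endpoint, and a model such as llama2 that has already been pulled):

```python
import json
import urllib.request

# Ask a local model a question via Ollama's REST API.
# Assumes `ollama serve` is running on the default port 11434
# and that the "llama2" model has already been pulled.
payload = {
    "model": "llama2",
    "prompt": "Explain GBNF grammars in one sentence.",
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```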
In any case, I had fun with MLX today, and I hope it implements 4-bit quantization soon.