
Interesting that there's a 34B model; that size was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks, or if the code fine-tuning destroyed that. It should be the best model that still fits on 24GB gaming GPUs with quantization, because 70B doesn't fit.
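Back-of-the-envelope, the weights alone tell the story (a rough sketch only; real usage also needs headroom for the KV cache, activations, and runtime overhead):

    # Rough VRAM needed just for quantized weights (illustrative only;
    # ignores KV cache, activations, and runtime overhead).
    def weight_vram_gib(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

    for params in (13, 34, 70):
        print(f"{params}B @ 4-bit: {weight_vram_gib(params, 4):.1f} GiB")

    # 13B @ 4-bit:  6.1 GiB
    # 34B @ 4-bit: 15.8 GiB -> fits a 24GB card with headroom for context
    # 70B @ 4-bit: 32.6 GiB -> too big for a single 24GB card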


Someone "grafted" llama 33B onto llama v2 13B to make "llama 22B"

https://huggingface.co/chargoddard/llama2-22b

Theoretically this is an even better size, as it would fit on a 20GB-24GB GPU with more relaxed quantization and much longer context.

Metrics are slightly below plain 13B, but the theory is that the higher parameter count makes it more amenable to finetuning. If you search for 22B on Hugging Face, you can see that frankenllama experiments are ongoing:

https://huggingface.co/models?sort=modified&search=22b
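For anyone curious what "grafting" means mechanically: these frankenmerges splice decoder layers from one checkpoint into another's state dict. A minimal sketch of the idea with Hugging Face transformers, assuming two same-shape models (which the actual llama2-22b recipe did not have; it also had to reconcile the different hidden sizes of 33B and 13B). The model names and layer range below are placeholders:

    # Sketch of layer splicing between two SAME-architecture Llama
    # checkpoints; "model-a"/"model-b" and the layer range are
    # placeholders, not the actual llama2-22b recipe.
    import torch
    from transformers import AutoModelForCausalLM

    base  = AutoModelForCausalLM.from_pretrained("model-a", torch_dtype=torch.float16)
    donor = AutoModelForCausalLM.from_pretrained("model-b", torch_dtype=torch.float16)

    with torch.no_grad():
        for i in range(10, 20):  # overwrite a contiguous slice of decoder layers
            base.model.layers[i].load_state_dict(donor.model.layers[i].state_dict())

    base.save_pretrained("frankenmodel")

The spliced model typically needs further finetuning to smooth over the seams, hence the theory that the extra parameters only pay off after more training.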


Looks like they left out another model, though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.

Meta says later on that they aren't releasing it, with no explanation given. I wonder why, given how incredible it seems to be.
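For context on those numbers: pass@k is estimated by sampling n completions per problem, counting the c that pass the unit tests, and applying the unbiased estimator from the HumanEval paper (Chen et al. 2021):

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased estimator: 1 - C(n-c, k) / C(n, k), computed stably."""
        if n - c < k:
            return 1.0
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # e.g. 200 samples per problem, 40 of which pass:
    print(pass_at_k(200, 40, 1))    # 0.20
    print(pass_at_k(200, 40, 100))  # ~1.0; 100 tries almost surely hit one of the 40

So pass@1 roughly measures "right on the first try", while pass@100 measures "right at least once given many tries".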


It's "unnatural" because it was finetuned on generated data using another model, almost certainly gpt-4 (whose TOS forbid this).


I can't imagine it being better than Llama 1 33B for that, after all this code finetuning.


But the license for llama 2 is a whole lot better.


Meh.

If you're using it commercially, you're probably deploying it on a server where you're not limited to 24GB, and you can just run Llama 2 70B.

The majority of people who want to run it locally on 24GB either want roleplay (so non-commercial) or code (where you have Code Llama).



