Interesting that there's a 34B model. That size was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks, or if the code fine-tuning destroyed that. It should be the best model that still fits on 24GB gaming GPUs with quantization, because 70B doesn't fit.
Theoretically this is an even better size, as it would fit on a 20-24GB GPU with more relaxed quantization and much longer context.
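The "fits on 24GB" claim is easy to sanity-check with back-of-envelope arithmetic. This sketch counts weight memory only, at a few illustrative bit widths; real usage adds KV cache and activation overhead on top, which grows with context length, so treat these as lower bounds:

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough VRAM needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Compare the sizes discussed above at common quantization levels.
for params in (13, 22, 34, 70):
    for bits in (4, 5, 8):
        print(f"{params}B @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")
```

By this estimate 34B at 4-bit is ~17 GB (tight but workable on a 24GB card once cache overhead is added), 70B at 4-bit is ~35 GB (clearly out of reach), and 22B leaves headroom even at 5-bit (~13.8 GB), which is the "more relaxed quantization and much longer context" point.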
Its metrics are slightly below the 13B's, but the theory is that the higher parameter count makes it more amenable to fine-tuning. If you search for 22B on Hugging Face, you can see that frankenllama experiments are ongoing.
Looks like they left out another model, though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.
Meta says later on that they aren't releasing it, and they give no explanation. I wonder why, given how incredible it seems to be.