Interesting that there's a 34B model. That size was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks, or if the code fine-tuning destroyed that. It should be the best model that still fits on 24GB gaming GPUs with quantization, because 70B doesn't fit.
Theoretically this is an even better size, as it would fit on a 20-24GB GPU with more relaxed quantization and much longer context.
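The "fits on 24GB" claim is easy to sanity-check with back-of-envelope arithmetic. This sketch counts weight memory only, at a few illustrative bit widths; real usage adds KV cache and activation overhead on top, which grows with context length, so treat these as lower bounds:

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough VRAM needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Compare the sizes discussed above at common quantization levels.
for params in (13, 22, 34, 70):
    for bits in (4, 5, 8):
        print(f"{params}B @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")
```

By this estimate 34B at 4-bit is ~17 GB (tight but workable on a 24GB card once cache overhead is added), 70B at 4-bit is ~35 GB (clearly out of reach), and 22B leaves headroom even at 5-bit (~13.8 GB), which is the "more relaxed quantization and much longer context" point.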
Its metrics are slightly below the 13B's, but the theory is that the higher parameter count makes it more amenable to fine-tuning. If you search for 22B on Hugging Face, you can see that frankenllama experiments are ongoing.
Looks like they left out another model, though. In the paper they mention an "Unnatural Code Llama" which wipes the floor with every other model/finetune on every benchmark, except for slightly losing to Code Llama Python on MBPP pass@100 and slightly losing to GPT-4 on HumanEval pass@1, which is insane.
Meta says later on that they aren't releasing it, and they give no explanation. I wonder why, given how incredible it seems to be.