og_kalu on Aug 5, 2023 | on: Non-determinism in GPT-4 is caused by Sparse MoE

Fine-tuned MedPalm is worse than GPT-4 on most Medical Challenge Tests. Fine-tuned Minerva is much worse than GPT-4 on arithmetic benchmarks. The LLM space is just different: there's no guarantee that a fine-tuned model will beat a bigger generalist one.