alyxya 9 days ago | on: IQuest-Coder: A new open-source code model beats C...

Given the decrease in the benchmark score from the correction, I don't think you can assume they didn't check a single output. Clearly the model is still very capable, and its cheating didn't affect most of the benchmark results.