1 o1-pro (medium reasoning) 82.3
2 o1 (medium reasoning) 70.8
3 o3-mini-high 61.4
4 Gemini 2.5 Pro Exp 03-25 54.1
5 o3-mini (medium reasoning) 53.6
6 DeepSeek R1 38.6
7 GPT-4.5 Preview 34.2
8 Claude 3.7 Sonnet Thinking 16K 33.6
9 Qwen QwQ-32B 16K 31.4
10 o1-mini 27.0
https://github.com/lechmazur/nyt-connections/