Submissions from metr.org

		METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
		1 point by mustaphah 49 days ago
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by Gedxx 78 days ago
		Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
		3 points by tmoertel 82 days ago
		The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
		3 points by jvdvegt 3 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		2 points by diginova 4 months ago
		Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
		3 points by nreece 5 months ago | 1 comment
		Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		1 point by sonabinu 5 months ago
		Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
		2 points by davikr 5 months ago
		Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
		18 points by ColinEberhardt 5 months ago | 2 comments
		Measuring the impact of AI on experienced open-source developer productivity (metr.org)
		775 points by dheerajvs 5 months ago | 485 comments
		Recent Frontier Models Are Reward Hacking (metr.org)
		2 points by surprisetalk 6 months ago
		AI's Version of Moore's Law (metr.org)
		2 points by aazo11 7 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		2 points by pabo 8 months ago
		Measuring AI Ability to Complete Long Tasks – METR (metr.org)
		7 points by gk1 8 months ago | 1 comment
		Measuring AI Ability to Complete Long Tasks (metr.org)
		4 points by stared 8 months ago
		Measuring Automated Kernel Engineering (metr.org)
		1 point by gsky 9 months ago
		Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
		1 point by tedsanders on Nov 22, 2024
		When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
		4 points by cpainter on Aug 6, 2024
		METR: Model Evaluation and Threat Research (metr.org)
		2 points by Olshansky on July 8, 2024
		Bounty: Diverse hard tasks for LLM agents (metr.org)
		3 points by RoboTeddy on Jan 20, 2024