Hacker News: submissions from metr.org
METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
1 point by mustaphah 49 days ago | past
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 78 days ago | past
Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
3 points by tmoertel 82 days ago | past
The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
3 points by jvdvegt 3 months ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 4 months ago | past
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
3 points by nreece 5 months ago | past | 1 comment
Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
1 point by sonabinu 5 months ago | past
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
2 points by davikr 5 months ago | past
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
18 points by ColinEberhardt 5 months ago | past | 2 comments
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
775 points by dheerajvs 5 months ago | past | 485 comments
Recent Frontier Models Are Reward Hacking (metr.org)
2 points by surprisetalk 6 months ago | past
AI's Version of Moore's Law (metr.org)
2 points by aazo11 7 months ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by pabo 8 months ago | past
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
7 points by gk1 8 months ago | past | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
4 points by stared 8 months ago | past
Measuring Automated Kernel Engineering (metr.org)
1 point by gsky 9 months ago | past
Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point by tedsanders on Nov 22, 2024 | past
When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
4 points by cpainter on Aug 6, 2024 | past
METR: Model Evaluation and Threat Research (metr.org)
2 points by Olshansky on July 8, 2024 | past
Bounty: Diverse hard tasks for LLM agents (metr.org)
3 points by RoboTeddy on Jan 20, 2024 | past
