Hacker News
METR review of OpenAI's GPT-OSS fine-tuning safety methodology (metr.org)
1 point by mustaphah 49 days ago
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 78 days ago
Measuring AI Ability to Complete Long Tasks (2x every 7 months) (metr.org)
3 points by tmoertel 82 days ago
The Impact of Early-2025 AI on Open-Source Developer Productivity (metr.org)
3 points by jvdvegt 3 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 4 months ago
Measuring the Impact of AI on Experienced OSS Developer Productivity [pdf] (metr.org)
3 points by nreece 5 months ago | 1 comment
Measuring Impact of 2025 AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
1 point by sonabinu 5 months ago
Measuring the Impact of Early-2025 AI on Experienced OpenSource Dev Productivity [pdf] (metr.org)
2 points by davikr 5 months ago
Measuring the Impact of AI on Experienced Open-Source Developer Productivity [pdf] (metr.org)
18 points by ColinEberhardt 5 months ago | 2 comments
Measuring the impact of AI on experienced open-source developer productivity (metr.org)
775 points by dheerajvs 5 months ago | 485 comments
Recent Frontier Models Are Reward Hacking (metr.org)
2 points by surprisetalk 6 months ago
AI's Version of Moore's Law (metr.org)
2 points by aazo11 7 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by pabo 8 months ago
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
7 points by gk1 8 months ago | 1 comment
Measuring AI Ability to Complete Long Tasks (metr.org)
4 points by stared 8 months ago
Measuring Automated Kernel Engineering (metr.org)
1 point by gsky 9 months ago
Evaluating frontier AI R&D capabilities of LLM agents against human experts (metr.org)
1 point by tedsanders on Nov 22, 2024
When LLM agents can do a task, they can often do so at a fraction of human cost (metr.org)
4 points by cpainter on Aug 6, 2024
METR: Model Evaluation and Threat Research (metr.org)
2 points by Olshansky on July 8, 2024
Bounty: Diverse hard tasks for LLM agents (metr.org)
3 points by RoboTeddy on Jan 20, 2024