It just means that once you send your test questions to a model API, that compan...

dmos62 · 2025-08-22T12:30:00

Sounds a bit presumptuous to me. Sure, they have your needle, but they also need a cost-efficient way to find it in their haystack.

lucianbr · 2025-08-22T16:59:34

They have quite large amounts of money, so I don't think they need to be very cost-efficient. They also have very smart people, so they can likely figure out a reasonably cost-efficient way anyway. The stakes are high for them.

noodletheworld · 2025-08-22T13:53:01

Security through obscurity is not security.

Your API key is linked to your credit card, which is linked to your identity.

…but hey, you're right.

Let's just trust them not to be cheating. Cool.

merelysounds · 2025-08-22T14:24:38

Would the model owners be able to identify the benchmarking session among many other similar requests?

irthomasthomas · 2025-08-22T14:47:43

Depends. Something like ARC-AGI might be easy to spot, since it follows a defined format. I would also guess that the usage pattern of someone running a benchmark is quite distinct from that of a normal user, unless they take specific measures to blend in.