I’m not a native English speaker, so I looked up the definition of “reliability”:

  the quality of being able to be trusted or believed because of working or behaving well
For a tool, I expect “well” to mean that it does what it’s supposed to do. My linter is reliable when it catches the bad patterns I want it to catch. My editor is reliable when I can edit code with it and the commands do what they’re supposed to do.

So for generating text, LLMs are very reliable, and they do a decent job at categorizing too. But code is a formal language, and correctness is the only end result that matters there: a program may be perfectly valid and still incorrect.

It’s very easy to write valid code; you only need the grammar of the language. Writing correct code is another matter, and the only one that is relevant. No one hires people for knowing a language’s grammar and verifying syntax. They hire people to produce correct code (and because few businesses actually want to formally verify it, they hire people who can write code with a minimal number of bugs and who can eliminate those bugs when they surface).
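For illustration, here is a minimal Python sketch (my own hypothetical example) of the gap between valid and correct: it parses and runs without any error, yet it does not do what it is supposed to do.

  # Valid: this parses and executes without complaint.
  # Incorrect: it is meant to compute 1 + 2 + ... + n, but
  # range(1, n) stops at n - 1, so the last term is dropped.
  def sum_first_n(n):
      return sum(range(1, n))

  print(sum_first_n(5))  # prints 10; the correct answer is 15

A grammar check or compiler happily accepts this; only a judgement about intent, or a test encoding that intent, reveals the bug.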

> For a tool, I expect “well” to mean that it does what it’s supposed to do

Ah, then LLMs are actually very reliable by your definition. They're supposed to output semi-random text, and whenever I use them, that's exactly what happens. Outside of the models and software I build myself, I basically never see a case where an LLM fails to output semi-random text.

They're obviously not made for producing "correct code", because that's a judgement only a human can make; what even is "correct" in that context? Not even we humans can agree on what "correct code" is in all contexts, so assuming a machine could seems foolish.


I'm a native English speaker. Your understanding and usage of the word "reliability" is correct, and that's the exact word I'd use in this conversation. The GP is playing a pointless semantics game.

It's not semantics: if the definition is "it does what it’s supposed to do", then probably all of the currently deployed LLMs are reliable according to that definition.

> "it does what it’s supposed to do"

That's the crux of the problem. Many proponents of LLMs overpromise their capabilities, and then deny the underperformance through semantics. LLMs are "reliable" only if you're talking about the algorithms behind the scenes and you ignore the marketing. Going by the marketing, they are unreliable, incorrect, and do not do what they're "supposed to do".


But maybe we don't have to stoop to the lowest level of conversation about LLMs, the "marketing", and can instead do what most of us here do best: focus on the technical aspects, how things work, and how we can make them do our bidding in various ways, you know, like the OG hackers.

FWIW, I agree LLMs are massively oversold to the average person, but for someone who can dig into the tech and use it effectively, for what it actually works for, I feel like there is more interesting stuff we could focus on than a blanket "No, and I won't even think about it".



