I’m not a native English speaker, so I looked up the definition of “reliability”:

  the quality of being able to be trusted or believed because of working or behaving well
For a tool, I expect “well” to mean that it does what it’s supposed to do. My linter is reliable when it catches the bad patterns I want it to catch. My editor is reliable when I can edit code with it and the commands do what they’re supposed to do.

So for generating text, LLMs are very reliable, and they do a decent job at categorizing too. But code is a formal language, and correctness is the only end result that matters there: a program may be perfectly valid and still incorrect.

It’s very easy to write valid code; you only need the grammar of the language. Writing correct code is another matter, and the only one that is relevant. No one hires people for knowing a language’s grammar and verifying syntax. They hire people to produce correct code (and because few businesses actually want to formally verify it, they hire people who can write code with a minimal number of bugs and who can eliminate those bugs when they surface).
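For illustration, here is a minimal Python sketch (my own hypothetical example) of the gap between valid and correct: it parses and runs without any error, yet it does not do what it is supposed to do.

  # Valid: this parses and executes without complaint.
  # Incorrect: it is meant to compute 1 + 2 + ... + n, but
  # range(1, n) stops at n - 1, so the last term is dropped.
  def sum_first_n(n):
      return sum(range(1, n))

  print(sum_first_n(5))  # prints 10; the correct answer is 15

A grammar check or compiler happily accepts this; only a judgement about intent, or a test encoding that intent, reveals the bug.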

> For a tool, I expect “well” to mean that it does what it’s supposed to do

Ah, then LLMs are actually very reliable by your definition. They're supposed to output semi-random text, and whenever I use them, that's exactly what happens. Outside of the models and software I build myself, I basically never see a case where an LLM fails to output semi-random text.

They're obviously not made for producing "correct code", because that's a judgement only a human can make; what even is "correct" in that context? Not even we humans can agree on what "correct code" is in all contexts, so assuming a machine could seems foolish.


I'm a native English speaker. Your understanding and usage of the word "reliability" is correct, and that's the exact word I'd use in this conversation. The GP is playing a pointless semantics game.

It's not semantics: if the definition is "it does what it’s supposed to do", then probably all of the currently deployed LLMs are reliable according to that definition.

> "it does what it’s supposed to do"

That's the crux of the problem. Many proponents of LLMs overpromise their capabilities, and then deny the underperformance through semantics. LLMs are "reliable" only if you're talking about the algorithms behind the scenes and you ignore the marketing. Going by the marketing, they are unreliable, incorrect, and do not do what they're "supposed to do".


But maybe we don't have to stoop to the lowest level of conversation about LLMs, the "marketing", and can instead do what most of us here do best: focus on the technical aspects, how things work, and how we can make them do our bidding in various ways, you know, like the OG hackers.

FWIW, I agree LLMs are massively oversold to the average person, but for someone who can dig into the tech and use it effectively, for what it actually works for, I feel like there is more interesting stuff we could focus on than a blanket "No, and I won't even think about it".



