> the model has no inherent knowledge about its confidence levels
Kind of. See e.g. https://openreview.net/forum?id=mbu8EEnp3a, but I think it was established already a year ago that LLMs tend to have an identifiable internal confidence signal; the challenge around the time of the DeepSeek-R1 release was to, through training, connect that signal to tool use activation, so the model does a search when it "feels unsure".
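To illustrate the inference-time version of that idea, here's a minimal Python sketch: it uses mean token logprob as a crude stand-in for the model's confidence and falls back to a search tool when it drops below a threshold. The draft_fn / search_fn names and the 0.6 threshold are made up for the example - the paper above probes hidden states instead, and the R1-style approach trains the behavior into the model rather than bolting it on at inference time.

    import math

    def answer_confidence(token_logprobs: list[float]) -> float:
        # Geometric-mean token probability of the drafted answer;
        # a crude proxy, not the internal signal the paper probes for.
        if not token_logprobs:
            return 0.0
        return math.exp(sum(token_logprobs) / len(token_logprobs))

    def answer_with_optional_search(question, draft_fn, search_fn, threshold=0.6):
        # draft_fn(prompt) -> (answer_text, token_logprobs); search_fn(query) -> context string.
        # Both are hypothetical stand-ins for whatever model/search API you use.
        draft, logprobs = draft_fn(question)
        if answer_confidence(logprobs) >= threshold:
            return draft                       # model "feels sure" -> answer directly
        context = search_fn(question)          # model "feels unsure" -> retrieve first
        regenerated, _ = draft_fn(f"{question}\n\nRelevant context:\n{context}")
        return regenerated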
Wow, that's a really interesting paper. That's the kind of thing that makes me feel there's a lot more research to be done "around" LLMs and how they work, and that there's still a fair bit of improvement to be found.
For these types of problems (i.e. most problems in the real world), "definitive or deterministic" isn't really possible. An unreliable party that you can throw at the problem from a hundred thousand directions simultaneously, and for cheap, is still useful.
> Western shows are all about the "you don't have to sacrifice anything to win" and Eastern shows are all about the "you're the chosen one" but this one was "the establishment is the establishment and most of the time it wins".
What's sorely missing is the very rare theme of "the establishment wins, and for a good reason, and it's actually a good thing".
Isn't that basically every cop show, for instance? Like an episode of Law and Order: this person does something bad, the establishment finds and punishes them, hurray.
A favorite tidbit I learned years ago was that the Chinese invented the Law and Order genre pretty much before anyone else. Very much an establishment-wins genre.
Here’s the Google summary:
> Early Chinese detective stories, known as gong'an ("court case") fiction, emerged from oral tales and plays during the Song Dynasty (960-1127), featuring incorruptible magistrate-detectives like Bao Zheng (Judge Bao) and Di Renjie (Judge Dee) who used clever deduction, forensic logic, and sometimes supernatural elements to solve crimes.
Didn't watch Law and Order much (my wife is a fan though, so I'll ask).
Most of the cop shows/procedurals I saw have some kind of "corrupt mayor" arc as a substantial part of their plot, but I guess if you go one level up, it's still "the establishment wins". But then anything where civilization doesn't collapse would be that.
LaO doesn't always follow that formula. In some episodes the trial is botched, or the law doesn't protect the victims, or the perps escape justice due to political influence, and so on.
Still, cop shows are generally about "the establishment wins, and for a good reason, and it's actually a good thing", which the other commenter said is a theme that's sorely missing.
There is actually a little bit of that in this. While the charismatic leader has some points about how the establishment has gotten weak and corrupt, overall it seems pretty par for the course. To be honest, it's better he didn't win. He was a bit demagoguey.
It didn't have to, not explicitly. The tone and the context already hint at that - if you saw someone creating a fake cover of an existing periodical but 10 years into the future, you'd likely assume it's part of some joke or a commentary related to said periodical, and not a serious attempt at predicting the future. And so would an LLM.
People keep forgetting (or worse, still disbelieving) that LLMs can "read between the lines" and infer intent with good accuracy - because that's exactly what they're trained to do[0].
Also there's prior art for time-displaced HN, and it's universally been satire.
--
[0] - The goal function for LLM output is basically "feels right, makes sense in context to humans" - in the fully general meaning of that statement.
AI works. It's actually useful. Since GPT-4, tool calling capability has been good enough. It's trivial to do a better job than Copilot on any task using any current model from the major LLM providers. I'm not even talking about the API: with a basic chat frontend, regular users easily beat Copilot by simply copy-pasting between Word/Excel and the chat.
If a twelve year old can one-shot a better product for any given use case than Microsoft Copilot, then it's not just "merchants running to line up in front of you", something more basic must be broken.