I know this is written to be tongue-in-cheek, but it's really almost the exact same problem playing out on both sides.
LLMs hallucinate because training on source material is a lossy process, and the bigger, heavier LLM-integrated systems that can research and cite primary sources are slow and expensive, so few people use those techniques by default (a rough sketch of what such a pipeline involves is below). Lowest time to a good-enough response is the primary metric.
Journalists oversimplify and fail to ask follow-up questions because, while they can research and cite primary sources, it's slow and expensive in an infinitesimally short news cycle, so nobody does that by default. Whoever publishes something that someone will click on first gets the ad impressions, so that's the primary metric.
In either case, we've got pretty decent tools and techniques for better accuracy and education - whether via humans or LLMs and co - but most people, most of the time don't value them.
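To make the "slow and expensive" point concrete, here's the sketch mentioned above of what a research-and-cite pipeline involves. Every function name here is a hypothetical stand-in, not any particular library's API; the point is just that a cited answer costs several search, fetch, and model round-trips where a plain completion costs one.

    # All names below are illustrative stand-ins, not a real API.

    def search_web(query: str) -> list[str]:
        """Stand-in for a search API call; returns candidate source URLs."""
        return ["https://example.org/source-1", "https://example.org/source-2"]

    def fetch_page(url: str) -> str:
        """Stand-in for fetching and cleaning a primary source."""
        return f"(full text of {url})"

    def call_llm(prompt: str) -> str:
        """Stand-in for one LLM completion call -- each one costs time and money."""
        return f"(model output for: {prompt[:40]}...)"

    def answer_with_citations(question: str) -> str:
        queries = call_llm(f"Write search queries for: {question}")   # model call 1
        sources = [fetch_page(url) for q in queries.splitlines() for url in search_web(q)]
        draft = call_llm(f"Answer '{question}' using only these sources, with citations:\n"
                         + "\n".join(sources))                        # model call 2
        return call_llm(f"Check every claim in this draft cites a source:\n{draft}")  # model call 3

    print(answer_with_citations("What did the primary source actually say?"))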
So if you set temperature=0 and run the LLM serially (making it deterministic), would it stop hallucinating? I don't think so. I would guess that the nondeterminism issues mentioned in the article are not at all a primary cause of hallucinations.
That's an implementation detail, I believe. But what I meant was just greedy decoding (picking the token with the highest logit in the LLM's output), which can be implemented very easily.
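For what it's worth, here's a minimal sketch of that using the Hugging Face transformers API (gpt2 is just a convenient stand-in model; any causal LM works the same way): at each step you take the argmax of the last position's logits instead of sampling from them.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # gpt2 is just a small stand-in model for illustration.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def greedy_decode(prompt: str, max_new_tokens: int = 20) -> str:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            for _ in range(max_new_tokens):
                logits = model(ids).logits          # (1, seq_len, vocab_size)
                next_id = logits[0, -1].argmax()    # greedy: highest logit, no sampling
                ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        return tokenizer.decode(ids[0])

    print(greedy_decode("The capital of France is"))

Even with greedy decoding, though, two runs can differ if anything upstream of the logits (batching, kernels) differs, which is what the article is about.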
"In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism."
Classical LLM hallucination happens because AI doesn’t have a world model. It can’t compare what it’s saying to anything.
You’re right that LLMs favor helpfulness, so they may just make things up when they don’t know the answer, but this alone doesn’t capture the crux of hallucination imo; it’s deeper than just being overconfident.
OTOH, there was an interesting article recently (I’ll try to find it) arguing that humans don’t really have a world model either. While I take the point, we can have one when we want to.