
chat log please?



I've never found the Socratic method to work well on any model I've tried it with. They always seem to get stuck justifying their previous answers.

We expect them to answer the follow-up and then re-reason about the original question with the new information, because that's what a human would do. Maybe next time I'll be explicit about that expectation when I try the Socratic method.


Is the knowledge cutoff for this thing really that stale, or is this just bad performance on recent data?


The knowledge cutoff is June 2024 (I believe this is consistent with the current versions of GPT-4o and -4.1), which is before the 2024 election (which was, after all, just 9 months ago) but after Biden had secured the nomination.

It is very clear in the chat logs (which include reasoning traces) that the model knew that, knew what the last election it knew about was, and initially answered correctly based on its cutoff. Under pressure to answer about an election that was not within its knowledge window, it then confabulated a Biden 2024 victory, which it dug in on after being contradicted with a claim that, based on the truth at the time of its knowledge cutoff, was unambiguously false ("Joe Biden did not run"). He did, in fact, run for reelection, and withdrew on July 21 only after having secured enough delegates to win the nomination by a wide margin. Confabulation (called "hallucination" in AI circles, but it is more like human confabulation than hallucination) when pressed for answers on questions for which it lacks grounding remains an unsolved AI problem.

Unsolved, but mitigated by providing the model grounding independent of its knowledge cutoff, e.g., via tools like web browsing (which GPT-OSS is specifically trained for, but that training does no good if it's not hooked into a framework that provides it the tools).
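
To make that concrete, here is a minimal sketch of hooking a locally served model up to a web-search tool through an OpenAI-compatible chat completions endpoint. The base_url, model name, and the search_web() helper are assumptions for illustration, not any particular framework's API; the point is just that the model's answer gets grounded in retrieved text instead of its stale cutoff.

  # Minimal sketch: expose a web-search tool to a local model via an
  # OpenAI-compatible endpoint. base_url, model name, and search_web()
  # are hypothetical placeholders for whatever you actually run.
  import json
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

  def search_web(query: str) -> str:
      # Hypothetical helper: call whatever search backend you have and
      # return a plain-text summary of the results.
      return "search results for: " + query

  tools = [{
      "type": "function",
      "function": {
          "name": "search_web",
          "description": "Search the web for current information.",
          "parameters": {
              "type": "object",
              "properties": {"query": {"type": "string"}},
              "required": ["query"],
          },
      },
  }]

  messages = [{"role": "user",
               "content": "Who won the 2024 US presidential election?"}]
  resp = client.chat.completions.create(model="gpt-oss-20b",
                                        messages=messages, tools=tools)
  msg = resp.choices[0].message

  # If the model requests the tool, run it and feed the result back so the
  # final answer is grounded in retrieved text rather than training data.
  if msg.tool_calls:
      messages.append(msg)
      for call in msg.tool_calls:
          args = json.loads(call.function.arguments)
          messages.append({
              "role": "tool",
              "tool_call_id": call.id,
              "content": search_web(args["query"]),
          })
      resp = client.chat.completions.create(model="gpt-oss-20b",
                                            messages=messages, tools=tools)

  print(resp.choices[0].message.content)

Without that second round trip carrying the tool output, the model is back to guessing from its cutoff, which is exactly the failure mode in the linked chat.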


I like that term much better, confabulation. I've come to think of it as relying on an inherent trust that whatever process it uses to produce a coherent response (which I don't think the LLM can really analyze after the fact) is itself a truth-making process, since it inherently trusts its training data and considers it the basis of all its responses. Something along those lines. We might do something similar at times as humans; it feels similar to how some people get trapped in lies and almost equate having claimed something as true with it actually having the quality of truth (pathological liars can demonstrate this kind of thinking).


> since it inherently trusts its training data and considers it the basis of all its responses.

Doesn't that make "hallucination" the better term? The LLM is "seeing" something in the data that isn't actually reflected in reality, whereas "confabulation" would imply that LLMs create data out of thin air, as if the training data were immaterial.

Both words, as they have been historically used, need to be stretched really far to fit an artificial creation that bears no resemblance to what those words were used to describe, so, I mean, any word is as good as any other at that point, but "hallucination" requires less stretching. So I am curious about why you like "confabulation" much better. Perhaps it simply has a better ring to your ear?

But, either way, these pained human analogies have grown tired. It is time to call it what it really is: Snorfleblat.


It is painful to read, I know, but if you make it to the end, it admits that its knowledge cutoff was prior to the election and that it doesn't know who won. Yet, even then, it still remains adamant that Biden won.


incredible



