I don't think the Turing Test has been passed. The test was setup such that the interrogator knew that one of the two participants was a bot, and was trying to find out which. As far as I know, it's still relatively easy to find out you're talking to an LLM if you're actively looking for it.
Note that most tests where they actually try to pass the Turing Test (as opposed to being a useful chatbot) they do things like prompt it with a personality etc.
As far as I know, it's still relatively easy to find out you're talking to an LLM if you're actively looking for it.
People are being fooled in online forums all the time. That includes people who are naturally suspicious of online bullshittery. I'm sure I have been.
Stick a fork in the Turing test, it's done. The amount of goalpost-moving and hand-waving that's necessary to argue otherwise simply isn't worthwhile. The clichéd responses that people are mentioning are artifacts of intentional alignment, not limitations of the technology.
I feel like you're skipping over the "if you're actively looking for it" bit. You can call it goalpost-moving, or you can check the original paper by Turing and see that this is exactly how he defined it in the first place.
people are being fooled, but not being given the problem: "one of these users is a bot, which one is which"
a problem similar to the turing test, "0 or more of these users is a bot, have fun in a discussion forum"
but there's no test or evaluation to see if any user successfully identified the bot, and there's no field to collect which users are actually bots, or partially using bots, or not at all, nor a field to capture the user's opinions about whether the others are bots
Then there's the fact that the Turing test has always said as much about the gullibility of the human evaluator as it has about the machine. ELIZA was good enough to fool normies, and current LLMs are good enough to fool experts. It's just that their alignment keeps them from trying very hard.
1) Look for spelling, grammar, and incorrect word usage; such as where vs were, typing out where our should be used.
2) Ask asinine questions that have no answers; _Why does the sun ravel around my finger in low quality gravity while dancing in the rain?_
ML likes to always come up with an answers no matter what. Human will shorten the conversation. It also is programmed to respond with _I understand_, _I hear what you are saying_, and make heavy use of your name if it has access to it. This fake interpersonal communication is key.
Conventional LLM chatbots behave the way you describe because their goal during training is to as much as possible impersonate an intelligent assistant.
Do you think this goal during training cannot be changed to impersonate someone normal such that you cannot detect you are chatting with an LLM?
Before flight was understood some thought "magic" was involved. Do you think minds operate using "magic"? Are minds not machines? Their operation can not be duplicated?
> Do you think this goal during training cannot be changed to impersonate someone normal such that you cannot detect you are chatting with an LLM?
I don't think so, because LLMs hallucinate by design, which will always produce oddities.
> Before flight was understood some thought "magic" was involved. Do you think minds operate using "magic"? Are minds not machines? Their operation can not be duplicated?
Might involve something we don't grasp, but despite that: only because something moves through air it's not flying and will never be, just like a thrown stone.
Maybe current LLMs can do that. But none are, so it hasn't passed. Whether that's because of economic or marketing reasons as opposed to technical does not matter. You still have to pass the test before we can definitely say you've passed the test.
Overall I'd say the easiest is just overall that the models always just follow what you say and transform it into a response. They won't have personal opinions or experiences or anything, although they can fake it. it's all just a median expected response to whatever you say.
And the "agreeability" is not a hallucination, it's simply the path of least resistance, as in, the model can just take information that you said and use that to make a response, not to actually "think" and consider I'd what you even made sense or I'd it's weird or etc.
They almost never say "what do you mean?" to try to seek truth.
This is why I don't understand why some here claim that AGI being already here is some kind of coherent argument. I guess redefining AGI is how we'll reach it
I agree with your points in general but also, when I plugged in the parent comment's nonsense question, both Claude 4.5 Sonnet and GPT-5 asked me what I meant, and pointed out that it made no sense but might be some kind of metaphor, poem, or dream.
If it wasn't structured as a coherent conversation, it will ask because it seems off, especially if you're early in the context window where I'm sure they've RLd it to push back, at least in the past year or so
And if it's going against common knowledge or etc which is prevalent in the training data, it will also push back which makes sense
The Turing Test was a pretty early metric and more of a thought experiment.
Let's be real guys, it was created by Turing. The same guy who built the first general purpose computer. Man was without a doubt a genius, but it also isn't that reasonable to think he'd come up with a good definition or metric for a technology that was like 70 years away. Brilliant start, but it is also like looking at Newton's Laws and evaluating quantum mechanics based off of that. Doesn't make Newton dumb, just means we've made progress. I hope we can all agree we've made progress...
And arguably the Turing Test was passed by Eliza. Arguably . But hey, that's why we refine and make progress. We find the edge of our metrics and ideas and then iterate. Change isn't bad, it is a necessary thing. What matters is the direction of change. Like velocity vs speed.
We really really Really should Not define as our success function for AI (our future-overlords?) the ability of computers to deceive humans about what they are.
The Turing Test was a clever twist on (avoiding) defining intelligence 80 years ago.
Going forward, valuing it should be discarded post-haste by any serious researcher or engineer or message-board-philosopher, if not for ethical reasons then for not-promoting spam/slop reasons.
The turing test point is actually very interesting, because it's testing whether you can tell you're talking to a computer or a person. When Chatgpt3 came out we all declared that test utterly destroyed. But now that we've had time to become accustomed and learn the standard syntax, phraseology, and vocabulary of the gpt's, I've started to be able to detect the AI's again. If humanity becomes completely accustomed to the way AI talks to be able to distinguish it, do we re enter the failed turing test era? Can the turing test only be passed in finite intervals, after which we learn to distinguish it again? I think it can eventually get there, and that the people who can detect the difference becomes a smaller and smaller subset. But who's to say what the zeitgeist on AI will be in a decade
> When Chatgpt3 came out we all declared that test utterly destroyed.
No, I did not. I tested it with questions that could not be answered by the Internet (spatial, logical, cultural, impossible coding tasks) and it failed in non-human-like ways, but also surprised me by answering some decently.
Who is this "they" you speak of?
It's true the definition has changed, but not in the direction you seem to think.
Before this boom cycle the standard for "AI" was the Turing test. There is no doubt we have comprehensively passed that now.