The resolution is actually fairly simple. It's an incredibly brilliant stochastic parrot with some limited reasoning capabilities.
Some folks will try to say it cannot reason, but they are wrong; there is extensive evidence that it can.
The only question is how limited its reasoning capabilities are. After spending extensive time on openai/evals, having submitted 3 evals of my own, and doing a lot of tests, I would argue that a person of average IQ could outthink GPT-4, as long as the stochastic parrot aspect wasn't a factor.
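For anyone who hasn't poked at openai/evals: "submitting an eval" mostly means contributing a JSONL file of prompt/ideal-answer samples plus a small YAML registry entry that wires it to an eval class. Here's a rough sketch of building such a samples file for the basic exact-match style of eval; the questions and file layout comments are made-up placeholders, not an eval I actually submitted:

```python
import json

# Hypothetical samples for a basic exact-match eval in openai/evals.
# Each JSONL line holds a chat-format prompt ("input") and the expected
# answer ("ideal") that the model's completion is compared against.
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "ideal": "Paris",
    },
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "How many legs does a spider have?"},
        ],
        "ideal": "8",
    },
]

# In the repo this file would live under evals/registry/data/<eval-name>/,
# with a YAML entry in evals/registry/evals/ pointing the match eval class
# at it.
with open("samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

You then run the eval against a model with the repo's `oaieval` CLI and look at where the answers go wrong, which is how you build up a feel for its reasoning limits.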
That's probably an accurate assessment; the open question is whether the reasoning can be notably improved on the current architecture, and by how much.
I myself assumed we were pretty close to the end of the S-curve when I first used GPT-3.5-turbo, and figured hallucinations would be pretty hard to overcome, but with GPT-4 being such a massive improvement on all metrics, I'm no longer as sure. GPT-5 will probably be more definitive about what's possible, depending on where diminishing returns set in.
It's very hard to know, because we neither know its training data nor can experiment with it. So it may be reasoning from first principles, or it may be doing token substitution against some known example it has seen before and is pattern-matching to.
The dictionary definition of reasoning says nothing about how the thinking is done, only that it's sensible and logical, which is exactly what GPT-4 is. Limited, yes, but it reasons.