It's not clear what LLMs are good at, and there's great interest in finding out. This is made harder by the frenetic pace of development (GPT-2 only came out in 2019). So it's not surprising at all that there's research into how LLMs fail and why.

Even for someone who kinda understands how these models are trained, it's surprising to me that they struggle when the symbols change. One thing computers are traditionally very good at is symbolic manipulation: graph isomorphism, that sort of thing. So it's worrisome when LLMs fail at it, even in this research model, which is much, much smaller than current or even older ones.
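
For what it's worth, here's a toy sketch of why that surprise is justified: a brute-force graph isomorphism check in Python. This is my own illustration, not from the paper (the function name and example graphs are made up), but it shows that classical code is indifferent by construction to how the symbols are labeled:

    from itertools import permutations

    def is_isomorphic(g1, g2, n):
        """Return True if two graphs on nodes 0..n-1 (given as sets of
        frozenset edges) match under some relabeling of the nodes."""
        for perm in permutations(range(n)):
            relabeled = {frozenset(perm[u] for u in edge) for edge in g1}
            if relabeled == g2:
                return True
        return False

    # The same triangle-with-a-tail graph, with the node labels permuted.
    g_a = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
    g_b = {frozenset(e) for e in [(3, 2), (2, 0), (3, 0), (0, 1)]}
    print(is_isomorphic(g_a, g_b, 4))  # True: relabeling changes nothing

For a deterministic program, swapping the symbols is a no-op by definition; for a statistical model, apparently it isn't.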


