As much as I would like to agree with "AI models do not understand, they just predict the next token", I feel the author of the research does not use valid arguments. Language is more than text? Fine, I could turn on the webcam and integrate a video stream into the calculations. Stomping your feet and crying about slurs in the models won't make the argument valid.
I worked on a system that was good enough; then the requirements changed and became far more complex. Suddenly it was almost impossible to keep improving and updating the codebase without ten bugs appearing in production (and never in the tests).
Good enough may be good enough for the time being; in the long run, who knows.