I can't look at the current state of this without wondering if it's tokenizer dyslexia. I wonder how much of AI performance growth has been borrowed from overfitting: pruning the tokenizer of invalid sequences and leaking the entire corpus into training, a cardinal sin if the goal is making valid predictions.