I tried many of the examples in this article with Gemini 2.5 Pro and it seems to handle most of them flawlessly. Is it possible that Google's model is just susceptible to different glitch tokens? I admit most of the technical discussion in the article went a little over my head.
Glitch tokens should be tokenizer-specific. Gemini uses a different tokenizer from the OpenAI models.
The origins of the OpenAI glitch tokens are pretty interesting: they trained an early tokenizer on common strings in their early training data, but it turns out popular subreddits caused some weird strings to be common enough to get assigned their own integer token - like davidjl, a frequent poster in the https://reddit.com/r/counting subreddit. More on that here: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/#glitch-...
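The mechanism can be sketched in a few lines of Python. This is a toy frequency-based vocabulary builder, not OpenAI's actual BPE training (real BPE merges byte pairs rather than whole words), but it shows how a string that is frequent in the training data - like a prolific poster's username - earns its own integer id:

```python
from collections import Counter

def build_vocab(corpus: str, vocab_size: int) -> dict:
    """Toy illustration: assign integer ids to the most frequent
    whitespace-separated strings, the way a tokenizer's training
    step promotes common byte sequences to single tokens."""
    counts = Counter(corpus.split())
    most_common = [tok for tok, _ in counts.most_common(vocab_size)]
    return {tok: i for i, tok in enumerate(most_common)}

# A name that appears constantly in one corner of the training data
# ends up with its own token id, even if it never appears elsewhere.
corpus = "davidjl 177265 davidjl 177266 davidjl 177267 hello world"
vocab = build_vocab(corpus, 3)
```

If such a token then barely occurs in the data used to train the model itself, the model never learns what to do with it - hence the glitchy behavior.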