Surely Lisps don't have drastically more special characters than other languages? A few more parens, sure, but fewer curly braces, commas, semicolons, etc.
It also feels like making sure the tokeniser has distinct tokens for left/right parens would be all that is required to make LLMs work with them.
Don't get me wrong, they do work with Lisps already. I've had plenty of success having various LLMs create and manage Clojure code, so we aren't that far off.
But I'm getting way more "unbalanced parenthesis" errors than with Algol-like languages. Not sure if it's because of a lack of training data, post-training, or just needing special tokens in the tokenizer, but there is a noticeable difference today.
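For anyone who wants to catch these before handing the output to a compiler, here is a rough sketch of a delimiter check in Python. To be clear about assumptions: `first_unbalanced` is a made-up helper name, not part of any Clojure tooling, and it only knows about `"` strings, `;` line comments, and single-character literals like `\(`. It is not a real reader, so things like regex literals or reader macros may trip it up.

```python
# Hypothetical helper: a rough delimiter check for Clojure-like source.
# It is a sketch, not a full reader.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def first_unbalanced(src):
    """Return (index, char) of the first delimiter problem, or None if balanced.

    A stray or mismatched closer is reported at its own position; if the
    input ends with delimiters still open, the most recently opened one
    is reported.
    """
    stack = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch == "\\":
            # Character literal such as \( : skip the escaped character.
            i += 2
            continue
        if ch == ";":
            # Line comment: skip to the end of the line.
            while i < n and src[i] != "\n":
                i += 1
            continue
        if ch == '"':
            # String literal: skip to the closing quote, honoring escapes.
            i += 1
            while i < n and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
            i += 1
            continue
        if ch in OPENERS:
            stack.append((i, ch))
        elif ch in PAIRS:
            if not stack or stack[-1][1] != PAIRS[ch]:
                return (i, ch)
            stack.pop()
        i += 1
    return stack[-1] if stack else None
```

Running `first_unbalanced` on each generated form and asking the model to retry when it returns something other than `None` is one way to use it, though that is a workaround and says nothing about why the errors happen in the first place.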