
The tool could be extended to support Unicode, whereas AFAIK it would not be possible to extend it to support backreferences. Are there any other “regex” features that would be impossible to support?


>> Too many to enumerate

I take back my previous claim; it was an exaggeration.

> The tool could be extended to support Unicode

Not an easy task. Some things in the standard do not map neatly to states, notably case folding that changes the number of characters, and the treatment of the generic line boundary. Edit: after browsing UTS #18, I am almost certain that a conforming implementation cannot be mapped to states the way the tool does it. Maybe a neat workaround is possible.
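To illustrate the point about case folding changing the character count, here is a small Python sketch. It relies only on the built-in `str.casefold()`; the helper name `fold_lengths` is made up for this example and has nothing to do with the tool.

```python
def fold_lengths(s):
    """Return (code points before folding, code points after full case folding)."""
    return len(s), len(s.casefold())


# U+00DF LATIN SMALL LETTER SHARP S folds to "ss": one code point becomes two.
print(fold_lengths("\u00df"))

# U+FB01 LATIN SMALL LIGATURE FI folds to "fi": again one becomes two.
print(fold_lengths("\ufb01"))

# Plain ASCII keeps its length.
print(fold_lengths("A"))
```

A state machine that consumes one input character per transition has to match such a character against a two-character sequence in the pattern (or the reverse), which is why this does not translate one-to-one into states.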

> features that would be impossible to support?

(?=, (?!, (?<=, (?<!, (?{, (??{, (?&, (?(…), (?>, (*asr:, (*SKIP)


Some of these are in fact possible, though with some restrictions. I wrote a blog post on how to support some negative lookbehinds: http://allanrbo.blogspot.com/2020/01/alternative-to-negative...


Intersection and complement are both expressible in regular expressions (in the CS sense). Is that not what you mean by impossible to support?

(I'm too lazy to look up what the other listed notations mean.)
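For the claim that intersection and complement stay within the regular languages, a minimal sketch of the textbook constructions on DFAs may help. It assumes complete DFAs (a transition for every state and symbol) over a shared alphabet; the `DFA` class and the function names are illustrative and not taken from the tool.

```python
from itertools import product


class DFA:
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)        # (state, symbol) -> state, must be total
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, word):
        state = self.start
        for ch in word:
            state = self.delta[(state, ch)]
        return state in self.accepting


def complement(d):
    # Same machine, accepting and non-accepting states swapped.
    return DFA(d.states, d.alphabet, d.delta, d.start, d.states - d.accepting)


def intersection(a, b):
    # Product construction: run both machines in lockstep on pairs of states.
    assert a.alphabet == b.alphabet
    states = set(product(a.states, b.states))
    delta = {
        ((p, q), ch): (a.delta[(p, ch)], b.delta[(q, ch)])
        for (p, q) in states
        for ch in a.alphabet
    }
    accepting = {(p, q) for (p, q) in states
                 if p in a.accepting and q in b.accepting}
    return DFA(states, a.alphabet, delta, (a.start, b.start), accepting)


# Example machines over the alphabet {a, b}.
contains_a = DFA({0, 1}, "ab",
                 {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1},
                 0, {1})
even_length = DFA({0, 1}, "ab",
                  {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0},
                  0, {0})

both = intersection(contains_a, even_length)
no_a = complement(contains_a)
print(both.accepts("ab"), both.accepts("a"), no_a.accepts("bbb"))
```

The product machine can have as many states as the two state counts multiplied together, so these operations are expressible but not necessarily cheap.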


> Is that not what you mean

No, I mean the generic line boundary.



