
The biggest problem with the new XPath versions is that the W3C published the standards but almost no one implemented them, so you cannot actually use them.

I was doing web scraping, and needed regular expressions to get the text, so I have implemented XPath 2. I am currently updating it to XPath 3.1: http://www.videlibri.de/xidel.html



Yooo, thanks for Xidel! I use it dozens of times per week. It's amazing. Next to the actual shell probably the single most useful ETL and scraping tool I've ever encountered. Keep at it!


> I was doing web scraping, and needed regular expressions to get the text, so I have implemented XPath 2.

Most XPath implementations have no issue with adding extension functions (in fact, many support EXSLT[0] out of the box). You really do not need to use (let alone implement) XPath 2.0 to get regex functions.

[0] http://exslt.org/regexp/index.html


I don't think this especially changes the underlying point: anyone using tools based on libxml2 or Xerces is basically stuck in 1999. Having to find and install custom extensions adds recurring friction, which encourages you to just do more of the work in a full programming language, since you know you'll be able to satisfy any requirement that way.

I saw so many developers sour on XML after hitting the “This would be easy if we used XPath 2 but instead it's hard” wall that I wonder if anyone on the relevant standards committees ever thought about how much their work's relevance depended on libxml2 supporting it.
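
To make that fallback concrete, here is a minimal sketch, assuming Python's standard library, whose ElementTree module supports only a limited XPath subset with no regex functions. The sample document, its element names, and the `titles_matching` helper are all hypothetical, made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names and titles are made up.
SAMPLE = """
<library>
  <book id="b1"><title>Notes on XPath 2.0</title></book>
  <book id="b2"><title>Learning XML</title></book>
  <book id="b3"><title>Moving to XPath 3.1</title></book>
</library>
"""


def titles_matching(xml_text, pattern):
    """Return the text of every <title> whose content matches `pattern`.

    In XPath 2.0 or later this could be one expression, roughly
    //title[matches(., $pattern)]. ElementTree has no matches(), so the
    path selection happens in XPath and the regex filter in Python.
    """
    root = ET.fromstring(xml_text)
    regex = re.compile(pattern)
    return [
        title.text
        for title in root.iterfind(".//title")
        if title.text and regex.search(title.text)
    ]


# Titles that mention a versioned XPath, e.g. "XPath 2.0".
versioned = titles_matching(SAMPLE, r"XPath \d+\.\d+")
print(versioned)
```

The query ends up split across two languages: the path lives in an XPath string and the condition in host-language code, which is the extra work the XPath 2 `matches()` function would have avoided.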


I did not plan to implement it all, only the parts I needed for the webpages in my city. At first I did not even have backward axes. But people care much more about XPath than they care about my city.

I was also doing too much competitive programming back then, where you have to discover and implement a highly complex algorithm in a few hours.

If such a complex implementation takes a few hours, I could not imagine implementing anything else taking much longer (especially when the spec already says what needs to be implemented and it does not need to be discovered). A few days at most...

But now I am still working on it 14 years later.


Though I've never used Xidel, I came across it when researching XPath 2/3, and was very impressed that anyone managed to implement these massive, complicated specs all by themselves.

The major OSS XML libs, including libxml2 and Xerces, do not implement what Xidel does, and neither do some proprietary libs like MSXML.



