
> If you are writing a scraper it behooves you to understand the website that you are scraping.

That’s what semantic markup is for, no? h1 to h6 headings, article, nav and footer elements (and even microdata) all help both machines and humans understand which parts of the content matter in a given context.

Why treat certain CMSs differently when we have HTML as the common standard format?
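
To make that concrete, here is a rough sketch of a scraper that relies only on semantic elements rather than on anything CMS-specific. It assumes the page actually wraps its main content in an article element and its chrome in nav and footer, which real sites often don't do consistently. The names ArticleExtractor, extract_article_text and SAMPLE are made up for this illustration, and it uses only Python's standard html.parser.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect text found inside <article>, skipping <nav>, <footer>, etc."""

    # Elements whose text we assume is not part of the main content.
    SKIP = {"nav", "footer", "script", "style"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside an <article>
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside an article and outside skipped elements.
        if text and self.article_depth and not self.skip_depth:
            self.chunks.append(text)


def extract_article_text(html):
    """Return the article text of an HTML document as one string."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page, invented for this example.
SAMPLE = """
<html><body>
<nav><a href="/">Home</a></nav>
<article><h1>Title</h1><p>Body text.</p><footer>Posted by someone</footer></article>
<footer>Copyright</footer>
</body></html>
"""

if __name__ == "__main__":
    print(extract_article_text(SAMPLE))  # prints "Title Body text."
```

Nothing in there knows which CMS produced the page; it only looks at the element names.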


