Hacker News

It's a natural monopoly. It has already de facto prohibited everyone else's own crawling, as the linked article demonstrates.

When it had a monopoly, AT&T was forbidden from selling software.



I disagree on the "natural" part. Robots.txt files that put other search engines at a disadvantage aren't the norm; like the sites in the early years that supported only Netscape and MSIE, they're a direct consequence of Google's current market share, and that might change once there is a good reason (like DDG growing into a significant player).

If a collection like commoncrawl with bulk downloads was more useful and thus used more often, even Google would have a good reason to use it.
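The robots.txt pattern under discussion, allowing Googlebot while shutting out every other crawler, can be sketched with Python's standard-library parser. The policy below is an illustrative example, not any particular site's real file, and the bot names other than Googlebot are just placeholders:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything,
# all other user agents are disallowed entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot is allowed; any other crawler falls through to the
# catch-all "User-agent: *" rule and is blocked.
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(agent, rp.can_fetch(agent, "https://example.com/page"))
```

Running this shows `True` only for Googlebot, which is the kind of asymmetry the parent comment attributes to Google's market share rather than to any natural property of the web.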


> Robots.txt that put other search engines at a disadvantage aren't the norm, they're, just like in the early years

It's not just robots.txt; it's also Cloudflare and IP-based throttling. And it is very, very commonplace: http://gigablast.com/blog.html




