
This is a fun way to get around robots.txt "Disallow: /"


It also completed CAPTCHAs when I tried it, and clicked the "I am human" buttons.

Sometimes it hesitates on really important button clicks that it determines are not reversible. I was using it to test the UX on an app in beta and it didn't want to click the final step. I had to "trick" it by reminding it I owned the app.

It felt like that scene in Short Circuit 2 where they trick Johnny 5 into plasma cutting his way through a bank vault because it is "their" vault and they are simply testing the security. Wild times.


There is no law that says you have to respect robots.txt. It's just a suggestion.
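To show why it is only a suggestion: the crawler itself has to fetch and check robots.txt, and nothing on the server side forces it to. Below is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt contents, the example URL and the "ExampleBot" user agent are hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def polite_can_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True only if the robots.txt rules above permit fetching url.

    Nothing enforces this check. A client that never calls it can still
    send the HTTP request, so compliance is entirely voluntary.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(polite_can_fetch("https://example.com/some/page"))  # False
```

A "polite" crawler would skip the URL when this returns False. An impolite one just never asks.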


For the websites that ChatGPT wants to scrape (Reddit immediately comes to mind), it's not an issue of law; it's an issue of "the infrastructure now exists to prevent you from doing that."



