Yeah, their list of recommendations could use another point: expose the public data in a simple, structured way.
I'm working right now on an inventory management system for a clinic which would really benefit from pulling the prices and availability from a very specialised online shop. I wish I could just get a complete, cached snapshot of all items in JSON/CSV/whatever format. But they're not interested at all, so I'm scraping the HTML from 50 separate categories instead. They'll get a few daily bot hits and neither of us will be happy about it.
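For what it's worth, the scraping side is nothing exotic. A rough sketch of what that daily job ends up looking like (the URLs, CSS selectors and field names here are all made up, not the real shop's markup):

```python
# Rough sketch of the category scraper described above.
# All URLs, CSS selectors and field names are hypothetical.
import csv
import time

import requests
from bs4 import BeautifulSoup

BASE = "https://example-medical-shop.test"
CATEGORIES = [f"{BASE}/category/{i}" for i in range(1, 51)]  # the ~50 category pages

def scrape_category(url: str) -> list[dict]:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.select(".product-card"):  # hypothetical markup
        items.append({
            "name": card.select_one(".product-name").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
            "in_stock": card.select_one(".availability") is not None,
        })
    return items

def run() -> None:
    rows = []
    for url in CATEGORIES:
        rows.extend(scrape_category(url))
        time.sleep(2)  # be polite; still ~50 full page loads per daily run
    with open("inventory_snapshot.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price", "in_stock"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    run()
```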
If people are scraping data that you're not selling, they're not going to stop - just make it trivially accessible instead in a way that doesn't waste resources and destroy metrics.
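To be concrete about "trivially accessible": even a nightly cron job that dumps the catalogue to one static file would cover it. A minimal sketch, assuming a simple products table (the schema and column names are mine, not any particular shop's):

```python
# Minimal sketch of a nightly export a shop could serve as a static file.
# The database schema (products table, column names) is assumed for illustration.
import json
import sqlite3
from datetime import datetime, timezone

def export_catalogue(db_path: str, out_path: str) -> None:
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT sku, name, price_cents, stock_qty FROM products"
    ).fetchall()
    conn.close()

    snapshot = {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "items": [
            {"sku": sku, "name": name, "price_cents": price, "in_stock": qty > 0}
            for sku, name, price, qty in rows
        ],
    }
    # Serve this statically (e.g. /catalogue.json behind a CDN): one cached
    # file instead of bots crawling every category page and polluting metrics.
    with open(out_path, "w") as f:
        json.dump(snapshot, f, indent=2)

if __name__ == "__main__":
    export_catalogue("shop.db", "catalogue.json")
```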
The counterpoint is: 'Why hand your competitors data on a silver platter?'
Sure, you might be willing to build the bot to scrape it... but some competitors won't go to that effort, so it still preserves a bit of information asymmetry and stops some of your competitors poaching customers or employing various marketing tactics to exploit short-term shortages, pricing changes, etc.
I really don't believe we're in a situation where a company can exploit product availability and pricing data, is pushing enough volume to make it worth it, can process that information effectively, yet cannot hire someone on Fiverr to write a scraper in a few hours.
> 'Why hand your competitors data on a silver platter?'
To lessen the issue from the article and free up server resources for actual customers.
That depends - scrapers are fragile and temperamental, and you have to keep maintaining them. Also, the idea of letting some random person from Fiverr write code that runs in your infra with access to your webshop, your ERP and the open internet isn't usually that palatable to most IT teams.
I wonder if LLM agents will know to go for APIs and structured data or if they'll keep naively scraping in the future. A lot of traffic could come down to "find me x product online" chats eventually.