It’s a custom GPT for learning scraping. The newsletter is called The Web Scraping Club, and the article archive can be found here: https://substack.thewebscraping.club/archive
Nice article, enjoyed reading it.
I’m Pier, co-founder of https://Databoutique.com, a marketplace for web-scraped data. If you’d like to monetize your data extractions, you can list them on our website. We just started with the grocery industry, and it would be great to have you on board.
Well, alternative data in general is anonymized and absolutely does not contain any personal info (not least because PII is useless to hedge funds: they need to see trends, not sell something to people).
Unless it’s proprietary data (or data acquired from third parties and then processed), the other main source is web scraping, and this is regulated. You need to have the right to scrape the data, which means it has to be public data.
At Databoutique.com we’re trying to solve the web data accessibility problem with a marketplace that connects web data sellers and buyers.
Buyers can get the data in three clicks, either delivered to an S3 bucket or downloaded from the website. It’s pre-scraped, quality-checked and legally compliant.
If a website is not listed, you can ask sellers to provide it.
Sellers deliver data using standard data structures and set their own price.
It’s far from perfect, since we launched only three months ago, but we’re working on it.
This project is interesting. We at https://www.databoutique.com are building something similar: a curated dataset marketplace for web-scraped data.
We believe that with standardization, quality controls and high-density verticals, we can cut prices and time to value for web-scraped data.
Since I’m quite experienced in the field, I started a Substack called The Web Scraping Club as a side gig. It is mostly free but has some paid articles, and it already has hundreds of dollars in MRR after two months.
It is a niche where tutorials and info are pretty sparse around the web, so having a centralized blog is useful for operators.