PigiVinci83's comments

I’d like to try it! I’ll write to you in the next few days.


It’s a custom GPT for learning scraping. The newsletter is called The Web Scraping Club, and the archive of articles can be found here: https://substack.thewebscraping.club/archive


Nice article, enjoyed reading it. I’m Pier, co-founder of https://Databoutique.com, a marketplace for web-scraped data. If you’d like to monetize your data extractions, you can list them on our website. We just started with the grocery industry, and it would be great to have you on board.


This looks like a really cool website but my only critique is how are you verifying that the data is actually real and not just generated randomly?


Do you have data on which data is in higher demand? Do you keep a list of frequently-requested datasets?


Strength and courage, from another Italian in tech who also works fully remote.


Thanks! Love what you've built; it must've been a lot of work!


Just open the dataset description to see it; you can even download a sample of the file. https://www.databoutique.com/buy-data-page-detail/balenciaga...

But thanks for the feedback, we should probably make the website clearer.


Well, alternative data in general is anonymized and contains no personal info at all (not least because PII is useless for hedge funds: they need to see trends, not sell something to people).


I meant data belonging to other companies really, not individuals.


Unless it’s proprietary data (or data acquired from third parties and then processed), the other main source is web scraping, and this is regulated. You need to have the rights to scrape this data, which means it has to be public data.


At Databoutique.com we’re trying to solve the web data accessibility problem with a marketplace that connects web data sellers and buyers. Buyers can get the data in three clicks, delivered to an S3 bucket or downloaded from the website. It’s pre-scraped, quality-checked and legally compliant. If a website is not listed, you can ask sellers to provide it. Sellers deliver data using standard data structures and set their own prices. It’s far from perfect, since we launched three months ago, but we’re working on it.


  The biggest risk I perceived was something I warned the team about: "The first rule of Scrape Club is: Don't Talk About Scrape Club."
It seems someone broke this rule: https://www.google.com/search?q=the+web+scraping+club


This project is interesting, we at https://www.databoutique.com are building something similar, a curated dataset marketplace for web scraping data. We believe that using standardization, quality controls and high density verticals, we can cut prices and time to value for web scraping data.


Since I’m quite experienced in the field, I started a Substack called The Web Scraping Club as a side gig. It is mostly free but has some paid articles, and it already has hundreds of dollars in MRR after two months.

It is a niche where tutorials and info are pretty sparse around the web and having a centralized blog is useful for operators.

Hope this helps you find your way.

