David Gregorian

Web Scraping 🔍🔥

Scraping public data from the web, transforming it, and using it for a new product can become a very successful business. What kind of web scraping projects have you worked on and which tools did you use?

Replies

Tony paul
I've been working in web scraping for almost 10 years. By volume of data scraped, the most demand we've seen comes from the e-commerce industry. The common use cases are price monitoring, competitive intelligence, reputation monitoring, etc. Another hot use case is extracting data from LinkedIn. If I had to count the use cases our data scraping has supported, it would be more than 100 very different ones across 20+ industries. Initially, we started with Python frameworks like Scrapy and then built our own tools internally. I'm the founder of Datahut (https://datahut.co/), a provider of web-scraped data as a service.
David Gregorian
@tonypaul_hb Sounds interesting. Are you still using Python, or have you switched to another tech stack in the meantime?
Bertha Kgokong
(1) Scraping job listing websites and creating your own product, mailing list, etc. for job hunters. Tools: Python, Selenium, Beautiful Soup.
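For anyone curious what the parsing step of a job-listing scraper like this might look like, here is a minimal sketch. To keep it runnable without extra installs, it uses Python's built-in html.parser rather than Beautiful Soup or Selenium. The HTML snippet, the `job-title` class name, and the `JobTitleParser` and `extract_jobs` helpers are all hypothetical and not taken from any real job site.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real job board will use different tags and classes.
SAMPLE_HTML = """
<ul>
  <li><a class="job-title" href="/jobs/1">Backend Developer</a></li>
  <li><a class="job-title" href="/jobs/2">Data Engineer</a></li>
  <li><a class="other" href="/about">About us</a></li>
</ul>
"""


class JobTitleParser(HTMLParser):
    """Collects (title, href) pairs from <a class="job-title"> links."""

    def __init__(self):
        super().__init__()
        self.jobs = []      # finished (title, href) pairs
        self._href = None   # href of the job link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "job-title" in classes:
            self._href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open job link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.jobs.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_jobs(html):
    """Parse an HTML string and return the job links it contains."""
    parser = JobTitleParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


for title, href in extract_jobs(SAMPLE_HTML):
    print(title, href)
```

In a real project the HTML would come from an HTTP request or a Selenium-driven browser, and Beautiful Soup's selectors would likely replace the hand-written parser class.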
David Gregorian
@berthakgokong Sounds interesting!
Nik Hazell
I never finished it, but I started a Strava scraping project. I think there's a ton of suuuuper interesting data in there, although I did it for interest's sake rather than to monetise it. And yep, like @berthakgokong says: Python, Beautiful Soup, etc.
David Gregorian
@berthakgokong @nik_hazell Also pretty cool. I think collecting data for a while and then figuring out what to do with it later is also not a bad idea. The value of data in general will be rising in the future. Have you tried Puppeteer?
Nik Hazell
@berthakgokong @david_gregorian I haven't. Would you recommend it?
David Gregorian
@berthakgokong @nik_hazell You should check it out. The usability is pretty good, especially if you use it with TypeScript. It is based on Chromium. All in all, it has some quirks when controlling a headless browser engine, but I think that's not the fault of Puppeteer itself.
Renat Gabitov
Funny thing, I scraped the "Top Most Upvoted Products" using Bardeen.ai (our tool). It worked really nicely. BUT I wanted to figure out which month is the best to launch, and it turns out they haven't updated that page, so now I gotta scrape all the products. https://www.producthunt.com/e/50... Let's see where this takes me.
David Gregorian
@renat_gabitov Haha, I also thought about it once. Can't you use the GraphQL API of Product Hunt? I think it is not public...
Michael Silber
@renat_gabitov @david_gregorian You can for sure use our public API for projects https://api.producthunt.com/v2/docs
David Gregorian
@renat_gabitov @product_at_producthunt ah nice, thanks for the hint Michael :)
Amirali Nurmagomedov
I remember my rookie days at coding. I was usually doing a lot of parsing, mostly bots fetching videos from various web sources. Everything was done with the preg_match function in PHP 🥲
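For readers who never saw that style of scraping, a rough sketch in Python: the `re` module plays a role similar to PHP's preg_match here. The page snippet, the pattern, and the `find_video_urls` helper are made up for illustration, and regexes like this tend to break easily on real-world markup.

```python
import re

# Hypothetical page source; real sites embed videos in many different ways.
SAMPLE_PAGE = '''
<video src="https://example.com/media/intro.mp4"></video>
<a href="https://example.com/media/talk.webm">Watch the talk</a>
<img src="https://example.com/img/logo.png">
'''

# Matches src="..." or href="..." values ending in a common video extension.
VIDEO_URL = re.compile(
    r'(?:src|href)="([^"]+\.(?:mp4|webm))"',
    re.IGNORECASE,
)


def find_video_urls(page):
    """Return all video URLs found in the page, in document order."""
    return VIDEO_URL.findall(page)


print(find_video_urls(SAMPLE_PAGE))
```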
David Gregorian
@amirali_nurmagomedov Damn that's old school :P How long ago was that?
Amirali Nurmagomedov
@david_gregorian it was 2006-2007, damn 16 years ago :(
Victor G. Björklund
Job websites, company databases, Google SERPs, booking sites, etc. Mostly using google scrapy.
David Gregorian
@victorbjorklund What do you mean by google scrapy?
Olivia
@victorbjorklund Sounds awesome! Just wondering if there are any legal risks?
Balázsi Róbert
I'm building a no-code web scraping tool called https://datagrab.io.
David Gregorian
@balazsi_robert Looks pretty dope! Did you create a Chrome add-on?
Balázsi Róbert
@david_gregorian Thanks, David! Yes, I did! :)
Jared Wright
https://Metaheads.xyz - search engine for FB comments. Node.js + Selenium :)
David Gregorian
@jawerty Looks awesome! Does it store all the scraped data on a custom db? Or is there something happening on the fly, when doing a search?
Naimur Rahman
I've worked with Node.js and Puppeteer to scrape many complex sites for clients, but now I want to make software/tools as a side business. Any ideas for me, guys?
David Gregorian
@naimur103 If you are that experienced with scraping, maybe you could develop a no-code tool for creating custom scrapers and offer it as a SaaS :)
james smith
We at ejobsitesoftware used to receive many queries for a jobs database, so we built a custom job scraper in Laravel using Goutte. Check the screenshot: http://cricketu.com/web-scrap/
David Gregorian
@jobboardsoftware That looks pretty cool James! Did you think about publishing it? (Paid or open source)
james smith
@david_gregorian We plan to use it along with Job Board Software (https://www.ejobsitesoftware.com) and provide a job database to job board owners.