In 2024 is Puppeteer still best for this?
The Puppeteer stealth package was deprecated, from what I read. How "bad" is it now? I don't need perfect stealth against detection right now; good stealth would be sufficient for me.
Is there a similar stealth package for Playwright? Or is there any up-to-date stealth package right now in general? I'm looking for the 20% effort, 80% result approach here.
Or what would be your general take on medium-effort scraping in Node.js? Basically I just need to read some og:images from some websites :) Thanks for your answers!
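For sites that put the og:image tag in the initial HTML, a browser may not be needed at all. Below is a minimal sketch, assuming Node.js 18+ (for the global `fetch`) and pages that are not rendered client-side or blocked by bot protection. The function names `extractOgImage` and `fetchOgImage` and the user-agent string are made up for illustration, and the regex-based parsing is a shortcut, not a full HTML parser (it does not decode entities such as `&amp;`).

```javascript
// Pull the og:image URL out of raw HTML without launching a browser.
// Handles either attribute order and both quote styles.
function extractOgImage(html, baseUrl) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const attrs = {};
    const attrPattern = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }
    const key = (attrs.property || attrs.name || "").toLowerCase();
    if (key === "og:image" && attrs.content) {
      try {
        // Resolve relative paths against the page URL.
        return new URL(attrs.content, baseUrl).href;
      } catch {
        return null; // content was not a usable URL
      }
    }
  }
  return null;
}

// Fetch a page and return its og:image URL, or null if none is found.
async function fetchOgImage(url) {
  const res = await fetch(url, {
    // Hypothetical user-agent; some sites reject requests without one.
    headers: { "user-agent": "Mozilla/5.0 (compatible; og-image-reader)" },
    redirect: "follow",
  });
  if (!res.ok) return null;
  // res.url is the final URL after redirects, used as the base for relative paths.
  return extractOgImage(await res.text(), res.url);
}
```

Usage would look like `fetchOgImage("https://example.com/some-post").then(console.log)`. When this returns null for a site that clearly has a preview image, that is usually the point where a headless browser becomes worth the extra effort.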
Hey Everyone, I ran a [rather silly] race between Puppeteer, Playwright and Selenium to see which one would be fastest on a simple scrape.
Far from a comprehensive benchmark, this race is 100% free from advanced configurations, multi-threading or anything complicated. It just opens Wallapop (a second-hand marketplace in Spain) and times how long it takes to extract the first 2000 results of a search.
Another thing to note is that I ran this on Google Colab, which throttles resources unpredictably, so take this for what it is: just a simple, fun race with lots of questionable decisions.
If you like this simple format, have any ideas on how to improve a race like this or have a strong urge to prove Ward Cunningham right, let me know in the comments!
(Also, if you think your tool of choice isn't being represented fairly, feel free to show how simple code improvements yield more speed with the same resources :)
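One cheap way to make a race like this less sensitive to unpredictable throttling is to repeat each run and compare medians instead of single timings. The following is a sketch only, not the harness used in the race above; `timeRuns` and `median` are hypothetical helper names, and `task` stands for whatever async scrape function each tool provides.

```javascript
// Run an async task several times and report the median wall-clock time.
async function timeRuns(task, runs = 5) {
  const durations = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await task();
    durations.push(performance.now() - start);
  }
  return { runs: durations, medianMs: median(durations) };
}

// Median of a list of numbers; does not modify the input array.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

A call such as `timeRuns(() => scrapeWithPuppeteer(), 5)` (where `scrapeWithPuppeteer` is a placeholder for the real scrape) would return the individual durations plus the median in milliseconds.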
Hey. I'm in a situation where I have to choose either Puppeteer or Playwright. I'm interested in nothing else but maximum efficiency and stability, knowing that my scripts take hours/days to finish.
Thanks.
Web scraping is kind of my hobby, and I enjoy it a lot. The question is: can I turn this hobby into a profession? I already know Scrapy, Selenium and BeautifulSoup. What tools should I be familiar with?
I'm new to web scraping and I wanted to know which of these I could use to create a database of phone specs and laptop specs, around 10,000-20,000 items.
I first started learning BeautifulSoup, then hit a roadblock when a "load more" button needed to be clicked.
Then I wanted to check out Selenium, but heard everyone say it's outdated, and the code in the tutorial I was trying to follow didn't match what I had to write because of Selenium updates and changed functions.
Now I'm going to learn Playwright because the tutorial guy is doing something similar to what I'm doing.
I also saw some people saying that finding the site's endpoints and calling them directly with requests is the easiest way.
Can someone help me out with this?
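The "find the endpoint" approach usually means opening the browser's network tab, clicking "load more", and looking for the JSON request it triggers. Assuming such an endpoint exists, paging through it can be sketched as below. Everything specific here is an assumption for illustration: the `page` and `per_page` query parameters, the `items` and `next_page` response fields, and the helper names `collectAll` and `makeFetchPage`. A real site will use its own parameter and field names.

```javascript
// Collect every item from a paginated source.
// fetchPage(page) must resolve to { items: [...], hasMore: boolean }.
async function collectAll(fetchPage, maxPages = 1000) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const { items, hasMore } = await fetchPage(page);
    all.push(...items);
    // Stop when the source says there is nothing more, or a page comes back empty.
    if (!hasMore || items.length === 0) break;
  }
  return all;
}

// Build a fetchPage function for a hypothetical JSON endpoint.
// Requires Node.js 18+ for the global fetch.
function makeFetchPage(baseUrl, pageSize = 100) {
  return async (page) => {
    const url = new URL(baseUrl);
    url.searchParams.set("page", String(page));         // assumed parameter name
    url.searchParams.set("per_page", String(pageSize)); // assumed parameter name
    const res = await fetch(url, { headers: { accept: "application/json" } });
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    const body = await res.json();
    // Assumed response shape: { items: [...], next_page: <number or null> }
    return { items: body.items ?? [], hasMore: Boolean(body.next_page) };
  };
}
```

With a real endpoint this would be called as `collectAll(makeFetchPage("https://example.com/api/phones"))`, where the URL is a placeholder. Keeping the paging loop separate from the HTTP call makes it easy to add a delay between requests or to swap in a different response shape.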
I want to write a small script to scrape a small business directory in my city. Nothing crazy, it's a single page with filters, not hundreds of individual pages.
I'm looking for a lightweight library. Don't want to download a full Chromium install if I don't have to.
I've looked into Osmosis, Xray, and NoodleJS and none of them are actively maintained (will that even be an issue for my use case?).
Are there alternatives to Puppeteer and Cheerio for scraping in 2023, or are they still the "go-to"? I liked the simplified API of Osmosis and Xray, but like I mentioned, they are not maintained.
Thanks for the suggestions!
Quick question for Node.js devs: between Playwright and Puppeteer, which one is less resource intensive in terms of CPU and RAM usage?
Running browser automation on a VPS with limited resources, so performance matters.
Thanks!
Has anyone experimented with Puppeteer? What are some interesting projects or applications that have been developed using it?
I have recently developed a web scraper specifically designed to extract responses from ChatGPT instead of relying on its API. Could you please evaluate the scraper and provide suggestions for improvement?
https://github.com/shobhitexe/GPT-Spider