How to Extract Wayfair Product Data Using Python & Beautiful Soup?
Here, we will see how to scrape Wayfair product listings with Python and BeautifulSoup quickly and cleanly.
This post focuses on solving a real problem while keeping everything simple, so you become familiar with the tools and get real results as fast as possible.
The first thing we need is to make sure Python 3 is installed; if it is not, install it before going any further.
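If you are not sure which version you have, a quick check from the terminal (assuming the python3 launcher is on your PATH) looks like this:

python3 --version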
After that, you can install BeautifulSoup with pip.
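On most systems (assuming pip for Python 3 is available as pip3), the command is:

pip3 install beautifulsoup4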
We also need lxml, the requests library, and soupsieve to fetch the page, parse the HTML, and use CSS selectors. Install them with the command below.
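Again assuming the pip3 launcher from above:

pip3 install requests lxml soupsieve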
Once everything is installed, open your editor and get ready to type in the code.
First, go to a Wayfair product listing page, such as the area rugs category, and inspect the data we can extract.
Now, back in our code, let's fetch that page while pretending to be a browser, like this:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Browser-like User-Agent header so the request looks like it comes from a real browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}

url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'

# Fetch the listing page and parse the returned HTML with the lxml parser
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
Save it as scrapeWayfair.py and run it:
python3 scrapeWayfair.py
If there are no errors, you now have the entire HTML page fetched and parsed into the soup object.
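If you want to confirm the fetch worked before going further, a quick sanity check (these two print lines are an optional addition, not part of the script above) is:

# 200 means the request succeeded
print(response.status_code)
# Show the first 500 characters of the parsed page
print(soup.prettify()[:500])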
Now it's time to use CSS selectors to get the data we need. To do this, open the page in Chrome and use the Inspect tool.
We can see that each individual product's data is contained in an element with the class ProductCard-container, so we can select it with the CSS selector '.ProductCard-container'. Here is how the code looks:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}

url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

# Each product card on the listing page matches the .ProductCard-container selector
for item in soup.select('.ProductCard-container'):
    try:
        print('----------------------------------------')
        print(item)
    except Exception as e:
        #raise e
        print('')
This prints the content of every element that holds a product's data.
Now we can pick out the classes inside these cards that hold the details we want. Looking at the markup, the product name sits in the ProductCard-name class, the prices in the ProductCard-price classes, the review count and rating in the pl-ReviewStars-reviews and pl-VisuallyHidden elements, and the image URL in pl-FluidImage-image:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}

url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.ProductCard-container'):
    try:
        print('----------------------------------------')
        #print(item)
        # Product name
        print(item.select('.ProductCard-name')[0].get_text().strip())
        # List price
        print(item.select('.ProductCard-price--listPrice')[0].get_text().strip())
        # Current price
        print(item.select('.ProductCard-price')[0].get_text().strip())
        # Number of reviews
        print(item.select('.pl-ReviewStars-reviews')[0].get_text().strip())
        # Star rating (hidden accessibility text)
        print(item.select('.pl-VisuallyHidden')[2].get_text().strip())
        # Product image URL
        print(item.select('.pl-FluidImage-image')[0]['src'])
    except Exception as e:
        #raise e
        print('')
If you run it, it prints all of this information for every product on the page.
And that's it! We are done!
If you want to use this in production and scale it to thousands of links, you will find that Wayfair blocks your IP fairly quickly. In that scenario, using a rotating proxy service to rotate IPs is almost a must. You can use a service such as Proxies API to route your calls through a pool of millions of residential proxies.
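As a rough illustration only, here is how a request can be routed through a proxy with the requests library; the proxy endpoint and credentials below are placeholders, not real values, and the exact setup depends on the provider you choose:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}

# Placeholder proxy endpoint; replace with the one your proxy provider gives you
proxies = {
    'http': 'http://USERNAME:PASSWORD@proxy.example.com:8000',
    'https': 'http://USERNAME:PASSWORD@proxy.example.com:8000',
}

url = 'https://www.wayfair.com/rugs/sb0/area-rugs-c215386.html'
# The same request as before, now routed through the proxy
response = requests.get(url, headers=headers, proxies=proxies)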
If you want to scale your crawling speed and don't want to set up the infrastructure yourself, you can use our Wayfair data crawler to scrape thousands of URLs at high speed from our network of crawlers. For more information, contact us!