How To Scrape Amazon Best Seller Products Using Python and BeautifulSoup?

Today, we will learn how to scrape Amazon Best Seller products using Python and BeautifulSoup in a simple, straightforward way.

The objective of this blog is to help you start solving real-world problems while keeping everything as simple as possible, so that you can understand the approach and get practical results quickly.

First, make sure that Python 3 is installed. If not, download and install Python 3 before continuing.
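You can check which version you have by running:

python3 --version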

Installation

You can install BeautifulSoup with:

pip3 install beautifulsoup4

To fetch the data, parse it into a document tree, and apply CSS selectors, we will also need the requests, lxml, and soupsieve libraries. Install them with:

pip3 install requests soupsieve lxml
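If you want to confirm that everything installed correctly, an optional one-line import check works:

python3 -c "import bs4, requests, lxml, soupsieve"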

After you've installed them, open a text editor and type in:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Now visit the Amazon Best Seller listing page and look at the information you can get:

[Image: Amazon Best Seller listing page]
Code

Now, let us get back to our script. We will fetch the page while pretending to be a browser by sending a browser User-Agent header:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.amazon.in/gp/bestsellers/garden/ref=zg_bs_nav_0/258-0752277-9771203'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
# Print the parsed page so we can inspect the full HTML
print(soup.prettify())

You can save this code as scrapeAmazonBS.py.

If you execute the script:

python3 scrapeAmazonBS.py

You'll be able to see the entire HTML page.
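Amazon sometimes answers with a CAPTCHA or an error page instead of the listing. Checking the status code before parsing helps spot this; here is an optional sketch:

# Optional guard: anything other than 200 usually means Amazon blocked or redirected us
if response.status_code != 200:
	print('Request failed with status', response.status_code)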

Let us use CSS selectors to get the information we are looking for. To do so, go back to Chrome and open the inspect tool.

[Image: Inspecting a product row in Chrome's inspect tool]

Each individual product's information lives in an element with the class zg-item-immersion. With the CSS selector .zg-item-immersion, we can easily extract it. Here is how the code looks:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.amazon.in/gp/bestsellers/garden/ref=zg_bs_nav_0/258-0752277-9771203'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.zg-item-immersion'):
	try:
		print('----------------------------------------')
		print(item)
	except Exception as e:
		#raise e
		print('')

This will print the full HTML of each product element.


We can now select the subclasses inside each row that hold the data we need:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://www.amazon.in/gp/bestsellers/garden/ref=zg_bs_nav_0/258-0752277-9771203'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.zg-item-immersion'):
	try:
		print('----------------------------------------')
		print(item.select('.p13n-sc-truncate')[0].get_text().strip())  # product title
		print(item.select('.p13n-sc-price')[0].get_text().strip())     # price
		print(item.select('.a-icon-row i')[0].get_text().strip())      # star rating
		print(item.select('.a-icon-row a')[1].get_text().strip())      # review count
		print(item.select('.a-icon-row a')[1]['href'])                 # product link
		print(item.select('img')[0]['src'])                            # image URL
	except Exception as e:
		#raise e
		print('')

If you run it, it will print the title, price, rating, review count, product link, and image URL for each product.

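In a real script you will usually want the data in a structure instead of console output. Below is a minimal sketch (our own variation, not part of the original script) that reuses the soup object from above and collects each product into a dictionary with the same selectors:

products = []
for item in soup.select('.zg-item-immersion'):
	try:
		products.append({
			'title': item.select('.p13n-sc-truncate')[0].get_text().strip(),
			'price': item.select('.p13n-sc-price')[0].get_text().strip(),
			'rating': item.select('.a-icon-row i')[0].get_text().strip(),
			'link': item.select('.a-icon-row a')[1]['href'],
			'image': item.select('img')[0]['src'],
		})
	except IndexError:
		# Rows missing a field (e.g., no rating yet) are skipped
		continue

print(len(products), 'products collected')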

If you want to use this in production and scale to thousands of links, you will discover that Amazon quickly blocks your IP address. In that scenario, cycling IPs through a rotating proxy service is more or less a necessity. You can route your requests through a pool of thousands of residential proxies using a service like Proxies API.
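With the requests library this only needs a proxies argument. Here is a minimal sketch, assuming you have a proxy endpoint (the URL and credentials below are placeholders):

# Placeholder proxy endpoint; substitute your provider's host, port, and credentials
proxy = 'http://username:password@proxy.example.com:8080'
proxies = {'http': proxy, 'https': proxy}
response = requests.get(url, headers=headers, proxies=proxies)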

If you want to speed up crawling without building your own infrastructure, you can contact our experts at 3i Data Scraping to quickly crawl thousands of URLs using Python and BeautifulSoup.

Mention your requirements and ask for a quote!
