How To Scrape Amazon Best Seller Listing Using Python


Sapnathakkar1076

Uploaded on Jun 30, 2023

Category Business

In this post, we have shown the process of how to scrape Amazons best-seller listing using the python language and How to handle HTML tags with example code.

Category Business

Comments

                     

How To Scrape Amazon Best Seller Listing Using Python

How To Scrape Amazon Best Seller Listing Using Python? If you're planning to sell your newly developed product on Amazon, you'll need to know the path to list your product on the platform to get maximum revenue. It will give you an idea of the potential challenges, how to htmtps://iwtwiwg.iweabdattaescr aptinhg.coem/msocia l-maednia-sdcrap ewr.phphat modifications are required in your product and marketing strategy, etc. In this tutorial, we'll walk through the process of ehttpxs://twrwwa.iwcebdtaitanscrgapin ga.com/scarapez-aomaznon- bbest-eselslerst-r asnkiengsl-dlaeta-arn d-lsitasy-ctom dpetiativet.pahp using python to help you with your market research. What is data scraping from websites? It is an automatic process to extract the desired data from websites in a usable format in large quantities for market research purposes. https://www.iwebdatascraping.com/ecommerce-data-scraping.php What are the reasons for scraping data from a website? Web data scraping is generally used for marketing purposes where businesses extract the data from the websites of their competitors like product/service prices, articles, and top-per- forming product details to plan a better marketing strategy. However, Data extraction is not only limited to marketing. It is also valuable for machine learning, SEhtOtps:,/ /awwnw.iwdeb dmatascarapningy.co mm/amaozonr-secra pferi.pehplds. How to do the Amazon best seller listing scraping? Amazon is an American company that works in online eCommerce business and cloud com- puting across the globe. On this platform, a huge number of enterprises, and brands, sell their products to their customers. For example, Books, Sports gadgets, pet food, etc. https://www.iwebdatascraping.com/scrape-tmall-com-product-data.php Here, best sellers are listed in alphabetical order and can be seen in Amazon's best sellers’ category. This page gives a list of more than 40 categories regrouped. Out of which, we will guide you to scrape amazon bestseller lists in different categories using python web scrap- ing. We Will need to have python libraries Beautiful soup and Requests to crawl, parse and mine the data we want from the page to scrape Amazon best seller listings using python. We will follow the below-given steps for our needs to run hAttps:/m/www.aiwebzdatoascrnapin g.scomc/amrazaon-spcrapeer.prhp using python. Install required python libraries Download and parse the HTML code of the source page using the request and beautifulsoup libraries to known product category URLs. Repeat the above step to obtain each product URL. Scrape Data from each webpage. https://www.iwebdatascraping.com/e-commers-scraper.php https://www.iwebdatascraping.com/e-commers-scraper.php Compile the scraped information from each webpage in a python directory. Save the compiled data in a usable format using the python library pandas. Once the data extraction is done, we'll save all the data in CSV format, as shown below. https://www.iwebdatascraping.com/scrape-taobao-product-data.php You can check the full reference code in the link below. How do we execute the above Code? To run the developed code, you can go to the Run button at the top of the page and select the Run on Binder option. Note: Each category shows 100 products on 2 pages, with 40 categories of each department being shown. But a few pages are not accessible due to captcha issues. Install necessary python libraries to be used Install request and beautiful soup libraries with pip command, respectively since these librar- ies are mandatory for extracting the data from the webpage. Save HTML page source code Download and save the source code of the HTML page using requests and beautiful soup python libraries to extract product categories URLs. https://www.iwebdatascraping.com/taobao-scraper.php To download the page, we'll use the get function from the library requests. Further, we'll define the header string of the user agent so that servers will be compatible with the applica- tion, OS, and requesting version. It is helpful to show that we are not scraping the data. The command requests.get gives the output containing data from a targeted webpage. While the status_code property checks if the output was accurate or not. To check that, we can manu- ally verify the HTTP response code, which lies between 200 and 299, for successful output. We can use this response.text to see the downloaded page content. Adding to it, we can also check the length of the downloaded data using len(response.text). In this example, we've just extracted the first 500 characters using the print command. Now, let's store the extracted file in an HTML extension. with open("bestseller.html","w") as f: f.write(response.text) After sahvttpis:n//wwgw.,iw ewbdateasc racpinag.conm/ wecb-hscraepincg-skerv ictesh.phep fileht tpus://swwiwn.iwgeb dattahscreap inFg.ciolme/m>obiOle-apppse-danta- scmrapineg.nphpu in jupyter and open the best- seller.html file from the files shown on the screen. To know whether the extracted data has been saved or not, there is an option where you can check the file size. If the file size is around 250 kB, the process is successful. However, if the file size is 6.6kB, means the process has failed. Reasons to fail could be security or a capt- cha. This is what we can observe when we open the file. https://www.iwebdatascraping.com/web-scraping-services.php Though this looks like the original web page, it's just a copy of that page, and we can't open any link from the copy. To check and customize the source code of the saved file, click File>Open in Jupyter, then opt for the .html file for this operation from the lists and click on the edit option. https://www.iwebdatascraping.com/scrape-amazon-product-data.php https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php So far, we have just got the data for the first 500 characters. Now, we'll move ahead to parse the web page to check the type using BeautifulSoup. Though this looks like the original web page, it's just a copy of that page, and we can't open any link from the copy. To check and customize the source code of the saved file, click File>Open in Jupyter, then opt for the .html file for this operation from the lists and click on the edit option. After finishing the above process, let's find a list of item topic and categories, their titles and URLs, and store them in the device. ############## find and get different item categories description and Url at any department topics_link = 0 for tag in hearder_link_tags[1:]: # print(tag) topics_link.append({ "title": tag.text.strip(), "htutps:/r/wlw"w.i:we btdahatattspcsrga:/p/win.wgw.fc.oiwmie/nbwdeabtd-assccrar(appin"ingg-as.ecrovm"ic/e)ws.ep[bh-p"scrhapinrg-servfice"s.p]h p}) https://www.iwebdatascraping.com/mobile-apps-data-scraping.php ########store in dictionary table_topics = {k:[ d.get(k) for d in topics_link] for k in set().union(*top- ics_link)} Table_topics {'url': ['https://www.amazon.com/Best-Sellers/zgbs/amazon-devic- es/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Amazon-Launchpad/zgbs/- boost/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Appliances/zgbs/applianc- es/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Appstore-Android/zgbs/mo- bile-apps/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - A r t s - C r a f t s - S e w - ing/zgbs/arts-crafts/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - A u d i b l e - A u d i o - books/zgbs/audible/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Automotive/zgbs/automo- tive/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sel lers-Baby/zgbs/baby-prod- ucts/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.ama- zon.com/Best-Sellers-Beauty/zgbs/beauty/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/best-sel lers-books-Amazon/zgbs/- books/ref=zg_bs_nav_ 0/138-0735230-1420505', Gift Cards Amazon Hom 'https://www.amazon.com/best-sellers-music-albums/zgbs/mu- shttpis:c//w/r0/1ww3.iw eebdafht tt=ps:/8-as0crap/ zwinwggw.c.oiwme_/bwdebabt-asscscrarapp73523i _ningg-sn.ecrovmica/ews.epvbh-psc_raping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php0-1420505', 'https://www.amazon.com/best-sellers-camera-photo/zgbs/pho- to/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w . a m a z o n . c o m / B e s t - S e l l e r s / z g b s / w i r e - less/ref=zg_bs_nav_0/138-0735230- 1420505', 'https://www.amazon.com/Best-Sellers/zgbs/fash- ion/ref=zg_bs_nav_0/138-0735230- 1420505', 'https://www.amazon.com/Best-Sellers-Collectible-Coins/zgbs/- coins/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.ama- zon.com/Best-Sellers-Computers-Accessories/zgbs/pc/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers/zgbs/digital-education- al-resources/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-MP3-Downloads/zgbs/dmu- sic/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Electronics/zgbs/electron- ics/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Entertainment-Collecti- bles/zgbs/entertainment- collectibles/ref=zg_bs_nav_0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - G i f t - C a r d s / z g b s / - gift-cards/ref=zg_bs_nav_0/138- men's Slim- 0735230-1420505', ' h t t p s : // w w w . a m a z o n . c o m / B e s t - S e l l e r s - G r o c e r y - G o u r - met-Food/zgbs/grocery/ref=zg_bs_ nav_0/138-0735230-1420505', 'hthtps:/t/wwtw.piwebdsatas:cra/pin/g.cowm/wewb-scrapwing-se.rviaces.pmhp azon.com/Best-Sellers-Handmade/zgbs/hand-made/httrpse://wwfw=.iwebzdatgascr_apibng.cosm/w_ebn-scraapinvg-se_rvic0es.p/hp138ht-tp s:0//ww7w.iw3ebd5ata2scra3ping0.com-/1mob4ile-a2pps0-dat5a-sc0rap5ing.'p,hp ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - H e a l t h - P e r s o n - al-Care/zgbs/hpc/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Home-Kitch en/zgbs/home-garden/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - I n d u s t r i a l - S c i e n t i - fic/zgbs/industrial/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digi- tal-text/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Kitchen-Dining/zgbs/kitch- en/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Magazines/zgbs/maga- zines/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.ama- z o n . c o m / b e s t - s e l l e r s - m o v - ies-TV-DVD-Blu-ray/zgbs/movies-tv/ref=zg_bs_nav _0/138-0735230-1420505', ' h t t p s : // w w w . a m a z o n . c o m / B e s t - S e l l e r s - H o m e - K i t c h - en/zgbs/home-garden/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - I n d u s t r i a l - S c i e n t i - fic/zgbs/industrial/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Kindle-Store/zgbs/digi- tal-text/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Kitchen-Dining/zgbs/kitch- en/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.amazon.com/Best-Sellers-Magazines/zgbs/maga- zines/ref=zg_bs_nav_0/138- 0735230-1420505', 'https://www.ama- z o n . c o m / b e s t - s e l l e r s - m o v - i _ht etps:/s/ww-wT.iw V-DVD-Blu0/13ebda8htattspc-sra:/p0/winwgw.7c.oiwme3/bwdeab5t-assccra2rappining3g-serv -icres.aphpy/zgbs/movies-tv/ref=zg_bs_nav.co0m/w-eb1-scr4aping2-ser0vice5s.ph0p 5', https://www.iwebdatascraping.com/mobile-apps-data-scraping.php ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - M u s i c a l - I n s t r u - m e n t s / z g b s / m u s i c a l - i n s t r u m e n t s / r e f = z g _ bs_nav_0/138-0735230-1420505', 'https://www.amazon.com/Best-Sell- ers-Office-Products/zgbs/office-products/ref=zg_bs_nav_ Hom 0/138-0735230-1420505', ' h t t p s : // w w w . a m a z o n . c o m / B e s t - S e l l e r s - G a r d e n - O u t - door/zgbs/lawn-garden/ref=zg_bs_nav_ 0/138-0735230-1420505', 'https://www.amazon.com/Best-Sellers-Pet-Supplies/zgbs/pet-sup- plies/ref=zg_bs_nav_0/138- 0735230-1420505', ' h t tps : //www.amazon.com/best -se l le rs -sof tware/zgbs/sof t - ware/ref=zg_bs_nav_0/138-0735230 -1420505', 'https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/spor- ting-goods/ref=zg_bs_nav_ 0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - S p o r t s - C o l l e c t i - bles/zgbs/sports-collectibles/ref=zg_bs__ nav_0/138-0735230-1420505', ' h t t p s : // w w w. a m a z o n . c o m / B e s t - S e l l e r s - H o m e - I m p r o v e - ment/zgbs/hi/ref=zg_bs_nav_0/138- 0735230-1420505', ' h t t p s : / / w w w . a m a z o n . c o m / B e s t - S e l l - e r s - To y s - G a m e s / z g b s / t o y s - a n d - g a m e s / r e f = z g _ b s _ n a v _ 0/138-0735230-1420505', 'https://www.amazon.com/best-sellers-vid- eo-games/zgbs/videogames/ref=zg_bs_nav_0/ 1http3s://w8ww.-iw0ebda7tasc3rapin5g.co2m/w3eb-s0crapi-ng1-serv4ices2.php0505'], 'title': ['Amazon Devices & Accessories','Amazhottpsn://ww wL.iweabdautascnrapincg.chom/wpeb-ascradping'-,se rv'iAces.pphp pliahttnps:/c/wwew.iwseb'd,atascraping.com/mobile-apps-data-scraping.php 'Apps & Games', 'Arts, Crafts & Sewing', GAP Women's Ribbed Tank Top Amazon Essentials Women's Slim- 'Audible Books & Originals', Fit Tank, Pack of 2 $13.50 'Automotive', 'Baby', 'Beauty & Personal Care', 'Books', 'CDs & Vinyl', 'Camera & Photo Products', 'Cell Phones & Accessories', 'Clothing, Shoes & Jewelry', 'Computers & Accessories', 'Collectible Currencies', 'Digital Educational Resources', 'Digital Music', 'Electronics', 'Entertainment Collectibles', 'Gift Cards', 'Grocery & Gourmet Food', 'Handmade Products', 'Health & Household', 'Home & Kitchen', 'Industrial & Scientific', 'Kindle Store', 'Kitchen & Dining', 'Magazine Subscriptions', 'Movies & TV', 'Musical Instruments', 'Office Products', 'Patio, Lawn & Garden', 'Pet Supplies', 'Software', ' 'h S Sttps: p p//ww o ow.iw r reb t tda s shta ttspc sra & C:/p/winwg ow.c O .oiwmle/w ueb-stcradpingoors', Downbldeatasccraptinig-s.becrovmicl/ewse.epbh-psscra'p,ing-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php 'Tools & Home Improvement', 'Toys & Games', 'Video Games']} Here, we've extracted the list of 40 bestseller product categories accurately. Repeat the above step for all the above-obtained categories using respective URLs To avoid access being denied by a captcha on several pages while extracting data; we’ll import the time library to apply the sleeping time of a few seconds. import time import numpy as np We are going to define a function, namely parse_page, to extract and parse every single page link saved from the top ranking best-selling product of any category. def fetch(url): "The function take url and headers to download and parse the page using request.get and BeautifulSoup library it return a parent tag of typeBeautifulSoup object Argument: -url(string): web page url to be downloaded and parse Return: -doc(Beautiful Object): it's the parent tag containing the information that we need parsed from the page"" HEADERS= {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/ 20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+ xml,application/xm- l;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Re- hqttpsu://weww.iswebtdashtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php ":"1"} response = requests.get(url, headers= HEADERS) Top if response.sta- tus_code != 200: print("Status code:", response.status_code) raise Exception("Failed to link to web page + topic_url) " page_content = BeautifulSoup(response.text,"html.parser") doc = page_content.findAll('div', attrs={'class':'a-section a-spacing-none aok-relative'}) return doc def parse_page(table_topics,pageNo): """The function take all topic categories and number of page to parse for each topic as input, applyget request to download each page, the use Beautifulsoup to parse the page. the function output are arti- cle_tags list containing all pages content, t_description list containing correspponding topic or categories then an url list for corre- sponding Url. Argument: -table_topics (dict): dictionary containing topic description and url -pageN- o(int): number of page to parse per topic Return: -article_tags(list): list containing successfully parsed pages content where each index is a Beautifulsoup type -t_description (list): list containing successfully parsed topic description -t_url(list): list containing successfully parsed page topic url -fail_tags(list): list containing pages url that failed first parsing -failed_topic(list): list contaning pages topic description that failed first parsing https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php article_tags,t_description, t_url, fail_tags, failed_topic =[], [], [], [], [] for i in range(0,len (table_topics["url"])): # take the url topic_url = table_topics["url"][i] topics_description = table_topics["title"][i] try: for j in range(1,pageNo+1): ref=topic_url.find("ref") url = topic_url[:ref]+"ref=zg_bs_pg_"+str(j)+"?_en- coding=UTF8&pg="+str(j) time.sleep(10) #use resquest to obtain HMTL page content +str(pageNo)+ doc = fetch(url) if len(doc)==0: print("failed to parse page{}".format(url)) Wo fail_tags.append(url) failed_topic.append(topics_description) else: print("Sucsessfully parse:", url) article_tags.append(doc) t_description.append(topics_description) t_url.append(topic_url) except Exception as e: print(e) return article_tags,t_description,t_url,fail_tags, failed_topic If the parsing fails in the first attempt, we'll have a provision for that too. To achieve this, we will define a function reparse_failed_page and will apply it whenever the first attempt fails. Further, we will keep it in a while loop to repeat until it gives the output. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php def reparse_failed_page(fail_page_url, failed_topic): Registry om Fashion Gift Cards Amazon Hom """The function take topic categories url, and de- scription that failed to be accessible due to captcha in the first parsing process, try to fetch and parse thoses page for a second time. the function return article_tags list containing all pages content, topic_de- scription,topic_url and other pages url and topic that failed to load content again Argument: -fail_page_url(dict): list containing failed first parsing web page url ps-failed_topic(int): list contaning failed first parsing ictionary containing topic description and url Return: -article_tags2(list): list containing successfully parsed pages content where each index is a Beautifulsoup type -t_description (list): list containing successfully parsed topic description -t_url(list): list containing successfully parsed page topic url -fail_p(list): list containing pages url that failed again -fail_t(list): list con- taning pages topic description that failed gain print("check if there is any failed pages, then print number:", len(- fail_page_url)) article_tag2, topic_url, topic_d, fail_p, fail_t = [],0,0,0,0 try: Ama Women's Slim- f thtt ops:/r/w wwi.i wiebndata range(lhttspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarapime.sleep(20pin en(fail_page_url)): ingg-s.ecrovmic/ews.epbh-pscraping-services.php) https://www.iwebdatascraping.com/mobile-apps-data-scraping.php doc = fetch(url) if len(doc)==0: print("page{}failed again".format(fail_page_url[i])) fail_p.append(fail_page_url[i]) fail_t.append(failed_topic[i]) else: article_tag2.append(doc) topic_url.append(fail_page_url[i]) topic_d.ap- pend(failed_topic[i]) except Exception as e: print(e) return article_tag2,topic_d,topic_url,fail_p,fail_t To get the maximum number of pages while parsing, we have defined the function parse to achieve 2 levels of parsing. def parse(table_topics,pageNo): Music """The function take table_topics, and number of page to parse for ecah topic url, the main purpose of this funtion is to realize a double attempt to parse maximum number of pages it can .It's a combination of result getting from first and second parse. Argument -table_topics(dict): dictionary containing topic description and url -pageNo(int): number of page to parse per topic Return: -all_arcticle_tag(list): list containing all successfully parsed pages content where each index is a Beautifulsoup type -all_topics_description (list): list containing all successfully parsed topic description -all_topics_url(list): list containing all successfully parsed page topic url ahttprs:/t/wiwwc.iwlebedahtat_tspcsrta:/p/wianwgw.cg.oiwme/bswdeabt,-asstccrar_appiningdg-s.ecroevmic/ewss.epbh-cpscrarpinigp-sertvicieso.phpn,t_huttpsr://lw,wfw.aiwebidlat_asctraping.cosm/m,ofbiale-aipplse-datda-sc_raptinog.php ic = parse_page(ta- ble_topics,pageNo) if len(fail_tags)!=0: article_tags2,t_description2,t_url2,fail_tags2, failed_topic2 = reparse_- failed_page Ribbed Tank Top Amaz Women's Slim- (fail_tags,failed_topic) all_arcticle_tag = [*article_tags, *article_tags2] all_topics_description = [*t_description,*t_description2] all_topics_url = [*t_url,*t_url2] #return all_arcticle_tag, all_topics_descrip- tion, all_topics_url else: print("successfully parsed all pages") all_arcticle_tag = article_tags all_topics_description =t_description all_topics_url = t_url # return article_tags,t_description,t_url,fail_tags,failed_topic return all_arcticle_tag, all_topics_description,all_topics_url all_arcticle_tag, all_topics_description, all_topics_url = parse(table_top- ics,2) Sucsessfully parse: https://www.amazon.com/Best-Sellers/zgbs/ama- zon-devices/ref=zg_ bs_pg_1?_encoding=UTF8&pg=1 Successfully parse: https://www.ama- zon.com/Best-Sellers/zgbs/amazon-devices/ref=zg_ bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Ama- zon-Launchpad/zgbs/boost/ ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Shttpsu://wcwws.iweebdatsascrsapinfg.cuom/lwleby-scr aping-services.phpparse: https://www.amazon.com/Best-Sellers-Ama-https://www.iwebdatascraping.com/web-scraping-services.phpzon-Launchpad/zgbs/boosth/ttp s:r//wewwf.iw=ebdzatasgcra_pinbg.cosm/m_obpile-gapps_-da2ta-s?cra_pineg.pnhp coding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Applianc- es/zgbs/appliances/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Applianc es/zgbs/appliances/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.ama- zon.com/Best-Sellers-Appstore-Android/zgbs/mobile- apps/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Appstore-Android/zgbs/mobile- pps/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Arts-Crafts-Sewing/zgbs/arts- crafts/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Arts-Crafts-Sewing/zgbs/arts- crafts/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Audible-Audio- books/zgbs/audible/ ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://ww- w.amazon.com/Best-Sellers-Audible-Audiobooks/zgbs/audible/ ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Automo- tive/zgbs/automotive/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Automo- tive/zgbs/automotive/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Baby/zgbs/ba- by-products/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Baby/zgbs/ba- by-products/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Shttpsu://wcwws.iweebdatsascrsapinfg.cuom/lwleby-scr aping-services.phpparse: https://www.amazon.com/Best-Sellers-Beau-https://www.iwebdatascraping.com/web-scraping-services.phpty/zgbs/beauty/ref=zg_ bs_hpttpsg://w_ww1.iwe?bda_tasecranpingc.coom/mdobilie-nappgs-da=ta-sUcrapTing.Fphp8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Beau- ty/zgbs/beauty/ref=zg__ bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/best-sellers-books-Ama- zon/zgbs/books/ref =zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/best-sellers-books-Ama- zon/zgbs/books/ref =zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/best-sellers-music-al- bums/zgbs/music/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/best-sellers-music-al- bums/zgbs/ music/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/best-sellers-camera- photo/zgbs/photo/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/best-sellers-camera-pho- to/zgbs/photo/ref=zg_ bs_pg_2?_encoding=UTF8&pg=2 Amazon Hom Sucsessfully parse: https://www.amazon.com/Best-Sellers/zgbs/wire- less/ref=zg_bs_pg_1?_ encoding=UTF8&pg=1 failed to parse pagehttps://www.amazon.com/Best-Sellers/zgbs/wire- less/ref=zg_bs_pg_2?_ encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers/zgbs/fash- ion/ref=zg_bs_pg_1?_ encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers/zgbs/fash- ihtotps:/n/ww/w.irweebdaftas=crapzing.cgom/_webs_https://www.iwebdabt-ascrapingencoding=UsTcrapFing-se p .co 8rv g mic/w&es.ep _2?_ bh-pscraping-services.phppg=2 https://www.iwebdatascraping.com/mobile-apps-data-scraping.php Sucsessfully parse: https://www.amazon.com/Best-Sellers-Collecti- ble-Coins/zgbs/coins/ref=zg _bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Hand- made/zgbs/handmade/ref=zg _bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.ama- zon.com/Best-Sellers-Handmade/zgbs/handmade/ref=zgazon Hom _bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Health-Per- sonal-Care/zgbs/hpc/ref=zg _bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.ama- zon.com/Best-Sellers-Health-Personal-Care/zgbs/hpc/ref=zg _bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.ama- zon.com/Best-Sellers-Home-Kitchen/zgbs/home-garden/ref=zg _bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Home-Kitch- en/zgbs/home-garden/ref =zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Industrial-Sci- entific/zgbs/industrial/ref =zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Industrial-Sci- entific/zgbs/industrial/ref =zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Kin- dle-Store/zgbs/digital-text/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Kin- dle-Store/zgbs/digital-text/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Kitchen-Din- ing/zgbs/kitchen/ref= zg_bs_pg_1?_encoding=UTF8&pg=1 Shttpsu://wcwws.iweebdatsascrsapinfg.cuom/lwleby-scr aping-sperviaces.prhpse: https://www.amazon.com/Best-Sellers-Kitchen-Din-https://www.iwebdatascraping.com/web-scraping-services.phping/zgbs/kitchen/ref= zg_bshttp_s:/p/wwgw.iw_eb2data?scr_apieng.cnom/cmoobile-dappis-ndatag-sc=rapiUng.phTp F8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Maga- zines/zgbs/magazines/ref=n's Slim- zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Maga- zines/zgbs/magazines/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/best-sellers-mov- ies-TV-DVD-Blu-ray/zgbs/ movies-tv/ref=zg_bs_pg_1?_encoding=UTF8&p- g=1 Sucsessfully parse: https://www.amazon.com/best-sellers-mov- ies-TV-DVD-Blu-ray/zgbs/ movies-tv/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 O Successfully parse: https://www.amazon.com/Best-Sellers-Musical-Instruments/zgbs/ musical-instruments/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessful- ly parse: https://www.amazon.com/Best-Sellers-Musical-Instru- ments/zgbs/ musical-instruments/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Office-Prod- ucts/zgbs/office- products/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Office-Prod- ucts/zgbs/office- products/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Garden-Outdoor/zgbs/lawn- garden/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Garden-Outdoor/zgbs/lawn- garden/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Pet-Supplies/zgbs/pet-sup- pl _httpbs:/ i/eswww s _.iwpe /bdarehgtattspcra f s://w _p1in =g.cozww.iwm ge/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php?_encoding=UTF8&httpps://wgww=.iwe1bdatascraping.com/mobile-apps-data-scraping.php Sucsessfully parse: https://www.amazon.com/Best-Sellers-Pet-Sup- plies/zgbs/pet-supplies/ref= zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.ama- zon.com/best-sellers-software/zgbs/software/ref=zg_bs _pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.ama- zon.com/best-sellers-software/zgbs/software/ref=zg_bs__ pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.ama- zon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting- goods/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting- goods/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Successfully parse: https://www.amazon.com/Best-Sellers-Sports-Col- lectibles/zgbs/sports- collectibles/ref=zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Sports-Collectibles/zgbs/sports- collectibles/ref=zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Home-Im- provement/zgbs/hi/ref =zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sellers-Home-Im- provement/zgbs/hi/ref =zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.am- azon.com/Best-Sellers-Toys-Games/zgbs/toys-and-games/ref =zg_bs_pg_1?_encoding=UTF8&pg=1 Sucsessfully parse: https://www.amazon.com/Best-Sell- ers-Toys-Games/zgbs/toys-and-games/ref =zg_bs_pg_2?_encoding=UTF8&pg=2 Sucsessfully parse: https://www.am- ahttpzs://owwwn.iwe.bdcatasocrapming.com//wbeb-secrapsing-tser-vicses.pehp llers-video-games/zgbs/videogames/ref=zghttps://www.iwebdatascraping.com/web-scraping-services.php_bs_pg_1?_encoding=UTF8&httpps://wgww=.iwe1bdatascraping.com/mobile-apps-data-scraping.php Sucsessfully parse: https://www.amazon.com/best-sellers-vid- eo-games/zgbs/videogames/ref=zg _bs_pg_2?_encoding=UTF8&pg=2 check if there is any failed pages, then print number: 4 pagehttps://ww- w.amazon.com/Best-Sellers/zgbs/wireless/ref=zg_bs_pg_2?_encoding= UTF8&pg=2failed again p a g e h t t p s : // w w w . a m a z o n . c o m / B e s t - S e l l e r s - C o l l e c t i - ble-Coins/zgbs/coins/ref=zg_bs_pg_2?_ encoding=UTF8&pg=2failed again pagehttps://www.amazon.com/Best-Sellers-MP3-Downloads/zgbs/d- music/ref=zg_bs_pg_1?_ encoding=UTF8&pg=1 failed again pagehttps://www.amazon.com/Best-Sellers-MP3-Downloads/zgbs/d- music/ref=zg_bs_pg_2?_ encoding=UTF8&pg=2failed again THere, we have successfully mined 76 pages out of 80 which is good enough. Extract information data from each page. In this part, we have defined a function to scrape the data fields like product description, min and max price of the product, product reviews, image URL, product rating, etc. this data as What is the definition of an HTML element? An HTML element is a tool consisting of child nodes, tag names and attributes, text nodes, and a few other factors. This can be used to scrape data. Further, the elementary can also What are HTML Tags, Child Nodes, and Attributes? The easiest way to know an HTML tag is to understand how a device knows what content to display on the screen, where to show it, and how to show it. Further, differentiating between titles, headings, paragraphs, and images. All this is done using HTML tags on the computer. It is a web-based command to push browsers to take action on something. HhttpTs://Mwww.Liwe bdtataascgrapsing. cohm/awebv-scerap ings-seyrvicnes.tphap xes like HTML, body, head, and div. It is to show their meaning, and define hhttops:w//ww w.tiwhebdata scbraprinog.cowm/web-escrrap inwg-seirvlicle s.pphperforhtmtps:// wawws.iw ebpdaetasrcra ptinhg.ceom /mcoboilem-appsm-dataa-sncradpin g.cphponsidering that these tags also represent keywords. Talking about attributes of HTML like href, src, alt, class, and id, they are modifiers of HTML element type. They are used inside opening tags to manage elements. Child nodes are elements nested in another element. We can use different types of HTML tags and attributes in single-page content. The below picture is on HTML tags and attributes to find the price tag information of the product.manip- ulate HTML.bees extracted using an HTML element from the above-parsed data. To get the product description, we will define get_topic_url_item_descriptuon as shown in the below code. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php After that, we have defined get_item_price to scrape minimum and maximum price of the products. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php Now, we defined get_item_review and get_item_rate functions to get product reviews and rates. Lhttpas:/s/wwtwl.iywe bdwahtattsepcsra:/ p/widnwgw.ce.oiwmfe/bwidenabt-asscecrarapdpining g-s.gecrovmiec/ewst.epbh_-pscirtapeingm-servi_ceus.phrpl to mhttpsi:/n/wwew. iwtehbdaetas cirmapinga.cogm/emo bUile-aRppsL-da toa-sfcr atpihng.ephp product. Thus we have extracted all tube relevant data for the bestseller products and stored them in the usable format. Compile the Scraped data from each page into a Python Library We have designated the function get_info to compile all the product data in a python library. def get_info(article_tags,t_description,t_url): Registry Fashion Gift Cards Amazon Hom Ideas """The function take a list of pages content which each index is a Beautiful element that will be use to find parent tag, list of topic description and topic url then the return a dictionary made of list of each item information data such as: his corresponding topic, the topic url, the item description, minimum price(maximum price if exist), item rating, chttpos://swwwt.iwuebdmahtattspcsra:e/p/winwgrw.c.o iwmre/bwedeabt-asvsccrarapipineingg-s.ecwrovmic/ew,s.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php and item image url Argument: Core -article_tags(list): list containing all pages content where each index is a Beautifulsoup type -t_description (list): list containing topic description -t_url(list): list containing topic url Return:-dictionary (dict): dictionary containing all item information data taken from each parse page topic www topic_description, topics_url, item, item_url = [,0,0,0 minimum_price, maximum_price, rating, costuomer_review = [],0,0,0 for idx in range(0,len (article_tags)): doc = article_tags[idx]#.findAll('div', attrs={'class':'a-section a-spacing-none men's Slime aok-relative'})for d in doc : names,topic_name,topic_url = get_topic_url_item_description (d,t_descrip- tion[idx],t_url[idx]) 34.98 min_price,max_price = get_item_price(d) rate = get_item_rate(d) review = get_item_review(d) 00 See Meurl = get_item_url(d) ####put each item data inside corresponding list item.append(names) topic_description.append(topic_name) topics_url.append(topic_url) minimum_price.append(min_price) maximum_price.append(max_price) rating.append(rate) costuomer_review.append(review) item_url.append(url) return { "Topic": topic_description, "Topic_url": topics_url, " "ht I Mtps t:/e/wiw m nw.iweb _datadsc escriptiohttps://www.iwebdatascraping.comimurapinmg.com_/wepb-scrrapiincg-seervice ns.ph"p : item, "Rating out of 5": rating,/web-scraping-services.php": minimumh_ttpps://wrwiw.ciweebda,ta sRcrapiinbg.cobm/meobidle-a pTps-daatan-scrkapin gT.phop p "Maximum_price": maximum_price, "Review" :costuomer_review, "Item Url" item_url} After a couple of parsing attempts, we got the maximum products data from amazon. Now we will store it in a data frame using pandas Store the information in a CSV file using Pandas Let's begin to store the data in the data frame. dataframe = pd.DataFrame(data) Topic Topic_url Item_description Rating out of 5 Minimum_price Maxi- mum_price Review Item Url Amazon Hom 0 Amazon Devices & Accessories https://www.amazon.com/Best-Sellers/zgbs/amazo... Fire TV Stick 4K streaming device with Alexa V... 4.7 39.90 0.00 615786 https:// imag- es-na.ssl-images-amazon.com/images... 1 Amazon Devices & Accessories https://www.amazon.com/Best-Sell- ers/zgbs/amazo... Fire TV Stick (3rd Gen) with Alexa Voice Remot... 4.7 39.90 0.00 1884 https:// images-na.ssl-images-amazon.com/images... ops Tees & Blouses 2 Amazon Devices & Accessories https://www.ama- zon.com/Best-Sellers/zgbs/amazo... Echo Dot (3rd Gen) - Smart speaker with Alexa ... 4.7 39.90 0.00 1124438 https:// images-na.ssl-images-ama- zon.com/images... 4 Amazon Devices & Accessories https://www.amazon.com/Best-Sell- ers/zgbs/amazo... Amazon Smart Plug, works with Alexa - A Certif... 4.7 24.90 0.00 425111 https:// images-na.ssl-images-amazon.com/images... 3795 Video Games https://www.amazon.com/best-sellers-video-game... PowerA Joy Con Comfort Grips for Nintendo Swit...4.7 9.88 0.00 21960 https://images-na.ssl-images-amazon.com/images... 3796 Video Games https://www.amazon.com/best-sellers-video-game... Super Mario 3D All-Stars (Nintendo Switch) 4.8 58.00 0.00 15109 hhttpts:/t/wpww.isweb:da/htat/tspcsria:/p/mwinwgw.c.oiwmae/bwdgeabt-asscecrarappisningg-s-.ecronvmic/ewsa.epbh-ps.crsapinsg-sler-viciesm.php agehttsps://-waww.miwebdaataszcraopingn.com./cmoboile-mapps-d/atia-mscrapiang.pghpes... 3797 Video Games https://www.amazon.com/best-sellers-video-game... Quest Link Cable 16ft, VOKOO Oculus Quest Link... 4.3 29.90 0.00 3576 https://images-na.ssl-images-amazon.com/images... 3798 Video Games https://www.amazon.com/best-sellers-video-game... RUNMUS K8 Gaming Headset for PS4, Xbox One, PC... 4.5 21.80 0.00 47346 https://images-na.ssl-images-amazon.com/images... 3799 Video Games https://www.amazon.com/best-sellers-video-game... NBA 2K21: 15,000 VC - PS4 [Digital Code] 4.5 4.99 99.99 454 https: //imag- es-na.ssl-images-amazon.com/images... We got the data in about 3800 rows and 8 columns using the process. Now let's save it in a CSV file using pandas. dataframe.to_csv('AmazonBestSeller.csv', index=None) Here is how you can access the file - File>Opent Now, Let's open the CSV file to print the first 5 lines from the data. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php Finally, let's have a look at the product with the highest number of reviews by customers. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php Summary In this post, we have gone through the below process to extract amazon bestseller listing using Python.. Library Installation Parsing Bestseller HTML Page Extracting Info from each page Combining it in data frame Converting and saving it in CSV format. Here is what the CSV format code will look like by the end of the project. https://www.iwebdahtattspcsra:/p/winwgw.c.oiwme/bwdeabt-assccrarappiningg-s.ecrovmic/ews.epbh-pscraping-services.php https://www.iwebdatascraping.com/mobile-apps-data-scraping.php That’s it for this project. If you still are not getting the process, or you have a time limit for your business, iWeb Data scraping can help you with hwttpse://wbww .iwsebcdatrasacrapingi.ncomg/we bs-scerapringv-seircvices.pshp, you are just a single mail away from us. CONTACT US +1 424 2264664 www.iwebdatascraping [email protected] Social Media https://www.facebook.com/iwebdatascraping# https://www.instagram.com/iwebdatascrapingservice/ https://twitter.com/iwebdatascrape https://www.linkedin.com/company/iwebdatascraping/ https://www.youtube.com/@iwebdatascraping https://in.pinterest.com/iwebdatascraping/