How to Scrape Home Depot Product Data with SerpApi
This blog post is a step-by-step guide to scraping The Home Depot product results with SerpApi using Python.
Intro
In this blog post, we'll go through the process of extracting product data from The Home Depot using The Home Depot Product API and the Python programming language.
To extract The Home Depot product results, you need to pass the product_id parameter, which identifies a specific product. You can get this parameter from the search results. Have a look at the Integrate The Home Depot Search Page Results Data with SerpApi and Python blog post, in which I described in detail how to extract all the needed data.
You can look at the complete code in the online IDE (Replit).
If you prefer video format, we have a dedicated video that shows how to do that: The Home Depot Product API - SerpApi.
What will be scraped
Why use an API?
There are a couple of reasons to use an API, ours in particular:
- No need to create a parser from scratch and maintain it.
- No need to bypass blocks from the target site: solving CAPTCHAs or working around IP blocks.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend, with fast response times under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request and no browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page.
Full Code
This code retrieves all the data for each of the 24 products on the 1st page:
from serpapi import GoogleSearch
import json

params = {
    'api_key': '...',                    # https://serpapi.com/manage-api-key
    'engine': 'home_depot',              # SerpApi search engine
    'q': 'coffee maker',                 # query
}

search = GoogleSearch(params)            # where data extraction happens on the SerpApi backend
results = search.get_dict()              # JSON -> Python dict

product_ids = [result['product_id'] for result in results['products']]

home_depot_products = []

for product_id in product_ids:
    product_params = {
        'api_key': '...',                # https://serpapi.com/manage-api-key
        'engine': 'home_depot_product',  # SerpApi search engine
        'product_id': product_id,        # HomeDepot ID of a product
    }

    product_search = GoogleSearch(product_params)
    product_results = product_search.get_dict()

    home_depot_products.append(product_results['product_results'])

print(json.dumps(home_depot_products, indent=2, ensure_ascii=False))
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
import json
Library | Purpose |
GoogleSearch | to scrape and parse results using the SerpApi web scraping library. |
json | to convert extracted data to a JSON string. |
At the beginning of the code, you make a request to get the search results. The product_id values are then extracted from them.
The parameters below define the request URL. If you want to pass other parameters, add them to the params dictionary:
params = {
    'api_key': '...',        # https://serpapi.com/manage-api-key
    'engine': 'home_depot',  # SerpApi search engine
    'q': 'coffee maker',     # query
}
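Hardcoding the key in the params dictionary is fine for a quick test, but it is easy to leak that way. Here is a minimal sketch that reads the key from an environment variable instead. The function name build_search_params and the variable name SERPAPI_API_KEY are assumptions made up for this example, not something the SerpApi library defines:

```python
import os


def build_search_params(query, api_key=None):
    """Build the params dict for the home_depot engine.

    SERPAPI_API_KEY is a hypothetical environment variable name
    chosen for this sketch; use whatever name fits your setup.
    """
    key = api_key or os.environ.get('SERPAPI_API_KEY')
    if not key:
        raise ValueError('Set SERPAPI_API_KEY or pass api_key explicitly')

    return {
        'api_key': key,          # https://serpapi.com/manage-api-key
        'engine': 'home_depot',  # SerpApi search engine
        'q': query,              # query
    }
```

The returned dictionary can be passed to GoogleSearch in the same way as the hand-written params dictionary.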
Then, we create a search object, through which the data is retrieved from the SerpApi backend. The get_dict() call returns the JSON response as a Python dictionary, which we store in results:
search = GoogleSearch(params) # data extraction on the SerpApi backend
results = search.get_dict() # JSON -> Python dict
At this point, the first 24 search results from the first page are stored in the results dictionary. If you are interested in all search results with pagination, check out the Using The Home Depot Product API from SerpApi blog post.
The product_ids list stores the product_id extracted from each search result. These IDs are needed later:
product_ids = [result['product_id'] for result in results['products']]
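The list comprehension raises a KeyError if the response has no products key or if a result lacks a product_id. A more defensive variant might look like the sketch below. The helper name extract_product_ids is hypothetical, and the sample dictionary is a hand-made stand-in for a real response:

```python
def extract_product_ids(results):
    """Collect unique product IDs from a search response, keeping their order."""
    seen = set()
    product_ids = []
    for result in results.get('products', []):
        product_id = result.get('product_id')
        # Skip results without an ID and IDs we have already collected.
        if product_id and product_id not in seen:
            seen.add(product_id)
            product_ids.append(product_id)
    return product_ids


# Hand-made stand-in for a search response, for illustration only.
sample_results = {
    'products': [
        {'product_id': '206667220', 'title': 'Coffee maker A'},
        {'title': 'Result without an ID'},
        {'product_id': '206667220', 'title': 'Duplicate of A'},
    ]
}
print(extract_product_ids(sample_results))
```

Dropping duplicates also avoids spending API requests on the same product twice.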
Declare the home_depot_products list, to which the extracted data will be added:
home_depot_products = []
Next, you need to access each product page separately by iterating over the product_ids list:
for product_id in product_ids:
    # data extraction will be here
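Each iteration of this loop is a separate API request, so a page of 24 products costs 24 extra requests. While experimenting, it can help to cap the number of products. The sketch below assumes a hypothetical fetch_product callable that takes a product_id and returns the response dictionary, which also makes the loop easy to try without network access:

```python
from itertools import islice


def collect_products(product_ids, fetch_product, limit=None):
    """Fetch up to `limit` products; limit=None means no cap.

    fetch_product is a hypothetical callable: product_id -> response dict.
    """
    products = []
    for product_id in islice(product_ids, limit):
        response = fetch_product(product_id)
        product = response.get('product_results')
        if product is not None:
            products.append(product)
    return products


# A fake fetcher standing in for the real SerpApi call.
def fake_fetch(product_id):
    return {'product_results': {'product_id': product_id}}


print(collect_products(['1', '2', '3'], fake_fetch, limit=2))
```

In real use, fetch_product would wrap the GoogleSearch call for the home_depot_product engine.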
These parameters define the request URL for a product. If you want to pass other parameters, add them to the product_params dictionary:
    product_params = {
        'api_key': '...',                # https://serpapi.com/manage-api-key
        'engine': 'home_depot_product',  # SerpApi search engine
        'product_id': product_id,        # HomeDepot ID of a product
    }
Parameters | Explanation |
api_key | Defines the SerpApi private key to use. You can find it under your account -> API key |
engine | Set to home_depot_product to use The Home Depot Product API engine. |
product_id | The Home Depot identifier of a product |
Note: You can also add other API Parameters.
Then, we create a product_search object, through which the data is retrieved from the SerpApi backend. The product_results dictionary holds the JSON response for the current product:
product_search = GoogleSearch(product_params)
product_results = product_search.get_dict()
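A request can fail, for example because of an invalid key or an unknown product ID. The sketch below assumes that a failed response carries a top-level error key instead of product_results. That is an assumption about the response shape, so check it against the responses you actually receive. The helper name get_product_or_none is hypothetical:

```python
def get_product_or_none(product_results):
    """Return the product data, or None when the response reports a problem.

    Assumes a failed response carries a top-level 'error' key.
    """
    if 'error' in product_results:
        print(f"Skipping product: {product_results['error']}")
        return None
    return product_results.get('product_results')


# Hand-made responses for illustration only.
ok_response = {'product_results': {'product_id': '206667220'}}
failed_response = {'error': 'Something went wrong'}

print(get_product_or_none(ok_response))
print(get_product_or_none(failed_response))
```

With a helper like this, the loop can skip a failed product instead of stopping with a KeyError.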
Add the data about the current product to the home_depot_products list:
home_depot_products.append(product_results['product_results'])
# title = product_results['product_results']['title']
# description = product_results['product_results']['description']
# rating = product_results['product_results']['rating']
# reviews = product_results['product_results']['reviews']
# price = product_results['product_results']['price']
Note: The comments above show how to extract specific fields from the current product.
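In the sample output in this post, rating and reviews come back as strings ("3.1793", "569") while price is a number. If you want to sort or filter by these fields, a small conversion step helps. This is a minimal sketch with a hypothetical helper name, summarize_product, using values copied from the sample output:

```python
def summarize_product(product):
    """Pick a few fields and convert the string-typed ones to numbers."""
    rating = product.get('rating')
    reviews = product.get('reviews')
    return {
        'title': product.get('title'),
        'price': product.get('price'),
        'rating': float(rating) if rating is not None else None,
        'reviews': int(reviews) if reviews is not None else None,
    }


# Values taken from the sample output shown in this post.
sample_product = {
    'title': '12-Cup Programmable Stainless Steel Drip Coffee Maker with Thermal Carafe',
    'rating': '3.1793',
    'reviews': '569',
    'price': 69.73,
}
print(summarize_product(sample_product))
```

Using .get() instead of square brackets keeps the helper from failing on products that lack one of these fields.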
After all the data is retrieved, it is output in JSON format:
print(json.dumps(home_depot_products, indent=2, ensure_ascii=False))
Output
[
{
"product_id": "206667220",
"title": "12-Cup Programmable Stainless Steel Drip Coffee Maker with Thermal Carafe",
"description": "Get your fix throughout the day with the BLACK+DECKER CM2035B 12-Cup Thermal Coffeemaker. The stainless steel thermal carafe is vacuum-sealed to ensure your coffee stays at the optimal drinking temperature for hours and the Perfect Pour spout does away with spills and drips. The easy-to-use digital controls include a setting for batches of 1-4 cups BLACK+DECKER and the BLACK+DECKER logo are trademarks of The Black and Decker Corporation and are used under license. Cup equals approximately 5 oz. (varies by brewing technique).",
"link": "https://www.homedepot.com/p/BLACK-DECKER-12-Cup-Programmable-Stainless-Steel-Drip-Coffee-Maker-with-Thermal-Carafe-CM2035B/206667220",
"upc": "050875812123",
"model_number": "CM2035B",
"favorite": 103,
"rating": "3.1793",
"reviews": "569",
"price": 69.73,
"highlights": [
"Electric drip-type coffee maker for creating delectable coffee",
"Serves up to 12 cups with ease",
"Includes an Evenstream shower head for maximum flavor extraction",
"Made from stainless steel for high longevity and durability",
"Provides a flavorful pot of hot coffee"
],
"brand": {
"name": "BLACK+DECKER",
"link": "https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Makers-Drip-Coffee-Makers/BLACK-DECKER/N-5yc1vZ2fkp8ffZe7c"
},
"images": [
[
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_65.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_100.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_145.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_300.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_400.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_600.jpg",
"https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_1000.jpg"
],
... other images
],
"bullets": [
"12-cup thermal carafe - the large capacity carafe is double-walled and vacuum-sealed to keep your coffee at optimal drinking temperature for hours",
"Customizable brewing options - drink your favorite coffee every morning using features like the brew strength selector and the option for small-batch (1-4 cup) brewing that maintains all the flavor of a full brew",
"even stream showerhead - the Evenstream Showerhead dispenses water evenly over the packed coffee, extracting maximum flavor and wasting less",
"No-drip perfect pour spout - don't put up with annoying spills, the carafe spout is designed to prevent spills and drips while pouring",
"Wide-mouth carafe opening-the carafe is designed with a wide opening for fast, easy cleanup with a damp towel",
"<a href=https://www.homedepot.com/c/electronics_recycling_programs style=color:#F96302; target=_blank>Click here for more information on Electronic Recycling Programs</a>"
],
"info_and_guides": [
{
"title": "Warranty",
"link": "https://images.thdstatic.com/catalog/pdfImages/de/deb8e49b-76c5-4dd2-82e0-6863ebb8408c.pdf"
},
{
"title": "Use and Care Manual",
"link": "https://images.thdstatic.com/catalog/pdfImages/68/681b462d-4233-4d43-ab88-ce5dc12ff487.pdf"
}
],
"specifications": [
{
"key": "Details",
"value": [
{
"name": "Appliance Type",
"value": "Coffee Maker"
},
... other results
]
},
{
"key": "Warranty / Certifications",
"value": [
{
"name": "Certifications and Listings",
"value": "ETL Listed"
},
... other results
]
},
{
"key": "Dimensions",
"value": [
{
"name": "Product Depth (in.)",
"value": "8.58"
},
... other results
]
}
]
},
... other products
]
Note: Head to the playground for a live and interactive demo.
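The specifications field in the output above is a list of groups, each with a key and a list of name/value pairs. For lookups, a nested dictionary is often more convenient. Here is a minimal sketch, assuming every product follows the same shape as the sample output. The helper name specifications_to_dict is hypothetical:

```python
def specifications_to_dict(product):
    """Turn the specifications list into {group: {name: value}}."""
    return {
        group['key']: {item['name']: item['value'] for item in group.get('value', [])}
        for group in product.get('specifications', [])
    }


# A shortened copy of the specifications from the sample output.
sample_product = {
    'specifications': [
        {'key': 'Details',
         'value': [{'name': 'Appliance Type', 'value': 'Coffee Maker'}]},
        {'key': 'Dimensions',
         'value': [{'name': 'Product Depth (in.)', 'value': '8.58'}]},
    ]
}

specs = specifications_to_dict(sample_product)
print(specs['Dimensions']['Product Depth (in.)'])
```

Note that the values stay strings, as in the API response, so convert them before doing arithmetic.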