Using Google Reverse Images API from SerpApi

This blog post is about scraping Google Reverse Images results using SerpApi.

What is Google Reverse Images

Put simply, Google Reverse Images helps you quickly discover visually similar images from around the web. You can upload a photo from your desktop or mobile device, or paste the URL of a photo, and almost instantly it shows related images used on other websites as well as different sizes of the same photo.

For example, you can take a photo of a Minecraft pillow (or paste its URL) and use it to search for information or other similar images.

The results can include:

  • Search results for objects in the image.
  • Similar images.
  • Websites with the image or a similar image.

In this blog post, we'll show how you can use SerpApi's ability to parse data from reverse image results. We do it without browser automation, which makes it a lot faster.
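
No browser is needed because a search of this kind comes down to a single HTTPS GET request. The sketch below only builds such a request URL with the standard library and sends nothing. The endpoint https://serpapi.com/search.json and the helper name build_reverse_image_url are assumptions made for illustration; the google-search-results package used in the rest of this post takes care of this step for you.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; check the SerpApi docs before relying on it.
SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def build_reverse_image_url(image_url: str, api_key: str) -> str:
    """Return a GET URL for a reverse image search (hypothetical helper)."""
    params = {
        "engine": "google_reverse_image",  # same engine name as in the code below
        "image_url": image_url,            # image to search with
        "api_key": api_key,                # your SerpApi API key
    }
    # urlencode percent-encodes the image URL so it is safe inside a query string
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


url = build_reverse_image_url("https://example.com/photo.jpg", "YOUR_API_KEY")
print(url)
```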

Example of results that are being returned from Google based on a given image:

blog-google-reverse-image

The typical reverse image process:

illustration of reverse image process

Simple Hello World

from serpapi import GoogleSearch
import os, json

image_url = "https://user-images.githubusercontent.com/81998012/182214192-59dfb3fe-522c-4979-bb42-9f8091dfd9d6.jpg"

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    "api_key": os.getenv("API_KEY"),    # your SerpApi API key
    "engine": "google_reverse_image",   # SerpApi search engine
    "image_url": image_url,             # image URL to perform a reverse search
    "hl": "en",                         # language of the search
    "gl": "us"                          # country of the search
    # other parameters
}

search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
results = search.get_dict()             # JSON -> Python dictionary

# results["image_results"] holds what are essentially Google organic results
print(json.dumps(results["image_results"], indent=4, ensure_ascii=False))
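
A failed search does not necessarily raise an exception. As far as we know, SerpApi reports problems through an "error" key in the returned dictionary, but treat that as an assumption and confirm it against the API docs. A minimal sketch of a guard, using a hypothetical helper named get_image_results and hand-written sample dictionaries instead of real API output:

```python
def get_image_results(results: dict) -> list:
    """Return image results, or raise if the response carries an error (hypothetical helper)."""
    # Assumption: failed searches include an "error" message in the response.
    if "error" in results:
        raise RuntimeError(f"SerpApi error: {results['error']}")
    # "image_results" may be absent, so fall back to an empty list.
    return results.get("image_results", [])


# Hand-written sample responses, not real API output.
print(get_image_results({"image_results": [{"position": 1, "title": "Stairs - Wikipedia"}]}))
print(get_image_results({}))
```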

Detailed Code

from serpapi import GoogleSearch
import os, json

image_urls = [
    "https://user-images.githubusercontent.com/81998012/182214192-59dfb3fe-522c-4979-bb42-9f8091dfd9d6.jpg",
    "https://user-images.githubusercontent.com/81998012/182025185-27df7683-24d5-4747-904b-9f3a6045705b.jpg",
    "https://user-images.githubusercontent.com/81998012/182025195-fec95c5c-aee1-448b-9165-ce9dc1b77a56.jpg",
    "https://user-images.githubusercontent.com/81998012/182027073-4b09a0b7-ec55-415f-bcb0-7a457e87c0b4.jpg",
    "https://user-images.githubusercontent.com/81998012/182025215-ce739965-5c4f-4735-8581-566e03b609f2.jpg",    
]


def main():
    google_reverse_image_data = {}

    for index, image_url in enumerate(image_urls, start=1):
        google_reverse_image_data[f"results for image {index}"] = {}

        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            "api_key": os.getenv("API_KEY"),    # your SerpApi API key
            "engine": "google_reverse_image",   # SerpApi search engine
            "image_url": image_url,             # image URL to perform a reverse search
            "location": "Dallas",               # location from where search comes from
            "hl": "en",                         # language of the search
            "gl": "us"                          # country of the search
            # other parameters
        }

        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dictionary

        # some queries may not include this information; .get() avoids a KeyError
        if results.get("knowledge_graph"):
            knowledge_graph = {}

            knowledge_graph["title"] = results["knowledge_graph"]["title"]
            knowledge_graph["description"] = results["knowledge_graph"]["description"]

            google_reverse_image_data[f"results for image {index}"]["knowledge_graph"] = knowledge_graph

        # some queries may not include organic results; .get() avoids a KeyError
        if results.get("image_results"):
            google_reverse_image_data[f"results for image {index}"]["organic_results"] = []

            for result in results["image_results"]:
                image_results = {}

                image_results["position"] = result["position"]
                image_results["title"] = result["title"]
                image_results["link"] = result["link"]
                image_results["snippet"] = result["snippet"]

                google_reverse_image_data[f"results for image {index}"]["organic_results"].append(image_results)

        # some queries may not include this information; .get() avoids a KeyError
        if results.get("inline_images"):
            google_reverse_image_data[f"results for image {index}"]["inline_images"] = []

            for result in results["inline_images"]:
                google_reverse_image_data[f"results for image {index}"]["inline_images"].append({
                    "source": result["source"],
                    "thumbnail": result["thumbnail"]
                })

    return google_reverse_image_data


if __name__ == "__main__":
    print(json.dumps(main(), indent=4, ensure_ascii=False))

Prerequisites

pip install google-search-results
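
The scripts in this post read the key from the API_KEY environment variable, and os.getenv() silently returns None when it is not set. A minimal sketch of an early check, with a hypothetical helper named require_api_key:

```python
import os


def require_api_key(env_var: str = "API_KEY") -> str:
    """Return the API key from the environment or fail with a clear message (hypothetical helper)."""
    value = os.getenv(env_var)
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable to your SerpApi API key.")
    return value


try:
    api_key = require_api_key()
    print("API key found.")
except RuntimeError as error:
    print(error)
```

Calling such a helper before building params turns a missing key into an immediate, readable error instead of a failed request later on.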

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import os, json

Library         Purpose
os              to return the value of an environment variable (the SerpApi API key).
json            to convert extracted data to a JSON string.
GoogleSearch    to scrape and parse Google results using the SerpApi web scraping library.

Then we need a list of image URLs to search with (any iterable will do):

image_urls = [
    "https://user-images.githubusercontent.com/81998012/182214192-59dfb3fe-522c-4979-bb42-9f8091dfd9d6.jpg",
    "https://user-images.githubusercontent.com/81998012/182025185-27df7683-24d5-4747-904b-9f3a6045705b.jpg",
    "https://user-images.githubusercontent.com/81998012/182025195-fec95c5c-aee1-448b-9165-ce9dc1b77a56.jpg",
    "https://user-images.githubusercontent.com/81998012/182027073-4b09a0b7-ec55-415f-bcb0-7a457e87c0b4.jpg",
    "https://user-images.githubusercontent.com/81998012/182025215-ce739965-5c4f-4735-8581-566e03b609f2.jpg",    
]

Next, we need to create a main function (optional) and create a temporary dict to store extracted data:

def main():
    google_reverse_image_data = {}

For the next step, we need to iterate over image_urls and pass each value to the "image_url" key of params:

for index, image_url in enumerate(image_urls, start=1):
    google_reverse_image_data[f"results for image {index}"] = {}

    params = {
        "engine": "google_reverse_image",   # SerpApi search engine
        "image_url": image_url,             # image URL to perform a reverse search
        "location": "Dallas",               # location from where search comes from
        "hl": "en",                         # language of the search
        "gl": "us",                         # country of the search
        # https://docs.python.org/3/library/os.html#os.getenv
        "api_key": os.getenv("API_KEY"),    # your SerpApi API key
    }

    search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
    results = search.get_dict()

Code            Explanation
enumerate()     to add a counter to an iterable and return it. In this case, it's used to show more explicitly which results belong to which image.
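
As a quick illustration of how enumerate() with start=1 produces the dictionary keys used here, the following snippet uses placeholder file names instead of real image URLs:

```python
image_urls = ["first.jpg", "second.jpg"]  # placeholder values, not real URLs

# start=1 makes the counter begin at 1 instead of the default 0
labels = [f"results for image {index}" for index, _ in enumerate(image_urls, start=1)]
print(labels)  # ['results for image 1', 'results for image 2']
```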

Now we need to check whether the data we want was actually returned. In this case, we're only checking knowledge_graph, organic_results (image_results), and inline_images:

# some queries may not include this information; .get() avoids a KeyError
if results.get("knowledge_graph"):
    knowledge_graph = {}

    knowledge_graph["title"] = results["knowledge_graph"]["title"]
    knowledge_graph["description"] = results["knowledge_graph"]["description"]

    google_reverse_image_data[f"results for image {index}"]["knowledge_graph"] = knowledge_graph

# some queries may not include organic results
if results.get("image_results"):
    google_reverse_image_data[f"results for image {index}"]["organic_results"] = []

    for result in results["image_results"]:
        image_results = {}

        image_results["position"] = result["position"]
        image_results["title"] = result["title"]
        image_results["link"] = result["link"]
        image_results["snippet"] = result["snippet"]

        google_reverse_image_data[f"results for image {index}"]["organic_results"].append(image_results)

# some queries may not include inline images
if results.get("inline_images"):
    google_reverse_image_data[f"results for image {index}"]["inline_images"] = []

    for result in results["inline_images"]:
        google_reverse_image_data[f"results for image {index}"]["inline_images"].append({
            "source": result["source"],
            "thumbnail": result["thumbnail"]
        })
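
The same checks can also be pulled into a pure function that takes the response dictionary, which makes the parsing testable without spending API credits. This is a sketch under assumptions, not part of the original script: the function name parse_reverse_image is hypothetical, and missing fields are returned as None instead of raising.

```python
def parse_reverse_image(results: dict) -> dict:
    """Collect the fields used in this post, skipping sections that are missing."""
    parsed = {}

    knowledge_graph = results.get("knowledge_graph")
    if knowledge_graph:
        parsed["knowledge_graph"] = {
            "title": knowledge_graph.get("title"),
            "description": knowledge_graph.get("description"),
        }

    image_results = results.get("image_results")
    if image_results:
        parsed["organic_results"] = [
            # .get() yields None for a field that a given result lacks
            {key: result.get(key) for key in ("position", "title", "link", "snippet")}
            for result in image_results
        ]

    inline_images = results.get("inline_images")
    if inline_images:
        parsed["inline_images"] = [
            {"source": image.get("source"), "thumbnail": image.get("thumbnail")}
            for image in inline_images
        ]

    return parsed


# Hand-written sample, shaped like the fields accessed above.
sample = {
    "image_results": [
        {"position": 1, "title": "Stairs - Wikipedia", "link": "https://en.wikipedia.org/wiki/Stairs"}
    ]
}
print(parse_reverse_image(sample))
```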

Now we need to return the data:

return google_reverse_image_data

The last step is to add the Python if __name__ == "__main__" idiom, so that the code runs only when the file is executed as a script, and print the data:

if __name__ == "__main__":
    print(json.dumps(main(), indent=4, ensure_ascii=False))

Output:

{
    "results for image 1": {
        "knowledge_graph": {
            "title": "Stairs",
            "description": "Stairs are a structure designed to bridge a large vertical distance by dividing it into smaller vertical distances, called steps. Stairs may be straight, round, or may consist of two or more straight pieces connected at angles. Types of stairs include staircases, ladders, and escalators."
        },
        "organic_results": [
            {
                "position": 1,
                "title": "Stairs - Wikipedia",
                "link": "https://en.wikipedia.org/wiki/Stairs",
                "snippet": "Stairs are a structure designed to bridge a large vertical distance by dividing it into smaller vertical distances, called steps. Stairs may be straight, ..."
            }, ... other organic results
            {
                "position": 4,
                "title": "Foto HD de Claudio Schwarz - Unsplash",
                "link": "https://unsplash.com/es/fotos/ipcsI15th5I",
                "snippet": "Nuevo: Unsplash ahora está disponible en varios idiomas. Puedes volver a cambiar al inglés cuando quieras. Próximamente habrá más idiomas."
            }
        ],
        "inline_images": [
            {
                "source": "https://www.flickr.com/photos/thepiratesgospel/6107309586/",
                "thumbnail": "https://serpapi.com/searches/62e907cb5b54ef5af08d6ff2/images/6886d2b2c5499da05d656e39563b001cc2a1b485150c7a5761ed1190edbccb0f.jpeg"
            }, ... other thumbnails
            {
                "source": "https://en-gb.facebook.com/qualitycarpetsdirect/posts/",
                "thumbnail": "https://serpapi.com/searches/62e907cb5b54ef5af08d6ff2/images/6886d2b2c5499da08e88e75176317a1c08b36d2c2800f2329de12d162eab24e9.jpeg"
            }
        ]
    }, ... other results
    "results for image 5": {
        "knowledge_graph": {
            "title": "Art",
            "description": "Art is a diverse range of human activity, and resulting product, that involves creative or imaginative talent expressive of technical proficiency, beauty, emotional power, or conceptual ideas."
        },
        "organic_results": [
            {
                "position": 1,
                "title": "Art.com | Wall Art: Framed Prints, Canvas Paintings, Posters ...",
                "link": "https://www.art.com/",
                "snippet": "Shop Art.com for the best selection of wall art and photo prints online! Low price guarantee, fast shipping & free returns, and custom framing options ..."
            },
            {
                "position": 2,
                "title": "Art - Wikipedia",
                "link": "https://en.wikipedia.org/wiki/Art",
                "snippet": "Art is a diverse range of human activity, and resulting product, that involves creative or imaginative talent expressive of technical proficiency, beauty, ..."
            }
        ],
        "inline_images": [
            {
                "source": "https://www.leireunzueta.com/journal/tag/summer",
                "thumbnail": "https://serpapi.com/searches/62e907d0e13508b8c60f4c3b/images/6c0c95a05f3c4aa45e83ffe98a6112df67130eb20d484feef2c133f72ab49a3f.jpeg"
            }, ... other thumbnails
            {
                "source": "https://unsplash.com/photos/onMwdrVfMuE",
                "thumbnail": "https://serpapi.com/searches/62e907d0e13508b8c60f4c3b/images/6c0c95a05f3c4aa4deae63325b5a810305c9d32f794fc72c1849753080f116fa.jpeg"
            }
        ]
    }
}
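
The ensure_ascii=False argument in the json.dumps() call is what lets non-ASCII text, such as the Spanish snippet in the output, be printed as characters instead of \uXXXX escapes. A small standalone comparison, using a shortened sample string:

```python
import json

data = {"snippet": "Próximamente habrá más idiomas."}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keeps characters as-is

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # non-ASCII characters appear unchanged
```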

Why use an API?

  • No need to create a parser from scratch and maintain it.
  • Bypass blocks from Google: no need to solve CAPTCHAs or work around IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend with fast response times and without browser automation, which makes the whole process much faster.

The average response time is ~2.07 seconds based on 20 search queries (the screenshot shows 15 of them):

Response times from Google Reverse Images API under Dashboard, Your Searches page
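
If you want to take a comparable measurement from your own machine, a rough sketch could look like the one below. This is not how the dashboard figure was produced; the helper name average_response_time is hypothetical, and a short sleep stands in for the real search call so the snippet runs without an API key.

```python
import time
from statistics import mean


def average_response_time(search_fn, runs: int = 20) -> float:
    """Call search_fn `runs` times and return the mean duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


# Stand-in for a real call such as: lambda: GoogleSearch(params).get_dict()
print(f"{average_response_time(lambda: time.sleep(0.01), runs=3):.3f} s")
```

Timings taken this way include your own network latency, so they will usually differ from the numbers on the dashboard.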

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞