Scraping Apple App Store Search with Python

This blog post is a step-by-step tutorial for scraping Apple App Store Search results using Python and SerpApi.


What will be scraped

wwbs-apple-app-store-search

📌 Note: In this blog post, I will show you how to scrape Apple App Store search and get exactly the same results as on a Mac, because the search results on a Mac are completely different from the results on a PC. The screenshots below show the difference:

  • Mac results:

mac-results

  • PC results:

pc-results

Why use an API?

There are a few reasons to use an API, ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks yourself, such as CAPTCHAs or IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.6 seconds per request (~0.6 seconds with Ludicrous Speed) and no browser automation, which makes requests much faster. Response times and status rates are shown on the SerpApi Status page.

serpapi-status

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import json

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_app_store',    # SerpApi search engine    
    'term': 'image viewer',         # search query
    'device': 'desktop',            # device to get the results
    'country': 'us',                # country for the search
    'lang': 'en-us',                # language for the search
    'disallow_explicit': False,     # disallowing explicit apps
    'num': 20,                      # number of items per page
    'page': 0,                      # pagination
    # 'property': 'developer'       # developer of an app
}

app_store_results = []

while True:
    search = GoogleSearch(params)            # data extraction on the SerpApi backend
    new_page_results = search.get_dict()     # JSON -> Python dict

    app_store_results.extend(new_page_results['organic_results'])

    if 'next' in new_page_results.get('serpapi_pagination', {}):
        params['page'] += 1
    else:
        break

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))

Preparation

Install library:

pip install google-search-results

google-search-results is the SerpApi Python package. It provides the serpapi module that is imported in the code.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import json

| Library | Purpose |
|---|---|
| GoogleSearch | to scrape and parse results using the SerpApi web scraping library. |
| json | to convert the extracted data to a JSON-formatted string. |

The params dictionary defines the parameters used to generate the request URL. To pass other parameters to the URL, add them to this dictionary:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_app_store',    # SerpApi search engine    
    'term': 'image viewer',         # search query
    'device': 'desktop',            # device to get the results
    'country': 'us',                # country for the search
    'lang': 'en-us',                # language for the search
    'disallow_explicit': False,     # disallowing explicit apps
    'num': 20,                      # number of items per page
    'page': 0,                      # pagination
    # 'property': 'developer'       # developer of an app
}

| Parameter | Explanation |
|---|---|
| api_key | Defines the SerpApi private key to use. You can find it under your account -> API key. |
| engine | Set this parameter to apple_app_store to use the App Store API engine. |
| term | Defines the query you want to search. You can use any search term that you would use in a regular App Store search. |
| device | Defines the device to use to get the results. It can be set to desktop to use the Mac App Store, tablet to use the iPad App Store, or mobile (default) to use the iPhone App Store. |
| country | Defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions. |
| lang | Defines the language to use for the search. It's a four-letter language code (e.g., en-us). Head to the Apple Languages for a full list of supported Apple Languages. |
| disallow_explicit | Defines the filter for disallowing explicit apps. It defaults to false. |
| num | Defines the number of results you want to get per page. It defaults to 10. The maximum number of results per page is 200. |
| page | Used to get the items on a specific page (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.). |
| property | Allows searching by a property of an app. developer searches the developer name of an app (e.g., property=developer and term=Coffee returns apps with "Coffee" in their developer's name, such as Coffee Inc.). |

📌 Note: You can also add other API parameters.
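As a rough sketch of how these parameters fit together, the helper below builds the dictionary and checks a few values against the limits described for device, num, and page. The build_params name and the validation rules are my own assumptions for illustration. They are not part of the serpapi package.

```python
# Hypothetical helper (not part of the serpapi package): builds the params
# dictionary and checks values against the limits described for each parameter.
ALLOWED_DEVICES = {'desktop', 'tablet', 'mobile'}
MAX_RESULTS_PER_PAGE = 200


def build_params(api_key, term, device='mobile', country='us',
                 lang='en-us', num=10, page=0):
    if device not in ALLOWED_DEVICES:
        raise ValueError(f'device must be one of {sorted(ALLOWED_DEVICES)}')
    if not 1 <= num <= MAX_RESULTS_PER_PAGE:
        raise ValueError(f'num must be between 1 and {MAX_RESULTS_PER_PAGE}')
    if page < 0:
        raise ValueError('page must be 0 or greater')

    return {
        'api_key': api_key,             # https://serpapi.com/manage-api-key
        'engine': 'apple_app_store',    # SerpApi search engine
        'term': term,                   # search query
        'device': device,               # device to get the results
        'country': country,             # country for the search
        'lang': lang,                   # language for the search
        'num': num,                     # number of items per page
        'page': page,                   # pagination
    }


params = build_params('...', 'image viewer', device='desktop', num=20)
print(params['device'], params['num'])
```

Catching a wrong device or an oversized num locally avoids spending a request on a search that cannot return what you expect.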

Define the app_store_results list to which the retrieved data will be added:

app_store_results = []

A while loop is used to extract data from all pages:

while True:
    # data extraction will be here

Then, we create a search object, which retrieves the data from the SerpApi backend. The get_dict() method converts the returned JSON into the new_page_results dictionary:

search = GoogleSearch(params)            # data extraction on the SerpApi backend
new_page_results = search.get_dict()     # JSON -> Python dict
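The dictionary returned by get_dict() may not always contain results. The sketch below shows one way to guard against that before using the data. It assumes that a failed request carries a top-level 'error' message, so treat that key name as an assumption and check it against the API documentation. The get_organic_results helper is hypothetical, and a hand-written dictionary stands in for the real response so the example runs without an API key.

```python
def get_organic_results(page_results):
    """Return the list of apps from one response dict, or raise on an API error."""
    # Assumption: a failed request carries a top-level 'error' message.
    if 'error' in page_results:
        raise RuntimeError(f"SerpApi error: {page_results['error']}")
    # A page without matches may have no 'organic_results' key at all.
    return page_results.get('organic_results', [])


# Stand-in for search.get_dict(), so the sketch runs without an API key.
sample_page = {'organic_results': [{'position': 1, 'title': 'Pixea'}]}
print(get_organic_results(sample_page))
```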

Add the results from this page to the app_store_results list:

app_store_results.extend(new_page_results['organic_results'])

# title = new_page_results['organic_results'][0]['title']
# version = new_page_results['organic_results'][0]['version']
# description = new_page_results['organic_results'][0]['description']

📌 Note: The comments above show how to extract specific fields. You may have noticed new_page_results['organic_results'][0]. The 0 is the index of a result, which means that the data is extracted from the first app. new_page_results['organic_results'][1] is the second app, and so on.
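Instead of indexing results one at a time, the same fields can be pulled from every result in a loop. This is a minimal sketch, assuming each item is a dictionary shaped like the entries of organic_results. The pick_fields helper and the shortened sample data are made up for illustration.

```python
def pick_fields(organic_results, fields=('title', 'version', 'description')):
    """Keep only the chosen fields from every result; missing ones become None."""
    return [{field: app.get(field) for field in fields} for app in organic_results]


# Shortened sample in the shape of the 'organic_results' list.
sample_results = [
    {'position': 1, 'title': 'Pixea', 'version': '2.1', 'age_rating': '4+'},
]

for app in pick_fields(sample_results):
    print(app['title'], app['version'], app['description'])
```

Using .get() rather than square brackets means a result that lacks one of the fields yields None instead of raising a KeyError.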

After the data is retrieved from the current page, a check is made to see if the next page exists. If there is a 'next' key in the serpapi_pagination dictionary, the page parameter is incremented by 1. Otherwise, the loop stops.

if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
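The pagination logic can also be wrapped in a function and tried offline. The sketch below is one possible arrangement, not the library's own API: collect_all_pages, fetch_page, and the max_pages cap are names and choices I introduced, and fake_fetch imitates a three-page response so the example runs without an API key.

```python
def collect_all_pages(fetch_page, params, max_pages=50):
    """Collect 'organic_results' from consecutive pages until there is no 'next' link."""
    params = dict(params)          # copy, so the caller's dictionary is not modified
    results = []

    for _ in range(max_pages):     # safety cap (an assumption, not an API limit)
        page_results = fetch_page(params)
        results.extend(page_results.get('organic_results', []))

        if 'next' not in page_results.get('serpapi_pagination', {}):
            break
        params['page'] += 1

    return results


# Fake fetcher that imitates a three-page response, so the sketch runs offline.
def fake_fetch(params):
    page = params['page']
    response = {'organic_results': [{'title': f'App {page}'}]}
    if page < 2:
        response['serpapi_pagination'] = {'next': '...'}
    return response


print(collect_all_pages(fake_fetch, {'page': 0}))
```

With the real library, the fetcher could be lambda p: GoogleSearch(p).get_dict(). The cap keeps a broad search term from consuming more requests than intended.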

After all the data is retrieved, it is output in JSON format:

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
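To see what ensure_ascii=False changes, the short example below serializes the same data both ways. The app title is made up for illustration.

```python
import json

# Made-up sample with a non-ASCII character in the title.
sample = [{'title': 'Café Viewer', 'price': {'type': 'Free'}}]

escaped = json.dumps(sample, indent=2)                       # default: ensure_ascii=True
readable = json.dumps(sample, indent=2, ensure_ascii=False)  # keeps characters as they are

print(escaped)
print(readable)
```

Both strings decode to the same data. The second one is simply easier to read when app titles or descriptions contain non-ASCII characters.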

Output

[
  {
    "position": 1,
    "id": 1507782672,
    "title": "Pixea",
    "bundle_id": "imagetasks.Pixea",
    "version": "2.1",
    "vpp_license": true,
    "age_rating": "4+",
    "release_note": "- New \"Fixed Size and Position\" zoom mode - Fixed a bug causing crash when browsing ZIP-files - Bug fixes and improvements",
    "seller_link": "https://www.imagetasks.com",
    "minimum_os_version": "10.12",
    "description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them. Supported formats: JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives. Export formats: JPEG, JPEG-2000, PNG, TIFF, BMP. Found a bug? Have a suggestion? Please, send it to support@imagetasks.com Follow us on Twitter @imagetasks!",
    "link": "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
    "serpapi_product_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
    "serpapi_reviews_link": "https://serpapi.com/search.json?country=us&engine=apple_reviews&page=1&product_id=1507782672",
    "release_date": "2020-04-20 07:00:00 UTC",
    "price": {
      "type": "Free"
    },
    "rating": [
      {
        "type": "All Times",
        "rating": 0.0,
        "count": 0
      }
    ],
    "genres": [
      {
        "name": "Photo & Video",
        "id": 6008,
        "primary": true
      },
      {
        "name": "Graphics & Design",
        "id": 6027,
        "primary": false
      }
    ],
    "developer": {
      "name": "ImageTasks Inc",
      "id": 450316587,
      "link": "https://apps.apple.com/us/developer/id450316587"
    },
    "size_in_bytes": 7113871,
    "supported_languages": [
      "EN"
    ],
    "screenshots": {
      "general": [
        {
          "link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/800x500bb.jpg",
          "size": "800x500"
        }
      ]
    },
    "logos": [
      {
        "size": "60x60",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/60x60bb.png"
      },
      {
        "size": "512x512",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/512x512bb.png"
      },
      {
        "size": "100x100",
        "link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/100x100bb.png"
      }
    ]
  },
  ... other results
]
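Each item in the output is a nested dictionary, so a few fields need an extra step to reach. The sketch below builds a short summary from one item. The summarize_app helper is hypothetical, and the sample is a trimmed copy of the first result above.

```python
def summarize_app(app):
    """Build a short summary from one item shaped like the output above."""
    # The primary genre is the entry in 'genres' flagged with "primary": true.
    primary_genre = next(
        (genre['name'] for genre in app.get('genres', []) if genre.get('primary')),
        None,
    )
    size_mb = round(app['size_in_bytes'] / 1_000_000, 2) if 'size_in_bytes' in app else None
    return {
        'title': app.get('title'),
        'developer': app.get('developer', {}).get('name'),
        'primary_genre': primary_genre,
        'price': app.get('price', {}).get('type'),
        'size_mb': size_mb,
    }


# Trimmed copy of the first result from the output above.
sample_app = {
    'title': 'Pixea',
    'price': {'type': 'Free'},
    'genres': [
        {'name': 'Photo & Video', 'id': 6008, 'primary': True},
        {'name': 'Graphics & Design', 'id': 6027, 'primary': False},
    ],
    'developer': {'name': 'ImageTasks Inc', 'id': 450316587},
    'size_in_bytes': 7113871,
}

print(summarize_app(sample_app))
```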

