đŸ€“ How To Automate Keyword Research with Google Autosuggest

Reading time: 7 minutes

Keyword research is one of the most popular SEO tasks, as well as one of the most commonly handed over to new starters.

Because the gist of keyword research is easy for anyone to grasp, junior SEOs take on the honours, and the underlying burdens, of a task that most experienced SEOs would rather not undertake.

Whether you are new to the industry or a bit of an expert, you may be aware that there are hundreds of tools for doing decent keyword research. As a beginner in particular, you can find yourself spending plenty of time figuring out the tips and tricks behind expensive third-party SEO tools.

What if there were a method to automate keyword research that would save you both time and tons of money?

In this post, I am going to take you through a handy process to ease your keyword research fatigue with an automated Python framework.


Requirements & Assumptions

To kick-start the following Python for SEO framework, you will need a notebook environment such as Google Colab or Jupyter, plus a handful of Python libraries.

Install and Import the Packages

To set up the framework you will need to install and import a few Python libraries.

!pip install requests_html

💡 Do not forget to prepend the exclamation mark to pip when running inside a notebook; in a plain terminal, run pip install requests_html without it.

Next, you need to import a handful of Python packages to set up the environment.

import requests                          # exception handling for failed requests
import urllib.parse                      # URL-encode the search query
import json                              # parse the Autosuggest JSON response
import pandas as pd                      # build and export the keyword report
from requests_html import HTMLSession    # lightweight HTTP session for scraping

Among the packages listed above, requests_html and pandas are going to troubleshoot most of the pain points in the framework.

Connecting to Google Autosuggest

You can now set up the technical foundations of the Autosuggest scraping. To do so, wrap the requests_html session in a small helper function.

def get_source(url):
    """Return the session response for a given URL, printing any request error."""
    try:
        session = HTMLSession()
        response = session.get(url)
        return response
    except requests.exceptions.RequestException as e:
        print(e)

Next, we make sure the query is properly URL-encoded with urllib before making the call to the Google Autosuggest endpoint, so you will be able to submit a search query.

def get_results(query):
    # URL-encode the query and call the Autosuggest endpoint in Chrome output format
    query = urllib.parse.quote_plus(query)
    response = get_source("https://suggestqueries.google.com/complete/search?output=chrome&hl=en&q=" + query)
    results = json.loads(response.text)
    return results
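
Before moving on, it helps to know what comes back. With output=chrome, the endpoint returns a JSON array in which index 1 holds the suggested terms and index 4 holds metadata such as the relevance scores. The endpoint is undocumented, so treat this structure as an observed convention rather than a guarantee:

suggestions = get_results("keyword research")
print(suggestions[1])                             # list of suggested terms
print(suggestions[4]['google:suggestrelevance'])  # matching relevance scores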

Define your Search Query

Next, you can finally populate the hand-made search bar with a query of your choice.

search_term = "Type your query"  # replace with your own seed query
results = get_results(search_term)
results

Formatting the Results

Once the machine has processed the bulky Autosuggest scraping and parsing steps, we are going to format the output so that it reads as clearly as possible.

To spice up the framework, we add a variable named "relevance" for each term returned from the scraping. It is taken from the google:suggestrelevance field of the response, which is Google's own estimate of how relevant each suggestion is to the query you submitted.

def format_results(results):
    suggestions = []
    # results[1] holds the suggested terms; results[4] holds Google's metadata
    for index, value in enumerate(results[1]):
        suggestion = {'term': value, 'relevance': results[4]['google:suggestrelevance'][index]}
        suggestions.append(suggestion)
    return suggestions

formatted_results = format_results(results)
formatted_results

Adding Suffixes and Keyword Modifiers

Let’s fill up the model with a few toppings to cook up the final output.

To make sure you don’t miss out on any search query combination for a given seed, we are going to add a pack of suffixes covering every letter of the alphabet, along with a set of question and intent prefixes.

def get_expanded_term_suffixes():
    # one suffix per letter of the alphabet
    expanded_term_suffixes = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
                              'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    return expanded_term_suffixes

def get_expanded_term_prefixes():
    # question, commercial, and comparison modifiers
    expanded_term_prefixes = ['what *', 'where *', 'how to *', 'why *', 'buy *', 'how much *',
                              'best *', 'worst *', 'rent *', 'sale *', 'offer *', 'vs *', 'or *']
    return expanded_term_prefixes
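
If you would rather not type out the alphabet by hand, the standard library offers the same list in one line; a minimal equivalent sketch:

import string

def get_expanded_term_suffixes():
    # identical output to the hand-typed list above
    return list(string.ascii_lowercase)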

You can toy around with the keyword modifiers at your convenience, depending on the funnel stage where your target content sits.
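
For example, if your target content sits at the bottom of the funnel, you might swap in more transactional modifiers; the list below is purely illustrative and not part of the original framework:

def get_expanded_term_prefixes():
    # hypothetical bottom-of-funnel modifiers; adapt them to your own niche
    return ['buy *', 'price *', 'cheap *', 'discount *', 'near me *']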

Expand the Search

At this point, you need to merge the previous building blocks together.

To do so, we define a function to expand the terms.

def get_expanded_terms(query):

    expanded_term_prefixes = get_expanded_term_prefixes()
    expanded_term_suffixes = get_expanded_term_suffixes()

    terms = []
    terms.append(query)

    # prefix modifiers go before the seed query
    for term in expanded_term_prefixes:
        terms.append(term + ' ' + query)

    # single-letter suffixes go after it
    for term in expanded_term_suffixes:
        terms.append(query + ' ' + term)

    return terms
 
get_expanded_terms(search_term)
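
As a quick sanity check, the list should contain the seed query plus one entry per modifier; the count below assumes the prefix and suffix lists defined earlier:

terms = get_expanded_terms(search_term)
print(len(terms))  # 1 seed + 13 prefixes + 26 suffixes = 40 expanded terms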

And another one to expand search suggestions.

def get_expanded_suggestions(query):

    all_results = []

    expanded_terms = get_expanded_terms(query)
    for term in expanded_terms:
        results = get_results(term)
        results = format_results(results)
        all_results = all_results + results

    # sort once, after the loop, from most to least relevant
    all_results = sorted(all_results, key=lambda k: k['relevance'], reverse=True)

    return all_results
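
Note that this fires a few dozen requests in quick succession at an undocumented endpoint, which may get you temporarily blocked. A minimal precaution is to pause between calls; the half-second delay below is an assumption on my part, not a documented limit:

import time

def get_expanded_suggestions(query, delay=0.5):
    # same logic as above, with a polite pause between requests
    all_results = []
    for term in get_expanded_terms(query):
        all_results += format_results(get_results(term))
        time.sleep(delay)  # assumed safe delay; tune it if you hit errors
    return sorted(all_results, key=lambda k: k['relevance'], reverse=True)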

Final Output

All you have to do now is execute a final chunk of code to print the output.

But first, let’s create a data frame with the Pandas library and rename its columns, so you can easily download the full keyword report as a CSV file.

expanded_results = get_expanded_suggestions(search_term)
expanded_results_df = pd.DataFrame(expanded_results)
expanded_results_df.columns = ['Keywords', 'Relevance']
expanded_results_df.to_csv('keywords.csv', index=False)
expanded_results_df
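
If you are working in Google Colab, which the /content/ path in the next section suggests, you can also pull the CSV straight down to your machine; a minimal sketch assuming the Colab environment:

from google.colab import files

files.download('keywords.csv')  # triggers a browser download in Colab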

Running the cell displays the full keyword report, sorted from the most to the least relevant suggestion.

Adjust the Layout Style of the Dataframe

This is entirely optional but worth a try, given how messy the raw output above can look.

First, you will need to point Pandas at the saved CSV file and then rebuild the data frame with a few CSS styling rules.

expanded_results_df = pd.read_csv('/content/keywords.csv')
selection = ['Keywords', 'Relevance']
df = expanded_results_df[selection]

df.head(20).style.set_table_styles(
    [{'selector': 'th',
      'props': [('background', '#7CAE00'),
                ('color', 'white'),
                ('font-family', 'verdana')]},

     {'selector': 'td',
      'props': [('font-family', 'verdana')]},

     {'selector': 'tr:nth-of-type(odd)',
      'props': [('background', '#DCDCDC')]},

     {'selector': 'tr:nth-of-type(even)',
      'props': [('background', 'white')]},
    ]
).hide_index()  # on pandas 1.4+, use .hide(axis='index') instead

You should now get a much tidier, styled table.

Conclusion

As with everything, the devil is in the details, and in SEO the challenge is to spot the low-hanging fruit that automation frameworks like this one can surface.

However, do take these automated outputs with a grain of salt, as they are often affected by unpredictable outliers which may ultimately harm your SEO decision-making.
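
For instance, a couple of quick sanity checks on the data frame from the previous section can keep the most obvious noise out of your report; the relevance cut-off below is an arbitrary assumption you should tune to your own niche:

df = df.drop_duplicates(subset='Keywords')   # drop repeated suggestions
df = df[df['Relevance'] > 300]               # arbitrary cut-off; tune per niche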

Further Readings

This post came to light after trawling through several tutorials about web scraping for SEO purposes. One of the most meaningful and inspiring is credited to Matt Clarke and his post How to identify SEO keywords using Google Autocomplete.

If you need a further reference, go check the original walkthrough and see whether you can find in-depth gems for tweaking this framework to suit your needs.

