Keyword research is one of the most popular SEO tasks, and also the one most often handed over to new starters.
Because the gist of keyword research is easy for anyone to grasp, junior SEOs take on the honours, and the underlying burdens, of a task that most experienced SEOs wouldn't really look forward to undertaking.
Whether you are a new starter in the industry or a bit of an expert, you may be aware that there are hundreds of tools for doing decent keyword research. As a beginner especially, you can find yourself spending plenty of time figuring out the tips and tricks behind expensive third-party SEO tools.
What if there were a method to automate keyword research that would save both time and tons of money?
In this post, I am going to take you through a handy process to ease your keyword research fatigue with an automated Python framework.
The script is designed to scrape Google Autosuggest in bulk and deliver unlimited search queries, expanded with alphabet suffixes and a group of search-intent modifiers of your choice.
Requirements & Assumptions
To kick-start the following Python for SEO framework, there are a few things you should be aware of:
- Run the script on Google Colab so the workload runs in the cloud rather than overloading your own CPU, as might happen with local notebooks.
- Avoid scraping Google Autosuggest repeatedly within a single keyword research session, to prevent your IP address from being blocked by Google's firewall (a throttled variant is sketched later in this post).
Install and Import the Packages
To set up the framework you will need to install and import a few Python libraries.
First, you will need to install requests-html, a Python library that enables you to fetch and parse HTML from web pages.
!pip install requests_html
💡 Do not forget to prepend an exclamation mark to pip when running the command in a notebook.
Next, you need to import a handful of Python packages to set up the environment.
import requests
import numpy as np
import urllib
import json
import operator
import pandas as pd
from requests_html import HTML
from requests_html import HTMLSession
from urllib.parse import (parse_qsl, urlsplit)
Among the packages listed above, two of them do most of the heavy lifting in the framework.
urllib is the Python library that will help us URL-encode the search queries before submitting them to the targeted endpoint.
pandas is the Python library that we are going to leverage to ultimately shape the output into a user-friendly data frame.
Connecting to Google Autosuggest
You can now set up the technical foundations of the scraping. To do so, you need to wrap the requests-html session in a def function.
def get_source(url):
    try:
        session = HTMLSession()
        response = session.get(url)
        return response
    except requests.exceptions.RequestException as e:
        print(e)
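As a quick sanity check (assuming you are online and the endpoint is reachable), you can call the function directly and confirm it returns a valid response:

response = get_source("https://suggestqueries.google.com/complete/search?output=chrome&hl=en&q=seo")
print(response.status_code)  # 200 means the endpoint answered successfully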
This ensures any failed request is caught and printed rather than crashing the run. Next, we URL-encode the query with urllib and call the Google Autosuggest endpoint so you will be able to submit a search query.
def get_results(query):
    query = urllib.parse.quote_plus(query)
    response = get_source("https://suggestqueries.google.com/complete/search?output=chrome&hl=en&q=" + query)
    results = json.loads(response.text)
    return results
Define your Search Query
Next, you can finally populate the hand-made search bar with a query of your choice.
search_term = "Type your query"
results = get_results(search_term)
results
💡 Find out how Google may interpret your search query.
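At the time of writing, the chrome-style Autosuggest response is a JSON array: index 1 holds the suggested terms and index 4 holds metadata such as the relevance scores. An annotated sketch of its rough shape (values are illustrative):

# ['type your query',                        # results[0]: the query echoed back
#  ['type your query example', ...],         # results[1]: the suggested terms
#  ['', '', ...],                            # results[2]: descriptions (usually empty)
#  [],                                       # results[3]: usually empty
#  {'google:suggestrelevance': [601, ...],   # results[4]: metadata used below
#   'google:suggesttype': ['QUERY', ...]}]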
Formatting the Results
Once the machine has processed the bulk Autosuggest scraping and parsing, we are going to format the output so that it reads as clearly as possible.
To spice up the framework, we add a variable named 'relevance', taken from the google:suggestrelevance scores returned alongside each scraped term. Relevance is Google's own automated estimate of how strongly a given suggestion relates to the submitted query.
def format_results(results):
    suggestions = []
    for index, value in enumerate(results[1]):
        suggestion = {'term': value, 'relevance': results[4]['google:suggestrelevance'][index]}
        suggestions.append(suggestion)
    return suggestions
formatted_results = format_results(results)
formatted_results
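The result is a plain list of dictionaries, ready to be ranked and exported. With a real query, it looks roughly like this (terms and scores are illustrative):

# [{'term': 'python seo tools', 'relevance': 1250},
#  {'term': 'python seo course', 'relevance': 601},
#  ...]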
Adding Suffixes and Keyword Modifiers
Let's fill up the model with a few toppings to cook up the final output.
To make sure you don't miss out on any search query combination from Google Autosuggest for a given query, we are going to add a pack of suffixes covering all the letters of the alphabet.
def get_expanded_term_suffixes():
    expanded_term_suffixes = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    return expanded_term_suffixes
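If you would rather not type the alphabet by hand, the standard library offers an equivalent shortcut, a stylistic alternative rather than part of the original script:

import string

def get_expanded_term_suffixes():
    # Same output as the hand-typed list above
    return list(string.ascii_lowercase)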
Next, we pack the settings with a bunch of keyword modifiers that act as hints of search intent.
def get_expanded_term_prefixes():
    # Note: each modifier needs its own comma-separated entry, otherwise
    # adjacent strings are silently concatenated by Python.
    expanded_term_prefixes = ['what *', 'where *', 'how to *', 'why *', 'buy*', 'how much*', 'best *', 'worst *', 'rent*', 'sale*', 'offer*', 'vs*', 'or*']
    return expanded_term_prefixes
You can toy around with the keyword modifiers at your own convenience depending on the funnel stage where your target content sits.
Expand the Search
At this point, you need to simmer the previous prompts so they merge together.
To do so, we need to call a def function to expand the terms.
def get_expanded_terms(query):
    expanded_term_prefixes = get_expanded_term_prefixes()
    expanded_term_suffixes = get_expanded_term_suffixes()
    terms = []
    terms.append(query)
    for term in expanded_term_prefixes:
        terms.append(term + ' ' + query)
    for term in expanded_term_suffixes:
        terms.append(query + ' ' + term)
    return terms
get_expanded_terms(search_term)
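For instance, assuming search_term were "python seo", the expanded list would start with the seed query, followed by every prefixed and then every suffixed combination (output illustrative):

# ['python seo',
#  'what * python seo', 'where * python seo', ...,
#  'python seo a', 'python seo b', ..., 'python seo z']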
And another one to expand search suggestions.
def get_expanded_suggestions(query):
    all_results = []
    expanded_terms = get_expanded_terms(query)
    for term in expanded_terms:
        results = get_results(term)
        results = format_results(results)
        all_results = all_results + results
    all_results = sorted(all_results, key=lambda k: k['relevance'], reverse=True)
    return all_results
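As flagged in the requirements, this loop fires a few dozen requests in quick succession. A minimal way to be gentler on Google's firewall is to pause between calls; the throttled variant below is a hypothetical tweak using the standard-library time module, not part of the original script:

import time

def get_expanded_suggestions_throttled(query, delay=1.0):
    # Same logic as get_expanded_suggestions, with a pause between
    # requests to lower the risk of being rate-limited.
    all_results = []
    for term in get_expanded_terms(query):
        all_results += format_results(get_results(term))
        time.sleep(delay)  # wait `delay` seconds before the next call
    return sorted(all_results, key=lambda k: k['relevance'], reverse=True)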
Final Output
All you have to do now is execute an additional chunk of code to print the output.
But first, let's create a data frame and rename its columns by leveraging the Pandas library, so you can easily download the full keyword report as a CSV file.
expanded_results = get_expanded_suggestions(search_term)
expanded_results_df = pd.DataFrame(expanded_results)
expanded_results_df.columns = ['Keywords', 'Relevance']
expanded_results_df.to_csv('keywords.csv')
expanded_results_df
This is roughly what you might get:
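Since the same suggestion often comes back for several expanded terms, you may also want to deduplicate the report before sharing it, a one-line pandas sketch:

expanded_results_df = expanded_results_df.drop_duplicates(subset='Keywords')
expanded_results_df.to_csv('keywords.csv')  # re-export the deduplicated report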
Adjust the Layout Style of the Dataframe
This is entirely optional but worth a try, given the mess coming from the above raw output.
First, you will need to paste the file path of the saved data frame and then reload the Pandas data frame with a few CSS styling rules.
expanded_results_df = pd.read_csv('/content/keywords.csv')
selection = ['Keywords', 'Relevance']
df = expanded_results_df[selection]
table_styles = [
    {'selector': 'th',
     'props': [('background', '#7CAE00'),
               ('color', 'white'),
               ('font-family', 'verdana')]},
    {'selector': 'td',
     'props': [('font-family', 'verdana')]},
    {'selector': 'tr:nth-of-type(odd)',
     'props': [('background', '#DCDCDC')]},
    {'selector': 'tr:nth-of-type(even)',
     'props': [('background', 'white')]},
]
df.head(20).style.set_table_styles(table_styles).hide_index()
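💡 On recent pandas releases (2.0 and later), Styler.hide_index() has been removed; the last line above becomes:

df.head(20).style.set_table_styles(table_styles).hide(axis='index')  # pandas 2.0+ equivalent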
You should now get something like this.
Conclusion
You can now tap into a virtually unlimited deck of search queries to make a good impression on your boss and to investigate further for your SEO content purposes.
As with everything, the devil is in the details, and in SEO the challenge is to spot the low-hanging fruit surfaced by automated models.
However, do take these automation models with a grain of salt, as they are usually affected by a number of unpredictable outliers which may ultimately harm your SEO decision-making.
Further Readings
This post came to light after trawling through several tutorials about web scraping for SEO purposes. One of the most meaningful and inspiring is credited to Matt Clarke and his post How to identify SEO keywords using Google Autocomplete.
If you need a further reference, please check the original walkthrough and see whether you can find in-depth gems for tweaking this framework to suit your needs.