How To Define Search Intent for GSC Queries with Python

Google's search results have evolved relentlessly in recent years, driven by Core Updates targeting broader areas of a website. In turn, the search engine's machine-learning algorithms have become considerably better at identifying search intent.

In this post, I will provide an overview of Search Intent and take you through an actionable method to define and cluster intents based on a set of queries retrieved from Google Search Console.


What is Search Intent?

As I touched on a while ago, over the years Google has worked hard to improve its algorithm to determine people's search intent. Google wants to rank the pages that best fit both the search term someone is using and the intent behind it.

After all, everyone who does an online search is hoping to find something. But is someone looking for an answer to a question? Are they looking to visit a specific website? Or, are they searching online because they want to buy something? Many of these types of searches are part of the user journey online, but oftentimes they represent different stages depending on the underlying search intent.

💡Search intent is the reason why someone conducts a specific search; in other words, the motivation prompting a behaviour.

You can learn more about the traditional breakdown of the Search Intent for SEO in this post from Yoast.

Search Intent 2.0

Breaking away from the popular categorization, Google increasingly recognizes a wide range of micro-intents that add in-depth value to customer journey mapping.

💡Micro-Intents are sub-forms of the classic search intents or user intents (transactional, navigational and informational).

Getting a grasp of micro-intents can help digital marketers and content strategists narrow down marketing strategies and devise efficient tactics that ultimately target distinct market segments.

Although this new approach to defining search intent is certainly more precise, in the following Python script I'll refer to the most popular categorization of search intent to keep the coding tutorial smoother.

Requirements and Assumptions

To kickstart this Python framework, you first need to complete a couple of preliminary tasks.

  1. Obtain a Knowledge Graph API key and make sure to keep it safe.
  2. Export your Google Search Console queries in a CSV file.

Get the Knowledge Graph API Key

You can obtain the Knowledge Graph API key by connecting to your Google Cloud account.

Because Google Cloud relies on a freemium model, the service is available free of charge for the first 90 days with up to $300 in credits, after which you may be asked to pay a fee to continue using it.

To get the API key you need to follow a few seemingly intimidating steps. However, if you follow along with the steps below, you'll likely be all right.

  1. Access your Google Cloud account.
  2. Start a project from the main menu that you can name and set up as you wish.
  3. Once you've created a new project, head to the main menu from the left-hand sidebar, hit "APIs & Services" and go for "Credentials".
  4. Hit "Create Credentials" and then "API key".

Now you should be able to copy the API key and keep it safe for later use.
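
If you want to make sure the key works before moving on, here is a minimal sketch that pings the Knowledge Graph Search API with a sample query (the term "coffee" is just an example):

import requests

apikey = "YOUR_API_KEY"  # the key you just copied
url = "https://kgsearch.googleapis.com/v1/entities:search"
params = {"query": "coffee", "key": apikey, "limit": 1, "indent": True}
# A valid key returns a JSON payload of matching entities
print(requests.get(url, params=params).json())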

Export Search Console Queries into a CSV file

Before setting up any automation, you need to collect the recent queries that have garnered the most exposure on your Google Search Console property.

To do that, I personally find it smart and convenient to leverage a Google Sheets add-on. You can install Search Analytics for Sheets from your spreadsheet to export any Google Search Console data that you need.

In this case, we're going to extract queries from the last three months.

Next, you may want to rename the column containing the query list to "Top queries" and apply a filter to either the Clicks or the Impressions column to get a grip on the queries driving the most potential traffic. You could also do this filtering in Python, as sketched below.
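
If you prefer to apply that filter in Python after loading the export (we'll load the file properly in a later step), a minimal sketch could look like this, assuming the add-on's default column names "Clicks" and "Impressions":

import pandas as pd

# Load the exported queries (hypothetical path; adjust it to your own file)
df = pd.read_excel("/content/QUERY.xlsx")
# Keep queries with meaningful exposure, then surface the biggest traffic drivers first
df = df[df["Impressions"] >= 10].sort_values("Clicks", ascending=False)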

Import the Packages

You don’t have to do anything than copying and paste the following Python packages.

Here’s a quick overview of the most relevant Python libraries that you will use throughout this framework.

  • Pandas: produces the data frames that we will use to store our findings
  • nltk: kicks off the tokenization processes at the core of NLP
  • google.colab: a handy pack that renders, saves, and downloads the final outputs

Other than that, we need to import a few additional libraries.

import pandas as pd
import requests
import json
import nltk
nltk.download('punkt')                       # tokenizer models used by nltk
nltk.download('averaged_perceptron_tagger')  # part-of-speech tagger
from collections import Counter
# Render data frames as interactive tables in Colab
%load_ext google.colab.data_table

Upload the Knowledge Graph API Key and your Queries

It’s time to upload the API key along with your set of queries.

When it comes to importing a file in Python, I'd rather use the Excel format than CSV, because I find it easier to manipulate data in Excel. However, if you decide to upload your queries in CSV format, do not forget to use "pd.read_csv" and pass your document's path between the brackets (' ').

apikey = "YOUR_API_KEY"                    # the Knowledge Graph API key you saved earlier
df = pd.read_excel("/content/QUERY.xlsx")  # path to your exported queries
total_queries = len(df.index)              # how many queries the file contains
query_list = df['Top queries'].tolist()    # the raw queries as a Python list
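
For completeness, the CSV variant would look like the line below; the path is a placeholder, so point it at wherever you saved the export:

df = pd.read_csv("/content/QUERY.csv")  # placeholder path to your CSV export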

Setting up Keyword Modifiers

It's time now to set up the Keyword Modifiers: lists of "intent-specific words", that is, small magic words and prompts that trigger a specific intent.

This step is crucial for the purposes of our framework, as it enables us to sort the queries from the uploaded data frame by Search Intent. Here are the modifiers we'll use, grouped by intent:

informative = ['what','who','when','where','which','why','how', 'news', 'fixtures']
transactional = ['buy','order','purchase','cheap','price','tickets','shop','sale','offer']
commercial = ['best','top','review','comparison','compare','vs','versus','ultimate']
navigational = ['UCC','UCC Coffee'] # brand name

# Flag every query containing at least one modifier for each intent.
# case=False makes the match case-insensitive (so 'UCC' also catches 'ucc coffee'),
# and .copy() avoids pandas' SettingWithCopyWarning when we add the Intent column below.
info_filter = df[df['Top queries'].str.contains('|'.join(informative), case=False)].copy()
trans_filter = df[df['Top queries'].str.contains('|'.join(transactional), case=False)].copy()
comm_filter = df[df['Top queries'].str.contains('|'.join(commercial), case=False)].copy()
navigational_filter = df[df['Top queries'].str.contains('|'.join(navigational), case=False)].copy()

# Label each subset with its intent
info_filter['Intent'] = "Informational"
trans_filter['Intent'] = "Transactional"
comm_filter['Intent'] = "Commercial"
navigational_filter['Intent'] = "Navigational"

# Count the queries matched by each intent
info_count = len(info_filter)
trans_count = len(trans_filter)
comm_count = len(comm_filter)
navigational_count = len(navigational_filter)
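
One caveat of plain substring matching is that short modifiers can fire inside longer words ('vs' inside 'investors', 'top' inside 'laptop'). If you notice false positives, a word-boundary regex is a possible refinement, sketched here for the commercial list (this is not part of the original method):

import re

# Build a regex that only matches whole words, e.g. r'\b(?:best|top|vs)\b'
pattern = r'\b(?:' + '|'.join(map(re.escape, commercial)) + r')\b'
comm_filter = df[df['Top queries'].str.contains(pattern, case=False, regex=True)].copy()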

Search Intent Breakdown per Query

What if you could have a quick peek at the trending search intent in your query file?

The next step is designed to return an accurate breakdown of the search intent ratios emerging from the audited website.

print("Total: " + str(total_queries))
print("Info: " + str(info_count) + " | " + str(round((info_count/total_queries)*100,1)) + "%")
print("Trans: " + str(trans_count) + " | " + str(round((trans_count/total_queries)*100,1)) + "%")
print("Comm: " + str(comm_count) + " | " + str(round((comm_count/total_queries)*100,1)) + "%")
print("navigational: " + str(navigational_count) + " | " + str(round((navigational_count/total_queries)*100,1)) + "%")

In the last three months, this website has received:

  • 12% of search queries disclosing Transactional Intent
  • 4.6% of search queries disclosing Informational Intent
  • 3.7% of search queries disclosing Commercial Intent

Determine Search Intent for Each Query

Lastly, we are going to leverage the Pandas library once again to complete the set-up of our brand-new data frame, which will wrap up:

  • Top Queries
  • Clicks
  • Impressions
  • CTR
  • Avg. Position
  • Intent

And why would you deny yourself a beautified version of the output? The google.colab.data_table extension we loaded earlier renders a readable, interactive version of your Search Console Top Queries report.

# Merge the four intent subsets, rank by clicks, and keep only the first
# label for queries that matched more than one intent
df_intents = pd.concat([info_filter, trans_filter, comm_filter, navigational_filter]).sort_values('Clicks', ascending=False)
df_intents = df_intents.drop_duplicates(subset='Top queries', keep="first")
# Reorder the columns for readability
df_intents = df_intents[['Top queries', 'Clicks', 'Impressions', 'Intent', 'CTR', 'Position']]
df_intents

You should get back an output similar to this.

You can save the output with the following line of Pandas code:

df_intents.to_excel(r'YOUR_PC_DIRECTORY_PATH\Search_Intent.xlsx', index=False, header=True)
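
Since the framework runs in Google Colab, you can also pull the file straight to your machine with the files helper, sketched below (it assumes the workbook is first written to the Colab filesystem):

from google.colab import files

# Write the labelled data frame to the Colab filesystem, then download it locally
df_intents.to_excel('Search_Intent.xlsx', index=False, header=True)
files.download('Search_Intent.xlsx')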

Conclusion

Although there are already a few SEO tools offering automated Search Intent classification, the bright sides of this method are that it comes for free, is handy to use, and provides a quick peek at what's behind the surface of your Search Console's Top Queries.

Obviously, I recommend taking this Python framework with a solid grain of salt, meaning you may want to use it as a complementary tool to help inform your own in-depth SEO expertise.

Further Reading

This post was inspired by the solid framework devised by Greg Bernhardt in his post, Use Python to Label Query Intent, Entities and Keyword Count.

For further reference, go check it out and make up your own mind about what fresh ideas could help improve the framework.