💫 Quick Page Titles Audit with Python

Reading time: 6 Minutes

When carrying out an SEO audit for a large website, content tagging checks normally eat up plenty of time.

Optimizing title tags and meta descriptions is one of the basics of SEO. In fact, there are many tools and methods that already do the job for you.

Working in-agency, I learned that carrying out an audit can surface so many different scenarios that they derail your regular auditing process.

Unexpected issues can arise out of the blue, and often all you can do is learn about them as you go.


What is more, time is rarely on your side, as the list of points to cross-check often looks endless against a given deadline.

It is always hard to balance prioritization and time management during an audit. This is where automation can come in handy.

In this post, I am going to show you a quick way to automate title tag checks by combining the crawls of a third-party tool and Python.

As an example for this tutorial, I will use The Body Shop, a popular British cosmetics retailer selling skincare and perfumes.

Requirements and Assumptions

Before kicking off, let me summarize all the bits and pieces required to gear up the framework along with some preliminary considerations.

This model requires:

  • An export file from Screaming Frog containing duplicate titles. You can obtain it from the tool's Page Titles tab > Duplicate > Export.
Page titles duplicate Screaming Frog
  • An export file from Screaming Frog containing short titles. You can obtain it from the tool's Page Titles tab > Below 30 Characters > Export.
Page Short titles Screaming Frog
  • Google Colab or Jupyter Notebook. The two are effectively substitutes, as both run in the browser and can handle this kind of task. While you can choose either, I recommend Colab due to its user-friendly interface and the number of libraries it ships with preinstalled.
  • No coding experience is required.
    The script was pieced together in ten minutes, so there is nothing really complicated to get to grips with. That said, familiarity with VLOOKUP and basic data manipulation is recommended.

Install and Import Libraries

As usual, the first step is to install and import the required libraries for our model.

The only external library we need to call up is brewer2mpl, a Python package for adding colors to our charts.

Next, we only need to import a few libraries from our Colab environment that will allow us to build data frames (Pandas and NumPy) and plot the results (Seaborn and Matplotlib).

To set up Matplotlib, we are going to use the following script:

!pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import random
import warnings; warnings.filterwarnings(action='once')

large = 22; med = 16; small = 12
params = {'axes.titlesize': med,
          'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
plt.style.use('seaborn-whitegrid')  # on Matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'
sns.set_style("white")
%matplotlib inline

WARNING

Please note brewer2mpl is now Palettable.
brewer2mpl will no longer be updated, but will remain available for the foreseeable future.

Import Datasets

Next up, we need to use Pandas to create a data frame to store the import file containing duplicate titles.

duplicate = pd.read_excel('/content/TBS duplicate titles.xlsx')
df = pd.DataFrame(duplicate, columns=['Address','Title 1','Title 1 Length']) 
df
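If you don't have a Screaming Frog export to hand and want to follow along, you can mimic the structure of the duplicate-titles file with a small hand-made frame. The URLs and titles below are made up for illustration only:

```python
import pandas as pd

# Hypothetical sample mirroring the columns of the Screaming Frog export
duplicate = pd.DataFrame({
    'Address': ['https://example.com/store-a', 'https://example.com/store-b'],
    'Title 1': ['Store Details', 'Store Details'],
    'Title 1 Length': [13, 13],
})

# Same selection step as in the tutorial
df = pd.DataFrame(duplicate, columns=['Address', 'Title 1', 'Title 1 Length'])
print(df.shape)  # (2, 3)
```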
duplicate titles import in Python

Once we have the first dataset, we do the same with the short titles report and create the second one.

short = pd.read_excel('/content/TBS short titles.xlsx')
df2 = pd.DataFrame(short, columns=['Address','Title 1', 'Title 1 Length'])
df2
Short titles import in Python

Merge the Datasets

Here is where the VLOOKUP comes into play.

While most of you are used to performing this operation in Excel, it can also be done with Python, with excellent results, as automation speeds up the process considerably.

To perform a VLOOKUP, we need to leverage the merge method from Pandas:

result = df.merge(df2, on='Address')
result['Title 1 Length_x'] = result['Title 1 Length_x'].round(0).astype('int64')
result
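To see where the `_x`/`_y` column names in the output come from, here is a minimal sketch on toy data (not from the actual crawl): when both frames share columns other than the join key, Pandas suffixes the overlapping names automatically.

```python
import pandas as pd

# Two tiny frames sharing 'Title 1' and 'Title 1 Length' besides the join key
left = pd.DataFrame({'Address': ['/a'], 'Title 1': ['Foo'], 'Title 1 Length': [3]})
right = pd.DataFrame({'Address': ['/a'], 'Title 1': ['Foo'], 'Title 1 Length': [3]})

merged = left.merge(right, on='Address')
print(list(merged.columns))
# ['Address', 'Title 1_x', 'Title 1 Length_x', 'Title 1_y', 'Title 1 Length_y']
```

The suffixes default to `('_x', '_y')`; you can pass `suffixes=('_dup', '_short')` to `merge` for more readable names.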
Duplicate and Short titles merged dataframe

💡BONUS💡

You can automate plenty of boring SEO tasks with Python.
Why not try to automate an XML sitemap audit or speed up a Core Web Vitals analysis with Python?

Data Cleaning

As you may notice, the current dataset is a proper mess. We've got duplicate columns with quirky names, a sign our data needs some solid, consistent cleaning.

To do so, Pandas will help us drop the redundant columns and then rename the remaining ones to improve readability.

Finally, we can export the new dataset as a CSV file for a little bit of data analysis.

result_dropped_multiple = result.drop(['Title 1_y','Title 1 Length_y'], axis=1)
cols = ['URL','Title Link', 'Title Length']
result_dropped_multiple.columns = cols
result_dropped_multiple.to_csv('Duplicate_Short_Titles.csv',index=False)
result_dropped_multiple
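Instead of overwriting the whole `columns` attribute, you can achieve the same cleanup with `rename` and an explicit mapping, which is a bit safer if the column order ever changes. A minimal sketch on toy data (the frame below stands in for the merged crawl):

```python
import pandas as pd

# Stand-in for the merged dataframe from the previous step
result = pd.DataFrame({
    'Address': ['/a'],
    'Title 1_x': ['Foo'],
    'Title 1 Length_x': [3],
    'Title 1_y': ['Foo'],
    'Title 1 Length_y': [3],
})

cleaned = (result
           .drop(columns=['Title 1_y', 'Title 1 Length_y'])
           .rename(columns={'Address': 'URL',
                            'Title 1_x': 'Title Link',
                            'Title 1 Length_x': 'Title Length'}))
print(list(cleaned.columns))  # ['URL', 'Title Link', 'Title Length']
```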
Data cleaning output from a merged dataframe

Count Duplicate and Short Titles

Now that we've cleaned up our data, we can compute as many statistics as we like.

Given that we're aiming to identify duplicate and short title tags at a glance, counting the number of occurrences can provide some preliminary insights.

To do so, we use a couple of Pandas functions to group the page titles, count the occurrences, and sort them in descending order.

x = result_dropped_multiple.groupby('Title Link').size().reset_index(name='counts')
y = x.sort_values(['counts'], ascending=False)
y.to_csv('Count_Duplicate_Short_Titles.csv',index=False)
y.head(20).style.background_gradient()

#you can try a different gradient (colormap) to suit your preferences, e.g.:
#y.head(20).style.background_gradient(cmap='viridis')
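The same count can be written more compactly with `value_counts`, which sorts in descending order by default. A sketch on toy data (the titles below are invented):

```python
import pandas as pd

# Stand-in for the cleaned dataframe's 'Title Link' column
titles = pd.DataFrame({'Title Link': ['Store Details', 'Store Details', 'About Us']})

counts = (titles['Title Link']
          .value_counts()                 # descending count per unique title
          .rename_axis('Title Link')      # keep the index name explicit
          .reset_index(name='counts'))    # back to a two-column frame
print(list(counts.columns))  # ['Title Link', 'counts']
```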
Count of duplicate and short titles using Pandas

💡BONUS💡

There are several ways to plot a count of occurrences in Python. Within the script, I provided an alternative after the hash (#) so you can tune the visualization to your preferences.

Plot the Results

Finally, we can plot the results.

Here is where brewer2mpl and Matplotlib come into play and make the script a bit more complicated.

In plain English, we are instructing the machine to grab the first 20 occurrences from the last data frame (y.head(20)) used to count the number of duplicate and short titles.

In addition, we're also prompting the environment to colorize the plot via brewer2mpl and to build the actual chart using Matplotlib.

Finally, we add a few finishing touches, such as a chart title and a rotation angle for the labels on the x-axis.

n = len(y.head(20)['Title Link'].unique()) + 1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)
plt.figure(figsize=(18,7), dpi= 80)
plt.bar(y.head(20)['Title Link'], y.head(20)['counts'], color=c, width=.5)
for i, val in enumerate(y.head(20)['counts'].values):
    plt.text(i, val, int(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})

# Decoration
plt.xticks(rotation=60, horizontalalignment='right')
plt.title("Distribution of Duplicate & Short Title Links", fontsize=22)
plt.ylabel('# Title Links')
plt.ylim(0, 45)
plt.show()
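Since title tags tend to be long, rotated labels on the x-axis can still be hard to read. A horizontal bar chart (a variant of mine, not part of the original script) sidesteps the problem. This sketch uses invented counts; in practice you would pass your own y.head(20):

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in Colab/Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts standing in for y.head(20)
y = pd.DataFrame({'Title Link': ['Store Details', 'Default Category Page'],
                  'counts': [40, 12]})

fig, ax = plt.subplots(figsize=(12, 5))
top = y.iloc[::-1]                      # reverse so the biggest bar sits on top
ax.barh(top['Title Link'], top['counts'], color='steelblue')
ax.set_title('Distribution of Duplicate & Short Title Links')
ax.set_xlabel('# Title Links')
fig.tight_layout()
fig.savefig('duplicate_titles_barh.png')
```

Long labels sit on their own lines on the y-axis, so no rotation is needed.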
Distribution of Duplicate and Short title links plotted in a bar chart

“Store Details” wins the prize as the most frequent duplicate and short title tag on The Body Shop.

On the other side, we also spot some quirky title tags (“Default Category Page“), which I would have expected to see only in a staging environment.

Conclusion

SEO audits can be extremely daunting.

Python offers countless ways to automate your workflow, and this tutorial provided a method to speed up your title tag checks without compromising on quality.
