Core Web Vitals Audit and Automation with Python

Reading time: 11 Minutes

Core Web Vitals measure page experience signals to ensure an engaging user experience for search users. Each of the Core Web Vitals represents a distinct facet of the user experience, measured through Field data that reflects how real-world users experience a web interface.

– Largest Contentful Paint (LCP): measures how long it takes to load the largest image or block of text in the viewport.
– First Input Delay (FID): measures how long it takes for the browser to respond when a user engages with the page (button click, tap, etc.).
– Cumulative Layout Shift (CLS): measures visual stability to determine whether there is a major shift in the on-screen content while elements are loading.
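As a reference point for the audit later on, each metric can be rated against Google's published "good" / "needs improvement" / "poor" thresholds. Here is a minimal sketch (threshold values as documented on web.dev; the helper itself is hypothetical):

```python
# Google's published Core Web Vitals thresholds (good / poor boundaries)
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "FID": (100, 300),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}

def rate(metric, value):
    """Rate a measurement against the good/needs-improvement/poor bands."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

print(rate("LCP", 2.1))   # good
print(rate("CLS", 0.31))  # poor
```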

Core Web Vitals apply to all page views. The fewer pages per session, the worse your Core Web Vitals tend to be, since the browser cache is still cold during the first page hit.

Yet, SEOs often fall for the widespread misconception that sampling as many pages as possible will result in robust measurements.

While this is not wrong in theory, you'll learn that it can result in an awful lot of work that's not even going to move the needle in improving data accuracy.

The truth is that outliers (extreme values that distort the analysis) will always be there to soil your data, whether you sample 10 or 1,000 pages.

So, how do you measure Core Web Vitals at scale when carrying out a speed performance audit?

In this post, I am going to share a method to help you focus your site speed analysis on the right data and automate the process in bulk with a Python script.

Core Web Vitals Audit with Page Templates

Instead of getting fixated on bulk-measuring Core Web Vitals, you should opt for efficient segmentation of the website sections that drive the most traffic and identify a list of page templates.

– Homepage: your target website's primary landing page, as it typically attracts most of the organic traffic. Users need a frictionless landing on this page, hence it is the first milestone to factor into your page loading audit.
– Category listing pages (CLP): categories listing categories. Because these pages help users navigate to a specific page, they are often major traffic drivers with a high density of images.
– Product listing pages (PLP): categories listing products. Despite targeting narrower topics, they are often filled with images and carry a clearer search intent, so you can nail down the results of the speed audit and compare them against business goals.
– Product detail pages (PDP): pages on an eCommerce site that present the description of a specific product and represent the last touchpoint before prospects progress to a purchase. The details displayed (e.g. size, color, price, shipping information, reviews) are often dynamically injected using CSS or JavaScript, which can affect page loading performance.
– Other: any additional page templates beneficial to your business. For instance, large eCommerce sites may prefer to look after their Store Locator or FAQ pages.

I recommend you learn more about the difference between CLP and PLP in this post from Ahrefs outlining 11 ways to improve eCommerce category pages.

Once you have a clear idea of potential page templates to submit to your analysis, the next step is to translate them into action.

Here’s an actionable way to dissect your website sections into measurable page templates.

  • 1️⃣ Head to Google Search Console and identify the page templates that generated the highest share of organic traffic over the last 12 months (apply filters where relevant)

    Google Search Console Performance overview
  • 2️⃣ Head to Looker Studio (formerly Data Studio) and retrieve all the landing pages from your property over the same 12 months. To segment landing pages, we need to add a few filters using a SQL-like CASE statement. Hence, hit Resource > Manage added data sources from the top bar. Then, select your property and hit "Add a Field"

    N.B.: beware, the screenshot shows Looker Studio in Italian

    Add resources to the Looker Studio dashboard
  • 3️⃣ Create a new field using the following example:

    Add a new field in Looker Studio with a CASE statement
  • 4️⃣ Export the results into a spreadsheet and sort by Clicks while keeping an eye on CTR (if monitoring a large site, this might help)

    Based on that, browse your page templates and, for each section, cherry-pick the URL with the highest organic traction.

    Looker Studio export spreadsheet segmentation
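The same segmentation can be sketched in Python on the exported spreadsheet. The domain, URL patterns, and classification rules below are hypothetical placeholders; adapt them to your own site structure:

```python
import pandas as pd

# Hypothetical export: landing pages with clicks from the Looker Studio spreadsheet
df = pd.DataFrame({
    "Landing Page": [
        "https://example.com/",
        "https://example.com/category/shoes/",
        "https://example.com/category/shoes/running/",
        "https://example.com/product/air-runner-2/",
    ],
    "Clicks": [1200, 540, 310, 95],
})

def template(url):
    """Map a URL to a page template (patterns are illustrative only)."""
    if url.rstrip("/").endswith("example.com"):
        return "Homepage"
    if "/product/" in url:
        return "PDP"
    if "/category/" in url:
        return "CLP/PLP"
    return "Other"

df["Template"] = df["Landing Page"].apply(template)

# For each template, cherry-pick the URL with the highest organic traction
top = df.sort_values("Clicks", ascending=False).groupby("Template").head(1)
print(top[["Template", "Landing Page", "Clicks"]])
```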

💡BONUS

Please note the following method can be successfully applied to measuring search performance for both landing pages and queries.
It's not limited to fulfilling the purposes of a site speed audit.

Other than segmenting your site by top-traffic driving templates, you need to make sure to factor into your speed audit only valid and healthy URLs.

In recent times, Google has clarified that Core Web Vitals no longer impact noindexed pages. This means that only discoverable pages will be considered for inclusion in the CrUX dataset, so pages with the following characteristics are excluded from the Field data report:

– Pages served with a status code other than 200 (except redirect 301)

– Pages served with a noindex meta tag 

– Pages served with a X-Robots tag: noindex header 

Although the segmentation method should have already ruled out unhealthy pages, it's always best to make sure you are only factoring in indexable pages returning an HTTP 200 status code.
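As a sketch of that pre-flight check, the helper below flags URLs that should be excluded from the audit (non-200 status code, noindex meta tag, or X-Robots-Tag: noindex header). It is written as a pure function so it can be verified without network access; in practice you would feed it the status code, headers, and HTML returned by your crawler or HTTP client:

```python
import re

def is_auditable(status_code, headers, html):
    """Return True only for indexable pages served with HTTP 200."""
    if status_code != 200:
        return False
    # Excluded: X-Robots-Tag: noindex header
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return False
    # Excluded: <meta name="robots" content="noindex"> in the page source
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]+noindex', html, re.I):
        return False
    return True

print(is_auditable(200, {}, "<html><head></head></html>"))             # True
print(is_auditable(200, {"X-Robots-Tag": "noindex"}, "<html></html>"))  # False
print(is_auditable(404, {}, ""))                                        # False
```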

⚠️ WARNING ⚠️

Though noindexed pages don't get their own CrUX data, beware that they can still impact origin-level scores.

Core Web Vitals Automation with Python

Let’s move on to the fun part now.

Once your page templates are up for analysis, you would normally head to a third-party tool that measures page loading time.

The market is fairly saturated with tools acting as "substitute products", given the high degree of similarity of their features.

In my own experience, I have always used PageSpeed Insights from Google to measure Field data from CrUX (the Chrome User Experience Report), and I reckon it's probably the most accurate.

For this reason, in the following Python script, we are going to use the API provided for free by PageSpeed Insights (PSI).

Before kicking off, you should be aware of a couple of premises:

  • No coding experience is required. This script leverages the very basics of Python and data science, so you don't have to worry about it. I promise that the most dreadful complexities derive from a bit of data cleaning and perhaps plotting the output, but I'll handle this for you.
  • Use Google Colab to run the script. No affiliation hustlers here, just my personal recommendation for anyone approaching data science and machine learning with Python. Colab is probably the most intuitive and comprehensive notebook for running scripts, making your coding journey seamless.
  • Use a PageSpeed Insights API key. In order to run the framework, it is pivotal to obtain an API key, which you can easily get by following Google's official documentation page.

💡BONUS

With Python you can process massive automation tasks for your SEO.
You can learn how to audit structured data in bulk and how to automate an XML sitemap audit with this coding language.

Install and import libraries

To start off the framework, we need to install and import some libraries.

– ecommercetools: a data science toolkit for those working in technical eCommerce, marketing science, and technical SEO; it includes a wide range of features to aid analysis and model building.
– matplotlib: probably the best Python library for data visualization; it allows you to create static, animated, and interactive charts in Python.

To set up matplotlib and ecommercetools and import the required libraries, we are going to use the following script:

%%capture
!pip install ecommercetools brewer2mpl plotly

import matplotlib.pyplot as plt  # backend used by pandas' .plot() later on
import plotly.express as px
import numpy as np
import pandas as pd
import random
import warnings; warnings.filterwarnings(action='once')

WARNING

Please note brewer2mpl is now Palettable.
brewer2mpl will no longer be updated, but will remain available for the foreseeable future.

ecommercetools and the PageSpeed Insights API

At this stage, we're going to use ecommercetools to call up the PageSpeed Insights API and collect the Core Web Vitals scores in bulk.

Once you have pasted your API key in place of "YOUR_KEY", the seo.get_core_web_vitals function from ecommercetools will automate the analysis for the set of URLs that you provide.

For demonstration purposes, I've dropped in the list of page templates that we extracted before. You can replace these lines and paste your own URLs.

After that, we can rename the columns so they appear more user-friendly and print the result.


#@title Site Speed Audit with PSI API
from ecommercetools import seo

pagespeed_insights_key = "YOUR_KEY"
urls = ['https://seodepths.com/',
        'https://seodepths.com/python-for-seo/sitemap-audit-python/',
        'https://seodepths.com/seo-news/structured-data-semantic-search/',
        'https://seodepths.com/seo-research/google-pros-cons-annotations/']

df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
cols = ['URL', 'Fetch Time', 'Device', 'Score', 'Speed Index',
        'First Meaningful Paint', 'First Contentful Paint',
        'Time to Interactive', 'Total Blocking Time', 'Cumulative Layout Shift']
df.columns = cols

df.to_excel('test.xlsx', index=False)

df
PSI report output

⚠️WARNING⚠️

Screenshots contain clients’ data. Please note that for privacy reasons the URL column will be obfuscated from now on.

Data Cleaning

Next up, we need to do a bit of data cleaning to make the results easier to read.

Once we’ve imported the downloaded file back in, we need to convert the Score variable to integers to avoid values with unnecessary decimals.

Then we’re going to drop redundant columns as they don’t serve the purpose of this analysis. As a result, we can export our dataset as an Excel spreadsheet to review.

# import the downloaded file
test = pd.read_excel('test.xlsx')

# convert metrics to the units we want to report
test['Score'] = test['Score'].round(0).astype(int)
test['Total Blocking Time'] = test['Total Blocking Time'].astype(int) * 1000
test['Cumulative Layout Shift'] = test['Cumulative Layout Shift'] / 1000

# drop unnecessary columns
test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint',
                                   'First Contentful Paint', 'Time to Interactive'], axis=1)

# save to a spreadsheet and display the cleaned dataframe
test_dropped_multiple.to_excel('PSI.xlsx', index=False)
test_dropped_multiple
data cleaning

As you may have noticed, the ecommercetools wrapper around the PageSpeed Insights API returns Core Web Vitals scores for both mobile and desktop, but fails to return LCP, arguably the hardest Core Web Vital to pass.

However, you can always measure LCP manually by jumping on PageSpeed Insights.
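Alternatively, a possible workaround is to query the PSI v5 API directly and read the field-data LCP percentile from the loadingExperience block of the JSON response (the endpoint and response shape follow Google's public PSI API documentation; the stubbed sample at the end is illustrative only):

```python
import json
import urllib.parse
import urllib.request

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def extract_field_lcp(psi_json):
    """Return the field LCP percentile (ms) from a PSI response,
    or None if CrUX has no data for the page."""
    metrics = psi_json.get("loadingExperience", {}).get("metrics", {})
    lcp = metrics.get("LARGEST_CONTENTFUL_PAINT_MS")
    return lcp["percentile"] if lcp else None

def fetch_field_lcp(url, api_key, strategy="mobile"):
    """Call the PSI API for a URL and return its field LCP (needs network)."""
    qs = urllib.parse.urlencode({"url": url, "key": api_key, "strategy": strategy})
    with urllib.request.urlopen(f"{PSI_ENDPOINT}?{qs}", timeout=60) as resp:
        return extract_field_lcp(json.load(resp))

# The parsing works on a stubbed response, no network needed:
sample = {"loadingExperience": {"metrics": {"LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 2300}}}}
print(extract_field_lcp(sample))  # 2300
```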

Coming back to the measurement of the Core Web Vitals scores, this time we are going to drop all columns except the one with the scores. This step is key to the next phase: plotting.

test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint',
                                   'First Contentful Paint', 'Time to Interactive',
                                   'Total Blocking Time', 'Cumulative Layout Shift'], axis=1)

test_dropped_multiple

Plot Core Web Vitals Scores

To plot our Core Web Vitals audit, we're going to use the basics of pandas.

First, we manually create a DataFrame holding the scores from the scraping.

Next, we can use the same library to plot an easy bar chart by rotating the variables in a way that they fit in the final output.

df = pd.DataFrame([['Homepage', 23,61],
                   ['Collection', 23, 52],
                   ['FAQ', 52, 40],
                   ['Popular PDP', 27, 55],
                   ['Popular CLP', 23, 77],
                   ['Popular PLP',24,70],
                   ['Store Locator', 25, 64],
                   ['Popular Store Locator', 23, 63]],
                  columns=['URL', 'mobile', 'desktop'])
df.plot(x='URL',
        kind='bar',
        stacked=False,
        figsize=(14,10),
        rot=25,
        title='Core Web Vitals Score by Device')

As a result, here’s your clear-cut bar chart:

Core Web Vitals score plot with Pandas Python

As we can see, the real user experience from the desktop visibly outperforms the mobile experience for the audited set of page templates.

Interestingly, the FAQ template stands out as the most user-friendly page on mobile and even performs better than its desktop counterpart.


This piece of data visualization can be extremely helpful for gaining insights quickly, especially because data accuracy is easily skewed by bias, whether fostered by heuristics or by plain unfamiliarity with data science.

Conclusions

The key to productivity in SEO is effective prioritization. Learning where to allocate your time and effort can help avoid getting lost in too much data.

Segmenting pages with an efficient method is essential to managing your time and achieving an acceptable degree of accuracy in your audits.