Core Web Vitals are page experience signals designed to ensure an engaging experience for search users. Each of the Core Web Vitals represents a distinct facet of the user experience and is measured with Field data, reflecting how real users experience a page.
Core Web Vitals | What it measures |
---|---|
Largest Contentful Paint (LCP) | Measures how long it takes to load the largest image or block of text in the viewport. |
First Input Delay (FID) | Measures how long it takes for the browser to respond when a user engages with the page (button click, tap, etc.). |
Cumulative Layout Shift (CLS) | Measures visual stability to determine whether there is a major shift in the content on-screen while elements are loading. |
Core Web Vitals are measured across all page views. The fewer pages per session, the worse your Core Web Vitals tend to be, since browser caching may not have kicked in yet on the first page hit.
Yet, SEOs are often swayed by the widespread misconception that sampling as many pages as possible will result in more robust measurements.
While this is not wrong in theory, you’ll learn that it can result in an awful lot of work that won’t even move the needle in terms of data accuracy.
The truth is that outliers (i.e., statistical anomalies surfacing during the analysis) will always be there to skew your data, whether you sample 10 or 1,000 pages.
Accuracy is overrated.
– Marco Giordano (@GiordMarco96) November 23, 2022
You don’t need accurate search volumes.
Magnitudes are enough. Is it in the order of a thousand or more?
Knowing the actual value doesn’t make any difference.
And yet, the SEO industry has to yearly ask the same questions.
None of it really matters. https://t.co/f2DF4DIkYO
So, how do you measure Core Web Vitals at scale when carrying out a speed performance audit?
In this post, I am going to share a method to help you focus your site speed analysis on the right data and automate the process in bulk with a Python script.
Core Web Vitals Audit with Page Templates
Instead of getting fixated on bulk measuring Core Web Vitals, you should opt for an efficient segmentation of the website sections driving the most organic traffic and identify a list of page templates.
Page Template | Description |
---|---|
Homepage | Your target website’s primary landing page, as it typically attracts most of the organic traffic. Users need a frictionless landing on this page, so it is the first milestone to factor into your page loading audit. |
Category listing pages (CLP) | Pages listing categories. Because these pages help users navigate to a specific page, they often attract large shares of traffic and feature a high density of images. |
Product listing pages (PLP) | Category pages listing products. Despite targeting narrower topics, they are often packed with images and carry a clearer search intent, so you can nail down the results of the speed audit and compare them against business goals. |
Product detail pages (PDP) | Pages on an eCommerce site presenting the description of a specific product; they represent the last touchpoint before prospects progress to a purchase. The details displayed (e.g., size, color, price, shipping information, reviews) are often dynamically injected using CSS or JavaScript, which can affect page loading performance. |
Other | You can identify any additional page templates beneficial to your business. For instance, large eCommerce sites may prefer to look after their Store Locator or FAQ pages. |
I recommend you learn more about the difference between CLPs and PLPs in this post from Ahrefs covering 11 ways to improve eCommerce category pages.
Once you have a clear idea of potential page templates to submit to your analysis, the next step is to translate them into action.
Here’s an actionable way to dissect your website sections into measurable page templates.
1️⃣ Head to Google Search Console and get an overview of the page templates generating the highest share of organic traffic over the last 12 months (apply filters where relevant).
2️⃣ Head to Looker Studio (formerly Data Studio) and retrieve all the landing pages from your property over the same 12 months. To segment landing pages, we need to add a few filters using a SQL-like formula. Hence, hit Resource > Manage added data sources from the top bar, then select your property and hit “Add a Field“.
N.B.: beware the screenshot shows Looker Studio in Italian.
3️⃣ Create a new field using the following example:
4️⃣ Export the results into a spreadsheet and sort by Clicks while keeping an eye on CTR (if monitoring a large site, this might help).
Based on that, browse your page templates and, for each section, cherry-pick the URL with the highest organic traction, as sketched in Python right below.
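If you prefer to handle this last step in Python rather than in a spreadsheet, here is a minimal pandas sketch of the same logic. The gsc_export.xlsx file name, the regex patterns and the column names (Landing Page, Clicks) are hypothetical placeholders based on a typical Search Console export, so adapt them to your own data.
import re
import pandas as pd
# load the landing pages exported from Looker Studio / Search Console (hypothetical file)
pages = pd.read_excel('gsc_export.xlsx')
# map each landing page to a page template via URL patterns (adjust to your own site)
patterns = {'Homepage': r'^https://www\.example\.com/$',
            'CLP': r'/collections/',
            'PLP': r'/category/',
            'PDP': r'/product/'}
def to_template(url):
    for name, pattern in patterns.items():
        if re.search(pattern, url):
            return name
    return 'Other'
pages['Template'] = pages['Landing Page'].apply(to_template)
# cherry-pick the URL with the most clicks for each template
top_urls = (pages.sort_values('Clicks', ascending=False)
                 .groupby('Template', as_index=False)
                 .first())
print(top_urls[['Template', 'Landing Page', 'Clicks']])
This simply mirrors, inside the notebook, what the Looker Studio calculated field does on the data source side.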
💡 BONUS
Please note the following method can be successfully applied to measuring search performance for both landing pages and queries.
It’s not limited to the purposes of a site speed audit.
Besides segmenting your site by top traffic-driving templates, you need to make sure you only factor valid and healthy URLs into your speed audit.
In recent times, Google has confirmed that noindexed pages are no longer taken into account for Core Web Vitals. This means that only discoverable pages are considered for inclusion in the CrUX dataset, so pages with the following features are excluded from the Field data report:
– Pages served with a status code other than 200 (except 301 redirects)
– Pages served with a noindex meta tag
– Pages served with an X-Robots-Tag: noindex header
Google’s core web vitals no longer looks at scores of pages with noindex https://t.co/CVox5V2AXb via @MikeBlazerX pic.twitter.com/pkzXcoUnHi
– Barry Schwartz (@rustybrick) July 29, 2022
Although the segmentation method should have already ruled out unhealthy pages, it’s always best to make sure you are only factoring in indexable pages returning an HTTP 200 status code, as in the sketch below.
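If you want to automate this sanity check, below is a minimal sketch using the requests library. It is only an assumption-based helper (the URL list is illustrative, and the meta robots lookup is a rough regex rather than a full HTML parse), but it covers the three exclusion criteria listed above.
import re
import requests
# illustrative list of page template URLs (replace with your own)
urls = ['https://seodepths.com/',
        'https://seodepths.com/python-for-seo/sitemap-audit-python/']
for url in urls:
    response = requests.get(url, timeout=10)
    # a healthy page should return HTTP 200
    status_ok = response.status_code == 200
    # check the X-Robots-Tag header for a noindex directive
    header_noindex = 'noindex' in response.headers.get('X-Robots-Tag', '').lower()
    # rough check for a <meta name="robots" content="...noindex..."> tag
    meta_noindex = bool(re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
        response.text, re.IGNORECASE))
    print(f'{url} -> include in audit: {status_ok and not header_noindex and not meta_noindex}')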
⚠️ WARNING ⚠️
Though noindex pages don’t get their own CrUX data, beware that they can still impact origin-level data.
Core Web Vitals Automation with Python
Let’s move on to the fun part now.
Once your page templates are up for analysis, you would normally head to a third-party tool measuring page loading time.
The market is fairly saturated with tools acting as “substitute products”, given the high degree of similarity between their features.
In my own experience, I have always used Page Speed Insights from Google to measure Field data from CrUX (the Chrome User Experience Report), and I reckon it’s probably the most accurate.
For this reason, in the following Python script, we are going to use the API provided for free by Page Speed Insights (PSI).
Before kicking off, you should be aware of a few premises:
- No coding experience is required. This script leverages the very basics of Python and Data Science, so you don’t have to worry about it. I promise the trickiest parts boil down to a bit of data cleaning and perhaps plotting the output, but I’ll handle that for you.
- Use Google Colab to run the script. No affiliation here, just my personal recommendation for anyone approaching Data Science and machine learning with Python. Colab is probably the most intuitive and comprehensive notebook environment for running scripts, making your coding journey seamless.
- Use a Page Speed Insights API key. To run the framework, you need an API key, which can easily be obtained from the Google official documentation page (see the quick check sketched right after this list).
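If you want to double-check that your key works before running the full script, a quick call to the PSI v5 endpoint is enough. This is just a minimal sketch assuming the standard runPagespeed endpoint; replace the test URL and the YOUR_KEY placeholder with your own values.
import requests
# hypothetical quick check that the PSI API key is valid
endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
params = {'url': 'https://seodepths.com/',  # any live URL you want to test
          'key': 'YOUR_KEY',                # paste your PSI API key here
          'strategy': 'mobile'}
response = requests.get(endpoint, params=params)
# a 200 status and a 'lighthouseResult' key in the JSON mean the key works
print(response.status_code, 'lighthouseResult' in response.json())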
💡 BONUS
With Python you can process massive automation tasks for your SEO.
You can learn how to audit structured data in bulk and how to automate an XML sitemap audit with this coding language.
Install and import libraries
To start off the framework, we need to install and import some libraries.
Library | Description |
---|---|
ecommercetools | A data science toolkit for those working in technical eCommerce, marketing science, and technical SEO; it includes a wide range of features to aid analysis and model building. |
matplotlib | Probably the best Python library for data visualization; it allows you to create static, animated, and interactive charts in Python. |
To set up ecommercetools and import the required libraries (matplotlib comes preinstalled in Colab), we are going to use the following script:
%%capture
# install the required packages (quietly, thanks to %%capture)
!pip install ecommercetools brewer2mpl plotly
# import the libraries used throughout the notebook
import plotly.express as px
import numpy as np
import pandas as pd
import random
import warnings; warnings.filterwarnings(action='once')
WARNING
Please note brewer2mpl is now Palettable.
brewer2mpl will no longer be updated, but will remain available for the foreseeable future.
Ecommercetools and the Page Speed Insights API
At this stage, we’re going to use ecommercetools to call up the Page Speed Insights API and scrape the Core Web Vitals scores in bulk.
Once you have replaced "YOUR_KEY" with your actual API key, the seo.get_core_web_vitals function from ecommercetools will automate the analysis for the set of URLs that you provide.
For demonstration purposes, I’ve dropped in the list of page templates that we extracted before. You can replace these lines and paste your own URLs.
After that, we can rename the columns so they appear more user-friendly and print the result.
#@title Site Speed Audit with PSI API
from ecommercetools import seo
# paste your Page Speed Insights API key here
pagespeed_insights_key = "YOUR_KEY"
# list of page template URLs to audit (replace with your own)
urls = ['https://seodepths.com/',
        'https://seodepths.com/python-for-seo/sitemap-audit-python/',
        'https://seodepths.com/seo-news/structured-data-semantic-search/',
        'https://seodepths.com/seo-research/google-pros-cons-annotations/']
# query the PSI API for each URL (mobile and desktop)
df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
# rename the columns so they are more user-friendly
cols = ['URL', 'Fetch Time', 'Device', 'Score', 'Speed Index', 'First Meaningful Paint', 'First Contentful Paint', 'Time to Interactive', 'Total Blocking Time', 'Cumulative Layout Shift']
df.columns = cols
# export the raw results and display the dataframe
df.to_excel('test.xlsx', index=False)
df
⚠️ WARNING ⚠️
Screenshots contain clients’ data. Please note that for privacy reasons the URL columns will be obfuscated from now on.
Data Cleaning
Next up, we need to process a bit of data cleaning to make the results easier to read.
Once we’ve imported the downloaded file back, we need to convert the Score variable into integers to avoid values with unnecessary decimals.
Then we’re going to drop redundant columns, as they don’t serve the purpose of this analysis. As a result, we can export our dataset as an Excel spreadsheet to review.
#import the downloaded file
test = pd.read_excel('test.xlsx')
#convert metrics to more readable values
test['Score'] = test['Score'].round(0).astype(int)
test['Total Blocking Time'] = test['Total Blocking Time'].astype(int) * 1000
test['Cumulative Layout Shift'] = test['Cumulative Layout Shift'] / 1000
#drop unnecessary columns
test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint', 'First Contentful Paint', 'Time to Interactive'], axis=1)
#export the cleaned dataset to an Excel spreadsheet and display it
test_dropped_multiple.to_excel('PSI.xlsx', index=False)
test_dropped_multiple
As you may have noticed, the ecommercetools wrapper around the Page Speed Insights API returns Core Web Vitals scores for both mobile and desktop, but fails to return LCP, the hardest Core Web Vital to pass.
However, you can always measure LCP manually by jumping on Page Speed Insights, or pull it from the API directly as sketched below.
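Since the raw PSI API response also exposes the CrUX field metrics, you can retrieve LCP programmatically with a direct call. Here is a minimal sketch, assuming the standard v5 endpoint and the same YOUR_KEY placeholder; note that LARGEST_CONTENTFUL_PAINT_MS may be missing for URLs without enough field data.
import requests
def get_field_lcp(url, api_key, strategy='mobile'):
    """Return the CrUX 75th-percentile LCP (ms) and its category for a URL."""
    endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'
    params = {'url': url, 'key': api_key, 'strategy': strategy}
    data = requests.get(endpoint, params=params).json()
    # field (CrUX) metrics live under 'loadingExperience'
    metrics = data.get('loadingExperience', {}).get('metrics', {})
    lcp = metrics.get('LARGEST_CONTENTFUL_PAINT_MS', {})
    return lcp.get('percentile'), lcp.get('category')
print(get_field_lcp('https://seodepths.com/', 'YOUR_KEY'))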
Coming back to the measurement of Core Web Vitals scores, this time we are going to drop all columns except the one with the scores. This step is key to moving on to the next phase: plotting.
test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint', 'First Contentful Paint','Time to Interactive','Total Blocking Time', 'Cumulative Layout Shift'], axis=1)
test_dropped_multiple
Plot Core Web Vitals Scores
To plot our Core Web Vitals audit, we’re going to use the basics of pandas.
First, we manually create a data frame reporting the scores gathered from the audit.
Next, we can use the same library to plot a simple bar chart, rotating the axis labels so they fit in the final output.
# Core Web Vitals scores gathered for each page template (mobile vs desktop)
df = pd.DataFrame([['Homepage', 23, 61],
                   ['Collection', 23, 52],
                   ['FAQ', 52, 40],
                   ['Popular PDP', 27, 55],
                   ['Popular CLP', 23, 77],
                   ['Popular PLP', 24, 70],
                   ['Store Locator', 25, 64],
                   ['Popular Store Locator', 23, 63]],
                  columns=['URL', 'mobile', 'desktop'])
# plot a grouped bar chart comparing mobile and desktop scores
df.plot(x='URL',
        kind='bar',
        stacked=False,
        figsize=(14, 10),
        rot=25,
        title='Core Web Vitals Score by Device')
As a result, here’s your clear-cut bar chart:
As we can see, the real user experience from the desktop visibly outperforms the mobile experience for the audited set of page templates.
Interestingly, the FAQ template stands out as the most user-friendly page on mobile and even performs better than its desktop counterpart.
This piece of data visualization can be extremely helpful in gaining insights quickly. As argued earlier, chasing ever-greater data accuracy is often an inflated bias, fostered by heuristics or by genuine gaps in Data Science knowledge.
Conclusions
The key to productivity in SEO is effective prioritization. Learning where to allocate your time and effort can help you avoid getting lost in too much data.
Segmenting pages with an efficient method is essential to managing your time and achieving an acceptable degree of accuracy in your audits.