Core Web Vitals Audit and Automation with Python

Reading time: 11 Minutes

Core Web Vitals measure page experience signals to ensure an engaging user experience for search users. Each of the Core Web Vitals represents a distinct facet of the user experience, measured through Field data that reflects how real-world users experience a web interface.

– Largest Contentful Paint (LCP): measures how long it takes to load the largest image or block of text in the viewport.
– First Input Delay (FID): measures how long it takes for the browser to respond when a user engages with the page (button click, tap, etc.).
– Cumulative Layout Shift (CLS): measures visual stability to determine whether there is a major shift in the on-screen content while elements are loading.
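As a reference point for the audit later on, each metric can be rated against Google's published "good" / "needs improvement" / "poor" thresholds. Here is a minimal sketch (threshold values as documented on web.dev; the helper itself is hypothetical):

```python
# Google's published Core Web Vitals thresholds (good / poor boundaries)
THRESHOLDS = {
    "LCP": (2.5, 4.0),    # seconds
    "FID": (100, 300),    # milliseconds
    "CLS": (0.1, 0.25),   # unitless score
}

def rate(metric, value):
    """Rate a measurement against the good/needs-improvement/poor bands."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"

print(rate("LCP", 2.1))   # good
print(rate("CLS", 0.31))  # poor
```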

Core Web Vitals apply to all page views. The fewer pages per session, the worse your Core Web Vitals tend to be, since the browser cache is still cold during the first page hit.

Yet, SEOs often fall for the widespread misconception that sampling as many pages as possible will result in robust measurements.

While this is not wrong in theory, you'll learn that it can result in an awful lot of work that's not even going to move the needle in improving data accuracy.

The truth is that outliers (extreme values that distort the analysis) will always be there to soil your data, whether you sample 10 or 1,000 pages.

So, how do you measure Core Web Vitals at scale when carrying out a speed performance audit?

In this post, I am going to share a method to help you focus your site speed analysis on the right data and automate the process in bulk with a Python script.

Core Web Vitals Audit with Page Templates

Instead of getting fixated on bulk-measuring Core Web Vitals, you should opt for efficient segmentation of the website sections that drive the most traffic and identify a list of page templates.

– Homepage: your target website's primary landing page, as it typically attracts most of the organic traffic. Users need a frictionless landing on this page, hence it is the first milestone to factor into your page loading audit.
– Category listing pages (CLP): categories listing categories. Because these pages help users navigate to a specific page, they are often major traffic drivers with a high density of images.
– Product listing pages (PLP): categories listing products. Despite targeting narrower topics, they are often filled with images and carry a clearer search intent, so you can nail down the results of the speed audit and compare them against business goals.
– Product detail pages (PDP): pages on an eCommerce site that present the description of a specific product and represent the last touchpoint before prospects progress to a purchase. The details displayed (e.g. size, color, price, shipping information, reviews) are often dynamically injected using CSS or JavaScript, which can affect page loading performance.
– Other: any additional page templates beneficial to your business. For instance, large eCommerce sites may prefer to look after their Store Locator or FAQ pages.

I recommend you learn more about the difference between CLP and PLP in this post from Ahrefs outlining 11 ways to improve eCommerce category pages.

Once you have a clear idea of potential page templates to submit to your analysis, the next step is to translate them into action.

Here’s an actionable way to dissect your website sections into measurable page templates.

  • 1️⃣ Head to Google Search Console and identify the page templates that generated the highest share of organic traffic over the last 12 months (apply filters where relevant)

    Google Search Console Performance overview
  • 2️⃣ Head to Looker Studio (formerly Data Studio) and retrieve all the landing pages from your property over the same 12 months. To segment landing pages, we need to add a few filters using a SQL-like CASE statement. Hence, hit Resource > Manage added data sources from the top bar. Then, select your property and hit "Add a Field"

    N.B.: beware, the screenshot shows Looker Studio in Italian

    Add resources to the Looker Studio dashboard
  • 3️⃣ Create a new field using the following example:

    Add a new field in Looker Studio with a CASE statement
  • 4️⃣ Export the results into a spreadsheet and sort by Clicks while keeping an eye on CTR (if monitoring a large site, this might help)

    Based on that, browse your page templates and, for each section, cherry-pick the URL with the highest organic traction.

    Looker Studio export spreadsheet segmentation
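The same segmentation can be sketched in Python on the exported spreadsheet. The domain, URL patterns, and classification rules below are hypothetical placeholders; adapt them to your own site structure:

```python
import pandas as pd

# Hypothetical export: landing pages with clicks from the Looker Studio spreadsheet
df = pd.DataFrame({
    "Landing Page": [
        "https://example.com/",
        "https://example.com/category/shoes/",
        "https://example.com/category/shoes/running/",
        "https://example.com/product/air-runner-2/",
    ],
    "Clicks": [1200, 540, 310, 95],
})

def template(url):
    """Map a URL to a page template (patterns are illustrative only)."""
    if url.rstrip("/").endswith("example.com"):
        return "Homepage"
    if "/product/" in url:
        return "PDP"
    if "/category/" in url:
        return "CLP/PLP"
    return "Other"

df["Template"] = df["Landing Page"].apply(template)

# For each template, cherry-pick the URL with the highest organic traction
top = df.sort_values("Clicks", ascending=False).groupby("Template").head(1)
print(top[["Template", "Landing Page", "Clicks"]])
```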

💡BONUS

Please note the following method can be successfully applied to measuring search performance for both landing pages and queries.
It's not limited to fulfilling the purposes of a site speed audit.

Other than segmenting your site by top-traffic driving templates, you need to make sure to factor into your speed audit only valid and healthy URLs.

In recent times, Google has clarified that Core Web Vitals no longer impact noindexed pages. This means that only discoverable pages will be considered for inclusion in the CrUX dataset, so pages with the following characteristics are excluded from the Field data report:

– Pages served with a status code other than 200 (except redirect 301)

– Pages served with a noindex meta tag 

– Pages served with a X-Robots tag: noindex header 

Although the segmentation method should have already ruled out unhealthy pages, it's always best to make sure you are only factoring in indexable pages returning an HTTP 200 status code.
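As a sketch of that pre-flight check, the helper below flags URLs that should be excluded from the audit (non-200 status code, noindex meta tag, or X-Robots-Tag: noindex header). It is written as a pure function so it can be verified without network access; in practice you would feed it the status code, headers, and HTML returned by your crawler or HTTP client:

```python
import re

def is_auditable(status_code, headers, html):
    """Return True only for indexable pages served with HTTP 200."""
    if status_code != 200:
        return False
    # Excluded: X-Robots-Tag: noindex header
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return False
    # Excluded: <meta name="robots" content="noindex"> in the page source
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]+noindex', html, re.I):
        return False
    return True

print(is_auditable(200, {}, "<html><head></head></html>"))             # True
print(is_auditable(200, {"X-Robots-Tag": "noindex"}, "<html></html>"))  # False
print(is_auditable(404, {}, ""))                                        # False
```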

⚠️ WARNING ⚠️

Though noindexed pages don't get their own CrUX data, beware that they can still impact origin-level scores.

Core Web Vitals Automation with Python

Let’s move on to the fun part now.

Once your page templates are up for analysis, you would normally head to a third-party tool that measures page loading time.

The market is fairly saturated with tools acting as "substitute products", given the high degree of similarity of their features.

In my own experience, I have always used PageSpeed Insights from Google to measure Field data from CrUX (the Chrome User Experience Report), and I reckon it's probably the most accurate.

For this reason, in the following Python script, we are going to use the API provided for free by PageSpeed Insights (PSI).

Before kicking off, you should be aware of a couple of premises:

  • No coding experience is required. This script leverages the very basics of Python and data science, so you don't have to worry about it. I promise that the most dreadful complexities derive from a bit of data cleaning and perhaps plotting the output, but I'll handle this for you.
  • Use Google Colab to run the script. No affiliation hustlers here, just my personal recommendation for anyone approaching data science and machine learning with Python. Colab is probably the most intuitive and comprehensive notebook for running scripts, making your coding journey seamless.
  • Use a PageSpeed Insights API key. In order to run the framework, it is pivotal to obtain an API key, which you can easily get by following Google's official documentation page.

💡BONUS

With Python you can process massive automation tasks for your SEO.
You can learn how to audit structured data in bulk and how to automate an XML sitemap audit with this coding language.

Install and import libraries

To start off the framework, we need to install and import some libraries.

– ecommercetools: a data science toolkit for those working in technical eCommerce, marketing science, and technical SEO; it includes a wide range of features to aid analysis and model building.
– matplotlib: probably the best Python library for data visualization; it allows you to create static, animated, and interactive charts in Python.

To set up matplotlib and ecommercetools and import the required libraries, we are going to use the following script:

%%capture
!pip install ecommercetools brewer2mpl plotly

import matplotlib.pyplot as plt  # backend used by pandas' .plot() later on
import plotly.express as px
import numpy as np
import pandas as pd
import random
import warnings; warnings.filterwarnings(action='once')

WARNING

Please note brewer2mpl is now Palettable.
brewer2mpl will no longer be updated, but will remain available for the foreseeable future.

ecommercetools and the PageSpeed Insights API

At this stage, we're going to use ecommercetools to call up the PageSpeed Insights API and collect the Core Web Vitals scores in bulk.

Once you have pasted your API key in place of "YOUR_KEY", the seo.get_core_web_vitals function from ecommercetools will automate the analysis for the set of URLs that you provide.

For demonstration purposes, I've dropped in the list of page templates that we extracted before. You can replace these lines and paste your own URLs.

After that, we can rename the columns so they appear more user-friendly and print the result.


#@title Site Speed Audit with PSI API
from ecommercetools import seo

pagespeed_insights_key = "YOUR_KEY"
urls = ['https://seodepths.com/',
        'https://seodepths.com/python-for-seo/sitemap-audit-python/',
        'https://seodepths.com/seo-news/structured-data-semantic-search/',
        'https://seodepths.com/seo-research/google-pros-cons-annotations/']

df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
cols = ['URL', 'Fetch Time', 'Device', 'Score', 'Speed Index',
        'First Meaningful Paint', 'First Contentful Paint',
        'Time to Interactive', 'Total Blocking Time', 'Cumulative Layout Shift']
df.columns = cols

df.to_excel('test.xlsx', index=False)

df
PSI report output

⚠️WARNING⚠️

Screenshots contain clients’ data. Please note that for privacy reasons the URL column will be obfuscated from now on.

Data Cleaning

Next up, we need to do a bit of data cleaning to make the results easier to read.

Once we’ve imported the downloaded file back in, we need to convert the Score variable to integers to avoid values with unnecessary decimals.

Then we’re going to drop redundant columns as they don’t serve the purpose of this analysis. As a result, we can export our dataset as an Excel spreadsheet to review.

# import the downloaded file
test = pd.read_excel('test.xlsx')

# convert metrics to the units we want to report
test['Score'] = test['Score'].round(0).astype(int)
test['Total Blocking Time'] = test['Total Blocking Time'].astype(int) * 1000
test['Cumulative Layout Shift'] = test['Cumulative Layout Shift'] / 1000

# drop unnecessary columns
test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint',
                                   'First Contentful Paint', 'Time to Interactive'], axis=1)

# save to a spreadsheet and display the cleaned dataframe
test_dropped_multiple.to_excel('PSI.xlsx', index=False)
test_dropped_multiple
data cleaning

As you may have noticed, the ecommercetools wrapper around the PageSpeed Insights API returns Core Web Vitals scores for both mobile and desktop, but fails to return LCP, arguably the hardest Core Web Vital to pass.

However, you can always measure LCP manually by jumping on PageSpeed Insights.
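Alternatively, a possible workaround is to query the PSI v5 API directly and read the field-data LCP percentile from the loadingExperience block of the JSON response (the endpoint and response shape follow Google's public PSI API documentation; the stubbed sample at the end is illustrative only):

```python
import json
import urllib.parse
import urllib.request

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def extract_field_lcp(psi_json):
    """Return the field LCP percentile (ms) from a PSI response,
    or None if CrUX has no data for the page."""
    metrics = psi_json.get("loadingExperience", {}).get("metrics", {})
    lcp = metrics.get("LARGEST_CONTENTFUL_PAINT_MS")
    return lcp["percentile"] if lcp else None

def fetch_field_lcp(url, api_key, strategy="mobile"):
    """Call the PSI API for a URL and return its field LCP (needs network)."""
    qs = urllib.parse.urlencode({"url": url, "key": api_key, "strategy": strategy})
    with urllib.request.urlopen(f"{PSI_ENDPOINT}?{qs}", timeout=60) as resp:
        return extract_field_lcp(json.load(resp))

# The parsing works on a stubbed response, no network needed:
sample = {"loadingExperience": {"metrics": {"LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 2300}}}}
print(extract_field_lcp(sample))  # 2300
```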

Coming back to the measurement of the Core Web Vitals scores, this time we are going to drop all columns except the one with the scores. This step is key to the next phase: plotting.

test_dropped_multiple = test.drop(['Fetch Time', 'Speed Index', 'First Meaningful Paint',
                                   'First Contentful Paint', 'Time to Interactive',
                                   'Total Blocking Time', 'Cumulative Layout Shift'], axis=1)

test_dropped_multiple

Plot Core Web Vitals Scores

To plot our Core Web Vitals audit, we're going to use the basics of pandas.

First, we manually create a DataFrame holding the scores from the scraping.

Next, we can use the same library to plot an easy bar chart by rotating the variables in a way that they fit in the final output.

df = pd.DataFrame([['Homepage', 23,61],
                   ['Collection', 23, 52],
                   ['FAQ', 52, 40],
                   ['Popular PDP', 27, 55],
                   ['Popular CLP', 23, 77],
                   ['Popular PLP',24,70],
                   ['Store Locator', 25, 64],
                   ['Popular Store Locator', 23, 63]],
                  columns=['URL', 'mobile', 'desktop'])
df.plot(x='URL',
        kind='bar',
        stacked=False,
        figsize=(14,10),
        rot=25,
        title='Core Web Vitals Score by Device')

As a result, here’s your clear-cut bar chart:

Core Web Vitals score plot with Pandas Python

As we can see, the real user experience from the desktop visibly outperforms the mobile experience for the audited set of page templates.

Interestingly, the FAQ template stands out as the most user-friendly page on mobile and even performs better than its desktop counterpart.


This piece of data visualization can be extremely helpful for gaining insights quickly, especially because data accuracy is easily skewed by bias, whether fostered by heuristics or by plain unfamiliarity with data science.

Conclusions

The key to productivity in SEO is effective prioritization. Learning where to allocate your time and effort can help avoid getting lost in too much data.

Segmenting pages with an efficient method is essential to managing your time and achieving an acceptable degree of accuracy in your audits.