Pagination SEO Trends: Empirical Analysis and Insightful Patterns

Have you ever questioned Google’s best practices?

If you’re a greyish-hatter SEO and you put on a poker face when dealing with clients, then I assume you’ve already been caught offside from the fairytale world of Google’s best practices.

When it comes to pagination, Google’s documentation is fairly high-level and generic, aimed at providing stakeholders with standardized guidelines to suit most scenarios. However, while many rules of thumb prove effective for certain businesses, they might not yield the same results for others.

Inspired by an empirical testing approach, I’ve become obsessed with investigating businesses’ approaches to pagination and challenging Google’s SEO best practices. To explore the methodologies surrounding the implementation of pagination, I decided to sample 250 websites from five different industries.

By the end of this post, you will gain a comprehensive understanding of common SEO practices for setting up pagination.

TL;DR – Pagination SEO Trends

pagination analysis insights

Data Analysis Framework

The SEO industry suffers from a considerable lack of data literacy. I often come across case studies marred by sampling errors or an overreliance on uncleaned data, which makes it hard to extract reliable insights from the results.

Well, data analysis is the process of bringing order to chaos by manipulating data to discover useful information and meaning, enabling effective data-informed decision-making.

So let me clarify upfront that you most likely won’t be able to make any concrete inferences from the insights in this post. This study was purely exploratory, aimed at understanding how websites implement pagination.

data analysis framework
Source: Klidas and Hanegan, 2022

📌 I highly recommend buying Klidas and Hanegan’s book in case you want to get an introduction to data literacy.

The best use of these insights is to improve the quality of your questions and assumptions, potentially allowing for more in-depth testing later on.

Methodology

Now that you know you can’t make any inferences, let me explain how I conducted this study.

Data were gathered using a quota sampling approach, meaning that:

  1. I divided websites into subgroups based on 5 types of industries
  2. Then, I collected random samples of 50 websites from each group, amounting to a total of 250 websites.
Quota sampling and Industries

I chose quota sampling because it is cost-effective. It is also a non-probabilistic method: I could sample websites within each subset of industries, but I can’t draw statistical inferences or conduct hypothesis testing on the results. As mentioned, I am not interested in reaching any conclusions!
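
A minimal sketch of the quota step, assuming each industry already has a pool of candidate domains. The pools, the `quota_sample` helper and the `.example` domains are hypothetical placeholders, not the actual dataset:

```python
import random


def quota_sample(sites_by_industry, quota, seed=None):
    """Draw up to `quota` sites from each industry subgroup."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    sample = {}
    for industry, sites in sites_by_industry.items():
        # random.sample raises ValueError if the quota exceeds the pool size,
        # so cap the quota at the number of available sites.
        k = min(quota, len(sites))
        sample[industry] = rng.sample(sites, k)
    return sample


# Hypothetical candidate pools: 5 industries with 80 placeholder domains each.
industries = ["Real Estate", "Travel", "eCommerce", "News", "Careers"]
pools = {
    name: [f"{name.lower().replace(' ', '-')}-{i}.example" for i in range(80)]
    for name in industries
}

sample = quota_sample(pools, quota=50, seed=42)
total = sum(len(sites) for sites in sample.values())
print(total)  # 250
```

Fixing the quota per group guarantees equal group sizes, but how the pools themselves are assembled still decides how representative the sample is.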

❌ Beware the Observer Bias!

Be aware that you might observe data from different verticals with a biased perspective. For example, if you believe e-commerce websites tend to self-canonise pagination, you’ll be more likely to notice this detail when reviewing the insights.

By the way, Giulia Panozzo is a qualified neuroscientist who you can refer to when navigating the customer journey through biases and heuristics in digital marketing.

Back to the study, the descriptive analysis explored the variability of the following features over the past 365 days:

  1. Type of Industry (nominal outcome)
  2. Average Word Count (numerical outcome)
  3. Domain Authority (numerical outcome)
  4. Number of Listings (numerical outcome)

After my first round of sampling, I noticed I had only a few decent insights. So, I expanded the analysis to include critical on-page SEO features such as:

  • Alt Text (boolean outcome)
  • Title & Meta Descriptions (boolean and nominal outcome)
  • Rel=Prev/Next (boolean outcome)
  • Loading=lazy (boolean outcome)
  • Organic keywords (numerical outcome, cumulative sum over one year)
  • Schema mark-up (numerical outcome)
pagination features used to analyse the sample

To get to the bottom of it, I used a combination of tools:

  • Ahrefs: organic keywords by paginated sequences over one year
  • SEMrush: domain authority calculated on each domain in the analysis
  • Screaming Frog: on-page SEO meta tags and attributes (e.g. schema mark-up, canonical, rel=prev/next)
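
To make the on-page checks more concrete, here is a rough sketch of how a few of these signals could be read from a single page's HTML using only Python's standard library. It is not how Screaming Frog works internally; the `PaginationAudit` class and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class PaginationAudit(HTMLParser):
    """Collect a few on-page signals from one paginated URL's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.has_rel_prev = False
        self.has_rel_next = False
        self.images = 0
        self.lazy_images = 0
        self.images_with_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated tokens; valueless attributes are None.
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link":
            if "canonical" in rel:
                self.canonical = attrs.get("href")
            self.has_rel_prev = self.has_rel_prev or "prev" in rel
            self.has_rel_next = self.has_rel_next or "next" in rel
        elif tag == "img":
            self.images += 1
            if (attrs.get("loading") or "").lower() == "lazy":
                self.lazy_images += 1
            if (attrs.get("alt") or "").strip():
                self.images_with_alt += 1


# Hypothetical markup for page 2 of a category listing.
html = """
<html><head>
<link rel="canonical" href="https://shop.example/shoes?page=2">
<link rel="prev" href="https://shop.example/shoes">
<link rel="next" href="https://shop.example/shoes?page=3">
</head><body>
<img src="a.jpg" alt="Red sneaker" loading="lazy">
<img src="b.jpg" loading="lazy">
<img src="c.jpg" alt="Blue boot">
</body></html>
"""

audit = PaginationAudit()
audit.feed(html)
print(audit.canonical, audit.has_rel_prev, audit.has_rel_next)
# https://shop.example/shoes?page=2 True True
print(audit.images, audit.lazy_images, audit.images_with_alt)
# 3 2 2
```

The sketch only sees the raw HTML, so anything injected by JavaScript after load would be missed.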

Now, let’s go through some of the most interesting insights.

Pagination by Industry

In this section, we explore trends and patterns in the pagination setup across the industries analyzed.

Organic Keywords by Industry

The Real Estate industry received the highest share of organic keywords on paginated sequences year over year (YoY), while the Travel sector lagged significantly.

From a qualitative standpoint, I noticed that the Real Estate sector rarely adopts new pagination technologies, unlike the Travel industry.

Only 3 out of 10 sites randomly sampled from the Real Estate subset used advanced pagination setups. In the Travel industry, 8 out of 10 did.

💡 Insight

It is likely that industries relying on an advanced setup struggle to garner a considerable share of organic keywords, as “Load More” and infinite scrolling setups may be harder to track than traditional HTML-based setups.

Number of Listings by Industry

The eCommerce industry has the largest number of items within a single paginated sequence, followed by the Travel sector.

💡 Insight

Infinite scrolling is a common feature in eCommerce, in contrast to the general preference for HTML-based pagination in industries such as Real Estate and Careers. No wonder the latter showed the fewest items per paginated sequence.

Schema Mark-up by industry

Despite the low average number of schema markup types across industries, the Real Estate and News sectors lead the bubble chart with the greatest proportion of schema types within their paginated sequences.

💡 Insight

The Travel and eCommerce industries lag, typically featuring at least one schema type per paginated page, most often BreadcrumbList.
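
As an illustration of what counting schema types per page can look like, the sketch below collects the distinct `@type` values found in JSON-LD blocks. It assumes the markup is JSON-LD (microdata and RDFa are ignored) and counts nested types too; the `schema_types` helper and the sample page are hypothetical:

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Gather the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def schema_types(html):
    """Return the set of distinct @type values found in a page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()

    def walk(node):
        if isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in collector.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
    return found


# Hypothetical paginated page exposing only breadcrumb markup.
page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BreadcrumbList",
 "itemListElement": [
   {"@type": "ListItem", "position": 1, "name": "Shoes"},
   {"@type": "ListItem", "position": 2, "name": "Page 2"}]}
</script>
"""
print(sorted(schema_types(page)))  # ['BreadcrumbList', 'ListItem']
```

Whether nested types such as `ListItem` should count towards the total is a design choice; restricting the count to top-level types would bring this example down to one.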

Rel Prev/Next by Industry

Although the Real Estate industry is generally less inclined to adopt cutting-edge pagination setups, it used the ‘rel=prev’ and ‘rel=next’ attributes less than any other industry.

eCommerce, by contrast, showed significant adoption of this antiquated attribute: 7 out of 10 eCommerce sites randomly sampled in our analysis used it.

💡 Insight

Google’s abrupt U-turn on rel=prev/next hit eCommerce websites hardest back in the day. So the figure should not come as a surprise: eCommerce encompasses a plethora of marketing relationships, including third-party vendors selling the most obnoxious products to other businesses.

Loading=Lazy & Alt Text by industry

The Careers industry stands out with the highest average count of loading="lazy" attributes and alt text for images in paginated sequences, with the News, Travel and eCommerce industries showing similar trends.

lazy loading and alt text count by industry

In contrast, the Real Estate sector once again proves to be one of the least modernized industries, lagging behind the others on both counts.

Pagination by Domain Authority

In this section, we explore the pagination setup strategy based on domain authority.

Canonical Strategy by Domain Authority

In the sample of pagination with self-referring canonicals, authority scores show a higher median and a greater variance than in the sample of pagination with canonicals pointing to the main category.

💡 Insights

Paginated sequences with self-referring canonicals are more common across authoritative domains.

High five to Google’s best practices ✋🏻
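
For reference, the two canonical strategies compared here could be told apart with something like the sketch below. It assumes pagination lives in a `page` query parameter, which will not hold for sites using path-based pagination such as `/page/2/`; the `canonical_strategy` helper and the URLs are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise(url):
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def canonical_strategy(page_url, canonical_url, page_param="page"):
    """Label a paginated URL by where its canonical points."""
    if not canonical_url:
        return "missing"
    page, canonical = normalise(page_url), normalise(canonical_url)
    if page == canonical:
        return "self-referring"
    # Strip the (assumed) pagination parameter to rebuild the main category URL.
    parts = urlsplit(page)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != page_param]
    root = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
    if canonical == root:
        return "main category"
    return "other"


print(canonical_strategy("https://shop.example/shoes?page=2",
                         "https://shop.example/shoes?page=2"))  # self-referring
print(canonical_strategy("https://shop.example/shoes?page=2",
                         "https://shop.example/shoes/"))  # main category
```

The normalisation is deliberately light; stricter or looser rules (for example around `www` or tracking parameters) would shift some URLs between the two groups.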

Word Count by Domain Authority

Another interesting fact involves word count and authority levels.

Paginated sequences on domains with higher authority tended to have a slightly greater average word count.

Interestingly, when considering canonical strategies, pagination with self-referring canonicals was most common in high-authority domains. In contrast, pagination canonising to the main page was more prevalent on domains with lower authority levels.

Pagination by Number of Listings

In this section, we explore the pagination setup by the number of listings (i.e. the count of items on a paginated sequence). The analysis describes the setup based on a range of on-page SEO factors.

Canonical strategy by classes of items

Pagination with a canonical pointing to the main category was very frequent on listing pages with up to 16 items.

By contrast, pagination with a self-referencing canonical was common on pages with 17-25 items.
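
The grouping into classes of items can be sketched as a simple binning step followed by a cross-tabulation. The class ranges 4-16, 17-25 and 26-40 are the ones used in this post; the open-ended "41+" class, the `item_class` helper and the sample rows are assumptions for illustration:

```python
from bisect import bisect_left
from collections import Counter

# Upper-inclusive class edges from the ranges quoted in this post.
# The open-ended "41+" class is an assumption, and counts below 4
# would also land in the first class.
EDGES = [16, 25, 40]
LABELS = ["4-16", "17-25", "26-40", "41+"]


def item_class(n_items):
    """Map a count of items on a paginated page to its class label."""
    return LABELS[bisect_left(EDGES, n_items)]


# Hypothetical rows: (items on the page, canonical strategy).
rows = [
    (12, "main category"),
    (16, "main category"),
    (20, "self-referencing"),
    (24, "self-referencing"),
    (30, "self-referencing"),
    (48, "main category"),
]

# Count how often each strategy appears within each class of items.
crosstab = Counter((item_class(n), strategy) for n, strategy in rows)
for (cls, strategy), count in sorted(crosstab.items()):
    print(cls, strategy, count)
```

Where the class boundaries fall can change how a pattern looks, so it is worth checking that a trend survives a different choice of edges.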

Domain authority by number of items

Pagination with a greater number of items was found on more authoritative domains.

💡 Insights

Are search engines somewhat keen to reward websites with paginated sequences offering a broad range of items (products or articles)?

Loading=lazy & Alt text by number of items

The loading=lazy attribute and alt text were most prevalent on paginated sequences with the lowest number of items (4-16).

💡 Insights

Can we assume that the more you browse paginated sequences, the worse the SEO treatment? What’s the impact of advanced pagination technologies on on-page SEO?

Rel=Prev/Next by classes of items

Pagination using the ‘rel=prev/next’ attributes was more frequent across the middle classes of the distribution (i.e. 17-25 and 26-40 items).

This suggests that the markup was commonly used on paginated sequences that give users previous/next arrows marked up with the obsolete rel=prev/next attributes.

Pagination by Word Count

In the final section of the study, we explore the pagination setup by the average word count retrieved from randomly sampled paginated sequences.

Canonical Strategy by Word Count

Pagination with a higher average word count tended to use a canonical pointing to the master category page.

💡 Insights

Paginated sequences with more content may be more likely to defer to the main page than to claim their own uniqueness.

Indexing strategy by word count

Pagination with a higher average word count had more URLs set to noindex.

As a result, pagination with a higher average word count tended not to be indexed by Google.
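
For context, a noindex directive can sit either in a robots meta tag or in an `X-Robots-Tag` HTTP header. The sketch below checks both in a simplified way: it ignores user-agent prefixes in the header, and the `is_noindex` helper and sample inputs are hypothetical:

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name in {"robots", "googlebot"}:
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindex(html, x_robots_tag=""):
    """True if the page asks not to be indexed via meta tag or header value."""
    parser = RobotsMeta()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",") if d.strip()}
    directives = parser.directives | header
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & directives)


print(is_noindex('<meta name="robots" content="noindex, follow">'))  # True
print(is_noindex('<meta name="robots" content="index, follow">'))  # False
print(is_noindex("<html></html>", x_robots_tag="noindex"))  # True
```

A page can also stay out of the index without any noindex directive, for instance when its canonical points elsewhere, so this check captures only the explicit case.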

Important Considerations & Next Steps

So far, we have described some of the most insightful patterns for configuring pagination for SEO. The purpose of this study was to improve the quality of our questions and gain a general understanding of the subject, but we might now be interested in formulating assumptions about correlation and causation.

Are paginated sequences with self-referring canonicals more likely to attract more organic keywords?

In the next post, I will continue the exploratory data analysis cycle and approach pagination from a slightly different angle, in order to study the potential relationships between specific predictors.
