Indexation is not a given; it's an achievement.
In this post, I will walk through a few reasons why pages get crawled but not indexed and put forward some practical solutions to fix them.
What is Crawled – Not Indexed
Crawled – Not Indexed is a non-indexing status shown in the Pages report of Google Search Console to alert webmasters that Google visited a set of pages but hasn't indexed them yet.
As a result, if you run a site: search for a crawled but not indexed page, you will see that Google returns no results for that search operator.
Despite sounding similar to its "Discovered – Currently Not Indexed" counterpart, there are fewer nuances determining search engines' crawling behavior here. This status informs webmasters that a set of pages was crawled but indexing was put on hold.
Because indexing is flagged as the culprit, we would normally classify this as an issue.
Why are pages being Crawled – Not Indexed
The non-indexing reason “Crawled – Not Indexed” may affect a set of pages for a few reasons.
- XML sitemaps are Missing
- Discontinued PDP or PLP
- Dynamic pages with query parameters
- Thin or Low-Worded Content
- Poor Headings Structure
- High Time to First Byte (TTFB)
- Rendering Bottlenecks on Critical Resources
- Poor mobile-friendliness
Let’s explore these in more depth.
XML sitemaps are Missing
Submitting an XML sitemap containing valid URLs is probably the oldest and most DIY safeguard against indexation restraints.
Still, a large proportion of eCommerce sites appear reluctant to follow this practice. The impact of a consistent XML sitemap increases with the size of the online store. Besides, the lack of a dedicated sitemap often goes hand in hand with a higher number of pages being crawled but not indexed.
In this example, you can see a PDP (product detail page) being labeled as Crawled – Not Indexed within the Google Search Console property of ysl.com/en-gb.
As you can see from the robots.txt file, no XML sitemap has been submitted for the /en-gb/ subdirectory.
While this page doesn't apparently present issues, it easily falls off Google's radar. This may represent a major issue for eCommerce sites, given that their SMART objectives are configured around maximizing profit and revenue from existing PDPs.
XML sitemaps are often huge files that require time and dedication to sift through. You can learn how to automate an XML sitemap audit with Python.
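To illustrate, here is a minimal Python sketch (the function name and the sample sitemap are my own, purely for demonstration) that extracts the `<loc>` URLs declared in a sitemap, which you could then cross-check against the URLs flagged as Crawled – Not Indexed:

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace, as defined by sitemaps.org
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return the list of <loc> URLs declared in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# Hypothetical sitemap fragment for illustration
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/en-gb/product-1</loc></url>
  <url><loc>https://example.com/en-gb/product-2</loc></url>
</urlset>"""

print(extract_sitemap_urls(sample))
```

From there, requesting each extracted URL and logging its status code is a natural next step for a fuller audit.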
Discontinued PDP or PLP
"Out of stock" items or old category pages are another classic reason for hampered indexation. This often translates into pages either returning a 404 status or a misleading 200 status that is highly likely to be treated as a soft 404.
Just like in this example
A specific model of the sl24 sneakers from Saint Laurent wasn’t available any longer. Despite the configuration of a traditional custom 404 error page, the URL still returns an HTTP 200 status code.
Handling discontinued products involves a number of in-depth considerations revolving around both revenue goals and an assessment of whether these pages should be temporarily or permanently discontinued.
You can find more on how to handle discontinued products in this blog post from ContentKing.
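As a rough illustration, a soft-404 check can be sketched as a heuristic in Python. The phrase list below is an assumption for demonstration purposes, not the actual signal set Google uses:

```python
# Hypothetical phrases that often betray a "gone" page served with HTTP 200
SOFT_404_PHRASES = ("page not found", "no longer available", "out of stock")

def looks_like_soft_404(status_code: int, page_text: str) -> bool:
    """Flag a 200 page whose copy reads like an error or discontinued page."""
    if status_code != 200:
        return False  # a real 404/410 is not a *soft* 404
    text = page_text.lower()
    return any(phrase in text for phrase in SOFT_404_PHRASES)

print(looks_like_soft_404(200, "Sorry, this item is no longer available."))  # True
print(looks_like_soft_404(404, "Page not found"))                            # False
```

Running a check like this over a crawl export helps you spot PDPs at risk of being treated as soft 404s before Search Console reports them.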
Dynamic pages with query parameters
What would happen if you left your house door open for the time you’re away?
Chances are burglars will sneak into your property and you will be robbed of valuable items.
Allowing search engines free access to pages with filters exacerbates crawling and indexing issues, besides wasting crawl budget.
This is a very common instance I often stumble across when auditing larger eCommerce sites:
Given that search engines have historically preferred to crawl and index pages with static URLs, you should always make sure to design robust URL structures for your eCommerce site.
Dynamic URLs vs Static URLs – How to Audit for improved Crawl Efficacy— Simone De Palma 🦊 (@SimoneDePalma2) November 22, 2022
Within a tech audit, there are a number of underdogs that we tend to bypass pretty quickly.
In one of my latest audits, I stumbled across so many times dynamic URLs to the point I felt like throwing up🤢
What really matters here is paying attention to the robots.txt directives to ensure you deter web crawlers from accessing pages with filters.
Because they come with query parameters (e.g. ?search=), search engines might be inclined to classify these pages as "Crawled – Not Indexed".
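If you export your crawled URLs, a quick Python sketch can flag the parameterized ones worth reviewing against your robots.txt rules (the URL and the function name here are illustrative assumptions):

```python
from urllib.parse import urlparse, parse_qs

def query_parameters(url: str) -> list[str]:
    """Return the sorted query parameter names that make a URL dynamic."""
    return sorted(parse_qs(urlparse(url).query))

# A faceted-navigation URL of the kind that often ends up Crawled - Not Indexed
url = "https://example.com/en-gb/shoes?search=sneakers&colour=black"
print(query_parameters(url))  # ['colour', 'search']
```

Any URL where this returns a non-empty list is a candidate for a wildcard Disallow rule in robots.txt, provided the filtered view carries no search value of its own.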
Thin or Low-Worded Content
Here is another classic reason for pages being prevented from indexation.
This is very common for luxury brands that rely on imagery from highly visual campaigns. Due to the fixed seasonality of fashion collections, some eCommerce sites often showcase at least 3 sub-categories filled with additional sub-folders containing pages with little to no text.
It’s now very clear why this example of a summer 2020 collection campaign at Saint Laurent (en-gb) is being classified by search engines as “Crawled – currently Not Indexed”.
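A rough way to surface such pages is to count the visible words in the raw HTML. Here is a stdlib-only Python sketch; the threshold you pick (say, under 100 words) is a judgment call, not an official cutoff:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, ignoring <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def visible_word_count(html: str) -> int:
    parser = TextExtractor()
    parser.feed(html)
    return len(" ".join(parser.chunks).split())

# Hypothetical campaign page with almost no copy
page = "<html><body><h1>Summer 2020</h1><script>var x=1;</script><p>Lookbook</p></body></html>"
print(visible_word_count(page))  # 3
```

Pages scoring near zero, like the campaign example above, are prime candidates for content integration or consolidation.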
Poor Headings Structure
It's not always about thin content, though.
In a recent first-hand test on his site, Mordy Oberstein showed that sometimes the way you serve content to users can negatively impact crawlability and accessibility.
I don’t know if it’s me or if it’s something happening more often these days in general, but I’ve seen more & more of my pages on the SEO Rant site being crawled but not indexed.— Mordy Oberstein 🇺🇦 (@MordyOberstein) September 13, 2022
So I made some changes. Lo & behold the pages are now indexed
Here’s what I found fixed my issues pic.twitter.com/RjvjEkvz6G
High Response Time for a Crawl Request
If search engines take too long to complete a crawl request to fetch a page, then chances are your pages will end up as "Crawled – not Indexed" or, at worst, "Discovered – currently not Indexed".
A high response time for a crawl request can hamper crawling and indexing performance, as Googlebot is forced to wait a long time until the very first bytes of the pages finish loading.
The Crawl Stats report on your Google Search Console root property will help you with this check.
Let’s break this graph down a bit.
| Metric | Description |
| --- | --- |
| Average page response time for a crawl request | This is the avg. response time for a crawl request to retrieve the page content. It does not include retrieving page resources (scripts, images, and other linked or embedded content) or page rendering time. |
| Total number of crawl requests | This is the total number of crawl requests to your site in the time span shown (Google says 90 days, but this could stretch a bit further). Duplicate requests to the same URL are counted. |
To prevent your pages from being excluded from the index, you should make sure that the average response time for a crawl request stays low.
Arguably, there are several methods to achieve this, and they all boil down to site speed optimizations touching on potential rendering bottlenecks.
Without further rambling, eyeball the chart to make sure that the average response time is below 300 ms, which allows search engines to achieve decent levels of crawlability.
Other considerations around rendering and Core Web Vitals should be made. For instance, a high LCP value in CrUX (Chrome User Experience Report) data may be a symptom.
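To make the 300 ms rule of thumb concrete, here is a small Python sketch. Measuring true server TTFB precisely requires lower-level instrumentation, so treat this first-byte timing as an approximation; the network call is commented out so the snippet stays self-contained:

```python
import time
import urllib.request

TTFB_BUDGET_SECONDS = 0.3  # the ~300 ms rule of thumb discussed above

def within_ttfb_budget(elapsed_seconds: float) -> bool:
    """Check a measured first-byte time against the budget."""
    return elapsed_seconds < TTFB_BUDGET_SECONDS

def measure_first_byte(url: str) -> float:
    """Approximate time until the very first byte of the response arrives."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read(1)  # block until the first byte is available
        return time.perf_counter() - start

# measure_first_byte("https://example.com/")  # requires network access
print(within_ttfb_budget(0.25))  # True
print(within_ttfb_budget(0.45))  # False
```

Sampling a set of key PDPs this way gives you a quick sanity check to compare against the averages reported in Crawl Stats.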
Rendering Bottlenecks on Critical Resources
As anticipated, pages may have a bad time out in the woods of the rendering process.
Pages being prevented from indexation may suffer from severe discrepancies between the pre-render version (raw HTML) and the post-render version.
Let’s see a few culprits that could prevent successful indexation.
The URL inspection tool from Google Search Console would provide the answer.
You can head to the "More Info" tab when using the Google Search Console URL Inspection tool and look at the number of Other Errors.
Once you've found them, you can measure how many unused resources are being wasted on your site (Chrome DevTools > Coverage) and assess to what extent "Other Errors" convert into proper render-blocking resources.
Here's how to run a quick check:
1. Open Chrome DevTools.
2. Head to the Network tab.
3. Using the search bar, type in a resource from "Other Errors".
4. Right-click on the culprit resource and select "Block request URL".
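To gauge the pre-render vs post-render discrepancy mentioned above, you can diff the visible words of the raw HTML against the rendered HTML. In practice the rendered version would come from a headless browser; in this hypothetical sketch both are plain strings:

```python
import re

def visible_words(html: str) -> set[str]:
    """Crude word set: strip tags, lowercase, split on non-alphanumerics."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def injected_only(raw_html: str, rendered_html: str) -> set[str]:
    """Words that exist only after client-side rendering."""
    return visible_words(rendered_html) - visible_words(raw_html)

# Hypothetical example: an empty app shell vs its client-rendered output
raw = "<html><body><div id='app'></div></body></html>"
rendered = "<html><body><div id='app'><p>Sneakers in stock</p></div></body></html>"
print(sorted(injected_only(raw, rendered)))  # ['in', 'sneakers', 'stock']
```

A large set of injected-only words signals content that crawlers may miss or defer, which is exactly the discrepancy that can hold indexing back.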
Poor Mobile-friendly pages
Poor mobile-friendliness is commonly due to the presence of heavy resources that search engines couldn't fetch (e.g. critical CSS).
As a result, important elements of a page can be lost, leading web crawlers to withdraw from the crawling process.
It is always good practice to test with the URL Inspection tool instead, as the Mobile-Friendly Test is not that accurate and is going to be retired by Google.
How to Fix Crawled – Not Indexed Pages
As with most SEO processes, fixing indexation restraints doesn't happen in a week.
In the first place, the problem should be diagnosed from different angles using the hints mentioned above. The Page Indexing report on Google Search Console will nail down most of the effort, so you need to work closely with this invaluable first-party tool.
Due to being one of the most nuanced non-indexing reasons, there’s no single cure for pages affected by indexation delays.
Instead, make sure to consider the following:
- 1️⃣ Submit a dedicated XML sitemap.
It is widely recommended to submit the XML sitemap via both the Search Console property and the robots.txt file to prompt search engines to pay a visit to your website's most relevant pages.
- 2️⃣ Fix discontinued PDP or PLP.
Depending on items' availability over time and how much revenue they generate, you should consider removing these pages and their internal links and returning an HTTP 410 response code. This will help you save crawl budget, given that Google tends to crawl pages returning HTTP 410 less frequently than those returning HTTP 404.
- 3️⃣ Ensure key PDPs are configured as static URLs.
Although search engines can crawl parameters, having a clear URL structure benefits the overall website navigation. In case key PDPs and PLPs were caught with unwanted query parameters, consider to what extent you should block them in your site's robots.txt file.
- 4️⃣ Remove render-blocking resources and keep crawl requests time at bay.
As anticipated, you need to ensure your pages aren't overly reliant on client-side rendering. This could lead to the rendered HTML containing dynamically injected content that doesn't exist in the raw HTML. This becomes an issue when you expect the missing information to be discovered by web crawlers, but it turns out it isn't.
- 5️⃣ Improve headings and evaluate room for content integration.
This includes rephrasing sentences where the content is either too short or hard to read. Ideally, you should write concise sentences, avoiding adverbs and overly specific jargon. Rather, target the right entities and find proper synonyms to convey elaborate concepts. You can rephrase your sentences using Text Analyzer and leverage fitting entities with Google's NLP tool demo.
- 6️⃣ Adjust search intent so that it aligns with the intent shifts caused by a core update. This includes tweaking title tags and meta descriptions accordingly, but also fine-tuning your content with contextualized internal links.
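The decision tree for discontinued PDPs from point 2️⃣ can be sketched as a small Python helper. The inputs and the redirect branch are my own assumptions about a typical setup, not a universal rule:

```python
def discontinued_status(permanently_discontinued, has_close_replacement):
    """Sketch of the decision tree for a discontinued product page."""
    if permanently_discontinued and has_close_replacement:
        return "301 -> replacement PDP"  # preserve equity where a close substitute exists
    if permanently_discontinued:
        return 410                       # gone for good: crawled less often than a 404
    return 200                           # temporarily out of stock: keep the page live

print(discontinued_status(True, False))   # 410
print(discontinued_status(True, True))    # 301 -> replacement PDP
print(discontinued_status(False, False))  # 200
```

Encoding the policy this way makes it easy to apply consistently across a large catalogue, whatever platform actually serves the responses.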
There’s a lot going on when it comes to fixing indexation restraints despite pages being crawled.
Having a large proportion of crawled pages knocking on Google's index to no avail can wreak havoc on your crawl budget and harm crawl efficacy. On the flip side, you can look at the issue from several angles and use different methods to nail it down.
Hopefully, this post contributed to achieving at least a small part of it.
Let me know in the comments or on Twitter if you’re struggling with “Crawled – not Indexed” pages.
What does Indexing mean?
Indexing refers to the practice search engines use to organize the information from the websites that they visit.
Indexing is a common practice for search engines and represents the ultimate parsing stage of strings of structured and unstructured information following web crawling and rendering.
It's arguably the most important stage, as content that is not in the index can't rank in search results.
What is indexed vs non-indexed?
An indexed page is stored in a search engine's database and eligible to appear in search results, whereas a non-indexed page may have been discovered or crawled but cannot rank.
Why my products are not indexed in Google?
eCommerce websites may encounter a few indexation restraints due to the following reasons:
– Lack of a valid XML sitemap containing key product pages
– PDP with Out of Stock Items
– Dynamic pages with query parameters
– Thin or Low-Worded Content
– Poor Headings Structure
– High Time to First Byte (TTFB)
– Rendering Bottlenecks on Critical Resources
How long does it take to get indexed by Google?
It's not possible to estimate exactly how long a page will take to get indexed after being submitted to Google. This may vary depending on a number of factors, ranging from a website's size to the type of crawling issue encountered.
Further analysis might enable SEO professionals to make more accurate estimates, albeit ones affected by persistent outliers.