When a meta description doesn’t align with search intent, Google’s machine learning may generate a snippet from other content on the page, leading to Google rewriting meta descriptions—even up to 70% of the time.
With a new layer of information retrieval powered by large language models (LLMs), I wondered where Google and LLMs would retrieve and generate replacements for missing meta descriptions.
I asked myself:
- What would Google display on the SERP, and where would it fetch content to replace missing meta descriptions?
- What would LLMs generate in their JSON outputs, and where would they pull content from to replace missing meta descriptions?
TL;DR – On 7 URLs with missing meta descriptions, retrieval-augmented generation (RAG) intervened in slightly different ways depending on the AI provider.
- Google was keen to pull from H1s and bullet points in the text, without a clear order.
- ChatGPT mirrored Google’s style, often drawing from structured or bulleted lines that summarised each post’s purpose.
- Perplexity appeared to fall back on cached meta descriptions from before the test period.
While the test didn’t prove any universal conclusions, it suggested that LLMs and AI search engines tend to mimic Google’s modern retrieval, effectively favouring structured, chunked, and self-contained text during retrieval.
Methodology
Building on the testing groups from my recent article, I decided to take the experiment to the next level.
Find out how I determined the testing group URLs when I removed custom meta descriptions to test SEO gains.
For 7 URLs without custom meta descriptions, I submitted an average-length prompt to ChatGPT, Perplexity, and Claude with the following example request:
search for a framework to audit a website’s headings and breadcrumbs for SEO with Python
And here’s a breakdown of the models leveraged during the searches.
| AI search engine | Model |
|---|---|
| ChatGPT | Default GPT-5 – no web search |
| Perplexity | Default – “best” based on each query |
| Claude | Default – Sonnet 4 |
⚠️ Note – I ruled out Claude from the analysis as soon as I realised its output omits attributes such as “snippet” that would indicate the retrieval of a portion of text from each page.
This suggests Claude probably doesn’t use meta descriptions to build up its answers.
ChatGPT’s sonic_berry probabilities were over 0.8 for all searches.
The fact that I prefixed each prompt with “search for” meant the model was highly likely to run a web search to kick off the retrieval pipeline.
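Since the analysis hinges on spotting “snippet” values inside each provider’s exported conversation JSON, one quick way to audit an export is to walk the whole JSON tree and collect every such value. A minimal sketch, assuming a nested export structure; the field name `snippet` matches what I observed in ChatGPT’s exports, while the toy `sample` structure below is illustrative, not an official schema:

```python
import json

def extract_snippets(conversation):
    """Recursively collect every 'snippet' string from an exported
    conversation JSON, wherever it appears in the nested tree."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "snippet" and isinstance(value, str):
                    found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(conversation)
    return found

# Toy structure mimicking a search-result entry in an export
sample = {
    "results": [
        {"url": "https://example.com/post",
         "snippet": "In this tutorial, I'll walk you through (...)"}
    ]
}
print(extract_snippets(sample))
```

Running the same walker over a Claude export and getting an empty list back is exactly the signal that led me to rule it out of the analysis.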
Query Grounding Returned the Conversation JSON Only Once
Anecdotally, GPT-5 exposed grounding in the conversation JSON only once out of 7 attempts.
In that case, the model broke the prompt down into several search queries, which were then used to fetch results from the SERPs.
I wasn’t expecting this, as grounding data has been absent from most conversations since GPT-5 launched.
Where LLMs Pulled Text to Fill out Missing Meta Descriptions
| URL | Prompt | ChatGPT Snippet | Perplexity Description |
|---|---|---|---|
| Search Intent Sankey Diagram | search for a framework to audit search intent for SEO with Python. I need to plot a sankey diagram to plot the results | 3rd paragraph line. “In this post, I’m going to walk you through (…)” | Relied on cached meta description |
| Robots.txt Competitor Analysis | search for a framework to automate robots.txt competitor analysis for SEO with Python | 3rd paragraph line. “In this tutorial, I’ll walk you through (…)” | 3rd paragraph line. “In this tutorial, I’ll walk you through (…)” |
| Site Structure Audit (Breadcrumbs & H1) | search for a framework to audit a website’s headings and breadcrumbs for SEO with Python | 5th paragraph line. “In this blog post, I’ll share a method to audit your site’s structure (…)” | Relied on cached meta description |
| Orphan Pages Audit | search for a framework to audit orphan pages for SEO with Python | H1 and first descriptive line. “Orphan Pages Tech Stack Auditing Orphan pages requires you to use a combination of first-party and third-party SEO tools.” | Relied on cached meta description |
| Duplicate Content Audit | search for a framework to audit duplicate content for SEO with Python | Null | Relied on cached meta description |
| ChatGPT Brand & Traffic Referrals | search for a framework to audit traffic referrals on ChatGPT for SEO with Python | Null | 1st paragraph line. “I’ve been quiet for some time in the public space” |
| Canonical Audit Automation | search for a framework to automate canonical checks with Python | Picked the Published Date, not the Modified Date – it also picked the last line of the intro. “Jan 20, 2023 — In this post, I’ll show you how you can automate a quick canonical audit using Python” | Relied on cached meta description |
ChatGPT
Every time URLs with missing meta descriptions were cited in ChatGPT’s conversation JSON file, I noticed the corresponding snippet reported the very first lines of each blog post.
⚠️ Note – notice the “attribution” param, which acts as a tracking indication that can benefit our analytics.
The snippet attribute usually contains the meta description excerpt. Following the sankey chart post example, the model retrieved the third paragraph from the intro.
These lines are used across my entire website to nudge the reader into the article as directly and concisely as possible.
Interestingly, though, the system replaced missing meta descriptions with a blend of headings and structured text.
Translated into a visual screenshot, on this occasion it was an H1 heading followed by a simple, self-contained sentence.
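To preview what a retriever might fall back on for your own pages, you can extract the H1 and the opening paragraphs yourself. A minimal sketch using only the standard library; the class name and sample HTML below are illustrative, not part of the original test:

```python
from html.parser import HTMLParser

class FallbackCandidates(HTMLParser):
    """Collect the text an engine might fall back on when no meta
    description exists: the H1 and the first few <p> lines."""

    def __init__(self, max_paragraphs=3):
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.h1 = ""
        self.paragraphs = []
        self._in_h1 = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "p" and len(self.paragraphs) < self.max_paragraphs:
            self._in_p = True
            self.paragraphs.append("")  # start a new paragraph bucket

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1 += data
        elif self._in_p:
            self.paragraphs[-1] += data

# Illustrative page fragment
html = """<html><body><h1>Orphan Pages Audit</h1>
<p>I've been quiet for some time.</p>
<p>In this post, I'll walk you through the audit.</p>
</body></html>"""

parser = FallbackCandidates()
parser.feed(html)
print(parser.h1)
print(parser.paragraphs)
```

Comparing this output against the “snippet” values in a conversation export shows at a glance whether the model reused your intro lines, your H1, or something deeper in the page.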
Perplexity
On some occasions, Perplexity couldn’t detect a custom meta description from the HTML.
And that should be the expected outcome!
In most cases, Perplexity pulled a structured, bulleted excerpt during live retrieval.
However, the “description” attribute was filled with cached versions – over 29 days old, from back when the original meta descriptions were still in place.
Anecdotally, I noticed that the text retrieved in the snippet was always very clear, self-contained, and entity-oriented (e.g. I used “Google Search Console” rather than “GSC” or “gsc”).
As you can see, under the Source tab in Perplexity only the cached meta descriptions stood out.
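A simple way to confirm the cache hypothesis is to check whether the live HTML still carries a meta description at all: if it doesn’t, any “description” an engine shows must come from somewhere else. A minimal sketch with a regex-based extractor; the `cached` string and sample HTML are illustrative placeholders, not data from the test:

```python
import re

def live_meta_description(html):
    """Pull the meta description from raw HTML, or None if absent."""
    match = re.search(
        r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']',
        html, re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None

# Illustrative values
cached = "Audit orphan pages with Python and Screaming Frog."
page_without_meta = "<html><head><title>Orphan Pages</title></head></html>"

live = live_meta_description(page_without_meta)
if live is None:
    print(f"No live meta description - a cached fallback like {cached!r} "
          "is the likely source of the displayed description.")
```

In my case the pages returned no live meta description, yet Perplexity’s “description” attribute still matched the pre-test versions, which points squarely at a cache older than the change.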
Where Google Pulled Text to Replace Missing Meta Descriptions
| URL | Google Search | Meta Description in SERP |
|---|---|---|
| Search Intent Audit (Sankey Diagram) | framework to audit search intent for SEO with sankey diagram | 3rd paragraph line. “In this post, I’m going to walk you through (…)” |
| Robots.txt Competitor Analysis | framework to automate robots.txt competitor analysis for SEO with Python | 3rd paragraph line. “In this tutorial, I’ll walk you through (…)” |
| Site Structure Audit (Breadcrumbs & H1) | framework to audit a website’s headings and breadcrumbs for SEO with Python | 5th paragraph line. “In this blog post, I’ll share a method to audit your site’s structure (…)” |
| Orphan Pages Audit | framework to audit orphan pages for SEO with Python | Actionable bullet point in the middle of the post. “Activate the Search Console and Analytics API in Screaming Frog. Then, initiate a crawl in List Mode, specifically targeting the orphan pages” |
| Duplicate Content Audit | framework to audit duplicate content for SEO with Python | H2 and a detached excerpt from an HTML-based table. “Duplicate Content Audit with Python ; Plotly, An open-source data visualization and analytics library that provides a high-level interface” |
| ChatGPT Brand & Traffic Referrals | framework to audit traffic referrals on ChatGPT for SEO with Python | H1 and a structured bullet point in bold from mid-text. “How I used Python and Streamlit to track down ChatGPT Brand Mentions and Traffic Referrals · Brand mentions (i.e. backlinks (…)” |
| Canonical Audit Automation | framework to automate canonical checks with Python | Last line of the intro paragraph. “In this post, I’ll show you how you can automate a quick canonical audit using Python prior to refreshing how canonical mishaps can occur” |
Unsurprisingly, on 5 searches out of 7 I found the target pages featured within AI Overviews.
The text used as a fallback for the missing meta description was extracted from bullet points, H2s, or simply paragraphs in the body.
The retrieval didn’t seem to follow a specific order, which made it feel quite random.
For reference to the screen grab above, this is the bullet Google used as an excerpt to fill in descriptions.
The same applied to results that didn’t trigger an AI Overview.
Structured, Chunked and Self-Contained Text – LLMs Emulate Google’s Retrieval
One of the key takeaways is that generative engines favour actionable, short, concise sections (a sentence, paragraph, or list) to build a replacement for a missing meta description.
Why does it matter?
If your content is buried in a long-form narrative, it may be skipped. If it’s cleanly chunked and self-contained, it becomes more usable. This approach is referred to as semantic chunking, and it proved effective every time ChatGPT sourced the text for the “snippet” attribute in the conversation JSON from my pages.
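The chunking behaviour observed above can be approximated in a few lines: split the page on blank lines so each heading, paragraph, or bullet list becomes its own self-contained block, merging only very short neighbours (mirroring how Google paired the H1 with the first descriptive line in the Orphan Pages example). A naive sketch, not a production chunker; the sample `article` text is illustrative:

```python
def chunk_by_blocks(text, max_chars=300):
    """Naive semantic chunking: split on blank lines (headings,
    paragraphs, bullet lists) and keep each block as its own chunk,
    merging a block into the previous one only when both are short."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks = []
    for block in blocks:
        if chunks and len(chunks[-1]) + len(block) < max_chars // 2:
            chunks[-1] = chunks[-1] + " " + block  # merge tiny neighbours
        else:
            chunks.append(block)
    return chunks

# Illustrative page content
article = """Orphan Pages Audit

Auditing orphan pages requires first-party and third-party SEO tools.

- Activate the Search Console and Analytics API in Screaming Frog.
- Initiate a crawl in List Mode targeting the orphan pages."""

for chunk in chunk_by_blocks(article):
    print(repr(chunk))
```

Each resulting chunk reads as a complete thought on its own, which is exactly the property that made certain sections of my posts land in the “snippet” attribute while long narrative runs were skipped.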
While this test can’t provide universal conclusions, it offered some evidence that LLMs mimic modern Google-style information retrieval.
In particular, AI search engines seem to favour structured, chunked, and self-contained text during the retrieval stage of RAG.