The SEO industry is so littered with myths that sometimes a little learning can be a dangerous thing.
One of them is that you can buy SEO happiness with structured data, meaning that you can embellish how your web page looks on the SERPs by adding every schema type you want. To play devil’s advocate, what if structured data were just one step away from being dismissed for good?
Google recently tested a feature called Pros & Cons, shown right below the meta description on the SERP.
Despite looking very similar to rich results, this feature is a fine example of Google parsing plain text from the copy on a page. In other words, it is one of the earliest examples of unstructured data retrieved by the search engine and showcased alongside structured data on the SERP.
Simone De Palma 🦊 (@SimoneDePalma2), July 1, 2022:
🤖 Google New Pros & Cons are not #RichResults but #Annotations.
The search engine can extract the most positive and negative n-grams from a page.
Yet this is not the only one as it all boils down to a 2021 patent.
More in this thread 🧵
Two months after the first Pros & Cons excerpts went live on the SERPs, Google added structured data support for annotations on product review pages.
According to the updated version of the Google documentation, this means that you can now explicitly provide this information via schema markup.
In the meantime, though, the machine learning models at the core of the search engine keep working on a way to automatically extract Pros & Cons from a product review page.
Simone De Palma 🦊 (@SimoneDePalma2), August 5, 2022:
So SERP #Annotations have been “sold” to structured data manual mark up.
While the search engine #ml algorithms try to automatically scrape pros and cons, you are now enabled to explicitly provide this information via schema markup.
A few considerations 🧵 https://t.co/xwldr3QssM
Before going too deep, let’s first learn what these so-called “Annotations” are, along with their main features and the mechanics behind how they are generated on the search results page.
If you want to learn more about Annotations, I recommend you dig into the spectacular blog post by Marie Haynes on how to use annotations to create better content.
What are Annotations?
Before Google documented the feature as structured data, I came across a Google patent called Search result annotations, covering all the bits and pieces showing up on the SERPs that have nothing to do with the traditional blue links. Among them were, of course, the aforementioned annotations.
In a nutshell,
Annotations are HTML strings retrieved from unstructured data sources, that is, plain text from the copy of a page such as a product page or a landing page.
How are Annotations being generated?
Annotations are extracted from multiple text-based sources on a webpage.
The patent provides a detailed and exhaustive explanation of the process. However reliable the described model may look, Google patents are often not implemented verbatim.
Hence, you always want to take them with a solid grain of salt.
In short, the query engine detects an annotation from a page and determines whether to process it in real-time or store it in the search index. Next, a supervised machine learning model scores the annotations by type and ultimately ranks them by usefulness.
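To make the scoring and ranking step more tangible, here is a minimal Python sketch. It is only an illustration: the `Annotation` class, the `TYPE_WEIGHTS` values and the `relevance` field are all hypothetical stand-ins for whatever the supervised model actually learns, not anything Google has published.

```python
from dataclasses import dataclass

# Hypothetical per-type weights: the patent does not disclose real values.
TYPE_WEIGHTS = {
    "editorial_review": 1.0,
    "list_includes": 0.8,
    "media": 0.6,
}

@dataclass
class Annotation:
    text: str
    kind: str          # annotation type, e.g. "editorial_review"
    relevance: float   # stand-in for a learned model score between 0 and 1

def usefulness(annotation: Annotation) -> float:
    """Combine the type weight and the relevance into one usefulness score."""
    return TYPE_WEIGHTS.get(annotation.kind, 0.5) * annotation.relevance

def rank_annotations(annotations: list[Annotation], top_k: int = 2) -> list[Annotation]:
    """Return the top_k annotations, most useful first."""
    return sorted(annotations, key=usefulness, reverse=True)[:top_k]

candidates = [
    Annotation("smooth and tasty bourbon", "editorial_review", 0.7),
    Annotation("includes tasting notes", "list_includes", 0.9),
    Annotation("3 photos", "media", 0.95),
]

for annotation in rank_annotations(candidates):
    print(annotation.kind, round(usefulness(annotation), 2))
```

The point of the sketch is simply that the type of an annotation and its estimated usefulness both feed into which strings end up on the SERP.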
To expand a bit on the context, let’s consider the previous example about Larceny Bourbon.
Recent progress in NLP machine learning models has enabled Google to parse unstructured data, that is, human-readable text. As we can see from the screen grab below, the words highlighted in yellow are those transferred to the SERP as annotations.
This boils down to progress in advanced cluster analysis tasks carried out on entities and, primarily, n-grams. In fact, from the screenshot, it’s clear that these annotations stem from sequences of words (n-grams) that complete the definition of specific entities.
Let’s take one specific source of annotations, “smooth and tasty bourbon”.
First, let’s tokenize the n-gram:
- bourbon = entity
- smooth = connector type
- tasty = connector type
- and = conjunction
Now we can make an assumption about how Google treats the n-grams as entities.
- “tasty bourbon” = Entity
- “smooth bourbon” = Entity
In brief, the web page emphasizes the root entity “bourbon” and reinforces it through the connectors “tasty” and “smooth”.
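The same reasoning can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the `ROLES` lexicon is hand-made for this single phrase, whereas a real system would rely on a part-of-speech tagger and an entity database. It is not how Google does it, only a way to see the tokenize, label and combine steps in action.

```python
# Hand-made role lexicon for this one phrase (assumption for illustration only).
ROLES = {
    "bourbon": "entity",
    "smooth": "connector",
    "tasty": "connector",
    "and": "conjunction",
}

def tokenize(phrase: str) -> list[str]:
    """Split the n-gram into lowercase word tokens."""
    return phrase.lower().split()

def label_tokens(tokens: list[str]) -> list[tuple[str, str]]:
    """Attach a role to every token, falling back to 'unknown'."""
    return [(token, ROLES.get(token, "unknown")) for token in tokens]

def expand_entities(tokens: list[str]) -> list[str]:
    """Pair every connector with every root entity found in the n-gram."""
    labelled = label_tokens(tokens)
    roots = [token for token, role in labelled if role == "entity"]
    connectors = [token for token, role in labelled if role == "connector"]
    return [f"{connector} {root}" for root in roots for connector in connectors]

tokens = tokenize("smooth and tasty bourbon")
print(label_tokens(tokens))
print(expand_entities(tokens))  # ['smooth bourbon', 'tasty bourbon']
```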
Google is reinforcing relationships among entities on the Internet following advancements in unsupervised cluster analysis tasks run by machine learning models.
You can perform advanced cluster analysis as well using Python and a few machine learning models. Learn how to run a semantic market analysis to inform your SEO strategy with this tutorial.
Annotations Main Features
These HTML strings come with a few remarkable traits that could reveal room for action or improvement in a page’s content.
- Annotations tend to be query-dependent, meaning they show up depending on the type of search query. As inferred from the patent, fat-head terms are the most likely to trigger these features.
- Annotations tend to show up depending on the viewport size. Depending on the size of your device, whether it is a desktop, a mobile phone or something else, annotations will be more or less likely to appear.
- Annotations can help Google interpret entities from search queries. Since they are usually fetched as n-grams, annotations may give Google pivotal hints for reinforcing entity relationships.
- Annotations can help reduce pogo-sticking on the SERP, as they are better at matching search intent. This means that they can potentially drive an increase in CTR.
In layman’s terms, annotations play a twofold role that benefits both the search engine and the public. While they help users by better addressing shallow search intent, they also give Google hints on how to improve its entity network.
Example of Annotations
While the SEO industry first came across this feature through Pros & Cons excerpts, annotations are actually available in different flavours.
According to the patent, annotations can take the form of List Includes, Version Change, Media Annotations or Editorial Reviews (see image below).
In the words of the patent, the above example of editorial reviews is:
“An annotation including a snippet of a user review that mentions running in conjunction with headphones. That users mention running in reviews for a product may result in a higher ranking for the particular product.”
However, there are a few other examples flying under the radar that Google doesn’t officially recognize. In fact, the patent may have left out HTML table snippets, perhaps due to an oversight or, more likely, simply because of the time that has passed since the patent’s release.
How to Optimize for Annotations
If you are looking for a handy shortcut to get these attractive little HTML strings displayed for your web pages on the search results page, Google recently added dedicated schema markup and published a few guidelines about it.
The guidance to optimize for annotations points out:
- Annotations were at first eligible to appear in Search only for editorial product review pages, but on October 25th software engineers at Google agreed to make the Pros & Cons properties available to online stores as well, so they can display product highlights.
- Ensure to disclose a few statements about the product (positive and negative) and enclose them into the
- Ensure the Pros&Cons are visible to users on the page where the information sits.
Here is a sample of code that you need to use to include the Pros&Cons structured data on your editorial product review page.
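As a minimal sketch, the Python snippet below builds that markup as JSON-LD and prints it wrapped in a script tag. It assumes the `positiveNotes` and `negativeNotes` properties on a schema.org `Review`, each holding an `ItemList`, which is how Google’s documentation describes the feature. The helper name `build_pros_cons_markup`, the product, the author and the notes are all hypothetical placeholders, so double-check the output against the official documentation and the Rich Results Test before shipping it.

```python
import json

def build_pros_cons_markup(product_name, author_name, pros, cons):
    """Build a Product > Review JSON-LD dict with pros and cons lists."""

    def item_list(notes):
        # Each note becomes a ListItem with a 1-based position.
        return {
            "@type": "ItemList",
            "itemListElement": [
                {"@type": "ListItem", "position": position, "name": note}
                for position, note in enumerate(notes, start=1)
            ],
        }

    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product_name,
        "review": {
            "@type": "Review",
            "author": {"@type": "Person", "name": author_name},
            "positiveNotes": item_list(pros),
            "negativeNotes": item_list(cons),
        },
    }

# Placeholder values for illustration only.
markup = build_pros_cons_markup(
    product_name="Example Bourbon",
    author_name="Jane Doe",
    pros=["Smooth and tasty", "Good value for money"],
    cons=["Short finish"],
)

print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```

Whatever you put in the two lists should mirror the pros and cons that users can actually read on the page.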
According to Google, you are not required to wrap Pros & Cons in structured data, as the search engine will strive to surface this information on the SERP automatically where relevant.
However, the search engine seems to prioritize structured data over unstructured data extracted from ordinary page copy.
In the words of Google:
The search engine will prioritize supplied structured data provided by you over automatically extracted data
Going back to the Pros & Cons SERP feature, you can reasonably infer that it shows HTML strings retrieved from pages with multiple reviews available, likely clustered by n-grams.
In theory, the easier the content on a page is to understand, the easier it is for Google to feed it to its unsupervised cluster analysis models to generate entities and, in turn, display helpful annotations on the SERP.
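To get a feel for what grouping reviews by n-grams could look like, here is a rough Python sketch that counts the bigrams shared by several reviews. It is a plain frequency count, not a real clustering model and certainly not Google’s method. The `STOPWORDS` set and the sample reviews are made up for the example.

```python
import re
from collections import Counter

# Tiny hand-made stopword list (assumption, just enough for the sample reviews).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "but", "very", "this", "for", "of", "to"}

def bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of word pairs in a review once stopwords are dropped.

    Dropping stopwords first is a simplification: it can pair words that
    were not strictly adjacent in the original sentence.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return set(zip(words, words[1:]))

def frequent_bigrams(reviews: list[str], min_reviews: int = 2) -> list[tuple[str, int]]:
    """Count in how many reviews each bigram appears and keep the shared ones."""
    counts = Counter()
    for review in reviews:
        counts.update(bigrams(review))  # a set, so each review counts once
    return [(" ".join(pair), n) for pair, n in counts.most_common() if n >= min_reviews]

reviews = [
    "A smooth bourbon and a great price.",
    "Very smooth bourbon, but the finish is short.",
    "Great price for a smooth bourbon.",
]

print(frequent_bigrams(reviews))
```

Phrases that keep coming back across reviews, such as “smooth bourbon” here, are the kind of candidates one would expect to surface as pros or cons.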
In practice, Google is surely making progress in parsing unstructured data or plain text on a page, but it feels like there is still a long way to go.
The current machine learning models at the core of the search engine still fall short on highly automated information retrieval tasks.
To put it bluntly, Google’s push to mark up annotations with structured data is a strategy for feeding its search algorithms with product review data so that one day they’ll cut the mustard on automated information retrieval.