Some might argue that sifting through Google patents is a waste of time, as they are unlikely to help with your day-to-day SEO.
Sometimes, though, it is useful to equip yourself with insights into the mechanisms behind the search engine's machine learning processes.
In this post, I will cover a few patents that shed light on how Google interprets search queries on the SERP.
Identification of Acronym Expansion
This interesting patent describes how Google rewrites or refines original search queries, based on information derived from a subset of documents, in order to display more relevant search results.
Here, the "Acronym" stands in for the Knowledge Graph: once the search engine parses a query, it chases a suitable entity to connect it with, building a bespoke knowledge graph in the process.
The process loops through continuous searches (the "Expansion") for appropriate entities to connect to the user's query.
Once the algorithms are confident that the probability of matching the search query is high, they render the search results along with the Knowledge Graph.
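To make the idea concrete, here is a minimal sketch of confidence-gated expansion. Everything below (the candidate list, the scores, the 0.8 threshold) is invented for illustration; the patent does not disclose actual values or data structures.

```python
# Toy model: each query maps to candidate expansions with a match probability.
# Only candidates clearing a confidence threshold are surfaced.
CANDIDATE_ENTITIES = {
    "nasa": [
        ("National Aeronautics and Space Administration", 0.97),
        ("NASA (band)", 0.12),
    ],
}

CONFIDENCE_THRESHOLD = 0.8  # invented cut-off for illustration


def expand_query(query: str) -> list[str]:
    """Return high-confidence expansions for a query, if any."""
    candidates = CANDIDATE_ENTITIES.get(query.lower(), [])
    return [name for name, score in candidates if score >= CONFIDENCE_THRESHOLD]


print(expand_query("NASA"))
# ['National Aeronautics and Space Administration']
```

The low-scoring candidate is filtered out, mirroring the patent's idea that a rewrite is only shown once the engine is confident enough in the match.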
Methods, Systems, and Media for Interpreting Queries
The patent describes how search queries are broken down into n-grams (sequences of N items from a text) so the engine can check whether these terms relate to an entity.
This process constitutes the kernel of NLP and NLU, the systems search engines rely on to understand and interpret the meaning of natural language in search queries.
💡 An n-gram is a contiguous sequence of N items from a text. In SEO layman's terms, it's something like a "cluster of keywords".
If the search engine deems an n-gram to be an entity, a corresponding search result is returned. The ranking ultimately boils down to the frequency/popularity of the entity itself.
Complicated enough, right? Let's look at an example.
Search Query = “Action movie starring Tom Cruise”
N-Grams Detected = “Action”, “Movie”, “Tom”, “Cruise”
Now, a prehistoric search engine would return messy results, such as "Tom & Jerry" or a few suggestions hinting at a crusade somewhere in the world.
Instead, Google returns appropriate media formats matching the entity type "Action Film" and the entity "Tom Cruise", sorted by the overall search frequency of both.
This happens because Google is skilled enough to pair up n-grams to generate entities, clustering them in a way that makes it easier to deliver the right search results.
Python lets you easily take advantage of NLP for keyword research purposes, and you can learn how to fit n-gram analysis into a comprehensive SEO strategy.
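As a starting point, here is a minimal n-gram extractor in plain Python (standard library only), applied to the example query above:

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return all contiguous n-grams of the whitespace-tokenised text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


query = "Action movie starring Tom Cruise"
print(ngrams(query, 1))
# ['action', 'movie', 'starring', 'tom', 'cruise']
print(ngrams(query, 2))
# ['action movie', 'movie starring', 'starring tom', 'tom cruise']
```

Note how the bigram pass already surfaces "tom cruise" as a single candidate unit, which is exactly the kind of pairing that lets an engine map n-grams to entities.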
Identification of Entities in Search Queries (NER)
The patent describes how Google addresses a search query or a question on a SERP.
NER (Named Entity Recognition) is the process of identifying key entities in a text, whether it stems from structured or unstructured data (plain text).
For the search query "ceo adidas", Google can only answer with the entity "Kasper Rørsted" by combining the entity "Adidas" with the connection type "ceo".
You can take the analysis to the next level by benchmarking your site’s entities against competitors. Learn how to run a script for this purpose in this post!
Semantic Enrichment of Search Queries
The patent reveals how semantic information on a webpage can help Google interpret entities from search queries.
The topic has recently been hyped in the industry, as Google proved able to parse unstructured data from a webpage and deliver what amounts to rich results in disguise. A clear example of this was provided by the Pros & Cons SERP annotations.
Here’s an example that will clear any doubts.
If you typed the search query:
“boris johnson parents”
You would notice that Google reserves around 80% of the SERP for the former British prime minister, as entity boxes pop up resulting from the additional semantic information drawn from the Knowledge Graph.
The main takeaway from this patent is that search results can be generated by matching small hints of semantic attributes, not only fat-head entity connections.
Query Refinements by User Intent
This patent, called "Clustering Query Refinements by Inferred User Intent", describes the process behind refining queries when search intent is blurry.
To skip the boring part: the patent points out that the search engine stores previously typed queries and delivers appropriate results whenever a similar query is formulated.
In fact, the search algorithms perform a range of clustering tasks, organizing search queries into ongoing sets of clustered refinements.
As a result of this process, Google is able to deliver strong search results even when the query is ambiguous.
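A rough sketch of the intuition: refinements whose users end up clicking overlapping sets of documents likely share an intent, so they can be grouped greedily. The clicked-document data below is entirely invented for illustration, and this greedy overlap grouping is my simplification, not the patent's actual algorithm.

```python
# Invented click data: each refinement maps to documents users clicked.
CLICKED_DOCS = {
    "mars planet": {"nasa.gov/mars", "wikipedia.org/Mars"},
    "mars distance from earth": {"nasa.gov/mars", "space.com/mars-distance"},
    "mars bar": {"mars.com", "wikipedia.org/Mars_bar"},
    "mars bar calories": {"mars.com", "nutrition.example/mars-bar"},
}


def cluster_by_intent(clicks: dict) -> list[list[str]]:
    """Greedily group refinements whose clicked-document sets overlap."""
    clusters = []
    for query, docs in clicks.items():
        for cluster in clusters:
            if docs & cluster["docs"]:  # any shared clicked document
                cluster["queries"].append(query)
                cluster["docs"] |= docs
                break
        else:
            clusters.append({"queries": [query], "docs": set(docs)})
    return [c["queries"] for c in clusters]


print(cluster_by_intent(CLICKED_DOCS))
# [['mars planet', 'mars distance from earth'], ['mars bar', 'mars bar calories']]
```

The two intents behind "mars" (the planet vs. the chocolate bar) fall out naturally from click overlap, which is the kind of signal the patent leans on.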
You will notice the impact of this process when searching for a fat-head entity that leaves room for further suggestions/refinements from Google (e.g. "Mars").
Needless to say, this is where Google's MUM algorithms come into play.
Although the purpose of Google’s patents is not to provide actionable SEO tips, it is interesting to get a grip on what is going on behind Google’s complex arsenal of ever-changing algorithms.