We have all made a typo in a search query at least once. Yet we almost always end up with results that match what we meant to type.
The reason users get away with typos in the search bar is a complex combination of machine learning models working together to ensure the search results are the best possible ones.
Thankfully, those models are well trained. I don't blame users for mistyping complex names, especially surnames from the other side of the world.
For example, I bet many of us have struggled at least once to type Matthew McConaughey correctly into the Google search bar. The reason the search engine keeps returning the result closest to your mistyped query boils down to the combination of Natural Language Processing (NLP) and Natural Language Understanding (NLU).

Before jumping the gun, we need to roll back a few years and look at the earlier search algorithm advancements that undoubtedly paved the way for the recent progress in NLP and NLU.
In this post, I will walk you through what NLP and NLU mean and how to make the most of them to enhance your SEO efforts.
Table of Contents
- Google Hummingbird
- RankBrain
- BERT
- MUM
- How search engines process text and natural language
- Tokenization
- How Semantic Search Impacts SEO
- What are NLP and NLU
Google Hummingbird
Google is a semantic search engine and has been consolidating a strong position in the market since the early 2010s. In fact, one algorithm more than any other played a crucial role in Google's ascent as a semantic search engine.

Google Hummingbird is a search algorithm released in 2013 to return improved semantic results. It emphasized the meaning of search queries over individual keywords, so that the search engine could better understand conversational queries. For this reason, Hummingbird marked a milestone for entity research.
The algorithm uses NLP to ensure that pages matching the meaning and context of a search do better in the SERPs than pages that merely match a few stuffed keywords.
RankBrain
Unlike Hummingbird, RankBrain is an AI-based system that enables the search engine to return relevant pages even if they don’t contain the exact words used in a search.
Since its official release in 2015, Google has been able to handle a large share of never-seen-before searches. RankBrain helped Google relate web pages to concepts, shifting its lens from a world made predominantly of strings to one made of real things.

BERT
Like Hummingbird, BERT is a search algorithm, introduced in 2019, designed to grasp the intent and conversational context of a search with the help of NLU machine learning models. In other words, BERT made it easier for users to find accurate, relevant information in the SERPs.
To expand a bit, we could say BERT consolidated the progress started by Hummingbird and RankBrain, as it contributed to reinforcing the authoritativeness of entities in the search results.
According to Google, BERT helps better understand the nuances and context of words in searches and better match those queries with more relevant results.
With BERT, the words in your query can appear in a different order without affecting the search results.
MUM
MUM (Multitask Unified Model) is a multimodal search algorithm released in May 2021 with the aim of understanding different content formats such as text, images, and video. This gives it the power to gain information from multiple modalities and to respond accordingly.
MUM is far more capable than BERT, as it supports more multitasking and therefore delivers more nuanced results.
An important note on MUM is that the model not only understands content but can also produce it. So rather than passively returning a result to a user, it can gather information from multiple sources and provide a response across a range of media formats (page, voice, etc.).
How search engines process text and natural language
Having covered the historical background of the NLP and NLU machine learning models, we can now break down how they work.
Both Natural Language Processing and Natural Language Understanding rely on supervised machine learning models, meaning their algorithms are trained on precompiled labelled data.
To dive into the roots of the NLP and NLU process, we can identify a few steps, each characterized by specific data science techniques.
Tokenization
As mentioned above when using Mr Matthew McConaughey as our guinea pig, a search engine returns results that it deems as close as possible to the query intent. The entire process is what is known as text normalization in NLP.
During this process, the search engines use tokenization to break down the text the searcher typed in the search bar and the text in the document that will be returned.
The breaking of a sequence of words, or n-grams, is necessary because word order does not need to be exactly the same between the query and the document text, except when a searcher wraps the query in quotes.
Note: in SEO, n-grams are sequences of n words taken from a given sample of keywords.
Tokenization is at the core of the NLP model and can well be considered the milestone that kicks off entity research in NLP.
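To make this less abstract, here is a minimal Python sketch of tokenization and n-gram extraction. The use of NLTK and the example query are my own assumptions; any tokenizer would do.

```python
# Minimal sketch: tokenize a query and build n-grams from the tokens (assumes NLTK is installed).
import nltk
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

# word_tokenize relies on the Punkt models; newer NLTK releases ship them as "punkt_tab"
for pkg in ("punkt", "punkt_tab"):
    nltk.download(pkg, quiet=True)

query = "matthew mcconaughey movies"

# Break the query into individual tokens
tokens = word_tokenize(query)
print(tokens)  # ['matthew', 'mcconaughey', 'movies']

# Build bigrams (n-grams with n = 2) from the token sequence
bigrams = list(ngrams(tokens, 2))
print(bigrams)  # [('matthew', 'mcconaughey'), ('mcconaughey', 'movies')]
```

The same n-grams can then be compared between the query and the candidate documents, regardless of word order.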
Tokenization is paired with two different normalization methods, stemming and lemmatization, which take tokens and break them down into different forms for comparison.
- Stemming is a data science technique that reduces a word to its "stem", the base form shared by its variants (e.g. "carry" and "carries" both reduce to the stem "carri").
- Lemmatization is a data science technique that reduces a token to a form that is still a recognizable word (e.g. for "carry" and "carries" the lemma will always be "carry").
When it comes to SEO, you may want to opt for lemmatization so that your copy is filled with entities expressed in plain English. Conversely, if you want to reverse engineer a word back to its stem, you are more than welcome to opt for stemming.
Personally, I usually prefer lemmatization because it readily provides clues about recognizable entities (see the sketch below).
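Here is a minimal sketch of the difference, applied to the "carry" example above. The choice of NLTK's Porter stemmer and WordNet lemmatizer is an assumption of mine; the post is not tied to any specific library.

```python
# Minimal sketch: stemming vs. lemmatization on a few word forms (assumes NLTK is installed).
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["carry", "carries", "carried"]:
    print(
        word,
        "-> stem:", stemmer.stem(word),                  # e.g. "carri"
        "| lemma:", lemmatizer.lemmatize(word, pos="v"),  # e.g. "carry"
    )
```

The stems are handy for matching variants programmatically, while the lemmas stay readable, which is why they map more naturally onto entities in your copy.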
How Semantic Search Impacts SEO
After navigating the depths of the machine learning processes that power search engines, let's draw a few takeaways to make the most of semantic search and the related advancements in NLP and NLU.
Make the Switch from Keywords to Topics
Now more than ever it is imperative to stop creating content that revolves around individual keywords. Instead, you should research n-grams that reflect topics you can cover in depth to serve your niche.
How to optimize for semantic search?
Optimizing for semantic search requires considering carefully how NLP processes your content.
Because Google uses NLP to identify the context of your writing, you should make sure to write short, clean, grammatical sentences.
To put it in more technical wording, you ought to make the most of triples, i.e. tuples made of a subject, a verb, and an object, as in the sketch below.
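As a rough illustration, here is a minimal sketch that pulls subject-verb-object triples out of a sentence with spaCy. The library choice and the example sentence are my own assumptions, not something Google prescribes.

```python
# Minimal sketch: extract subject-verb-object triples from a sentence with spaCy.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google uses NLP to identify the context of your writing.")

for token in doc:
    # A nominal subject whose head is a verb marks a candidate triple
    if token.dep_ == "nsubj" and token.head.pos_ == "VERB":
        verb = token.head
        objects = [child for child in verb.children if child.dep_ == "dobj"]
        for obj in objects:
            print(token.text, verb.text, obj.text)  # e.g. "Google uses NLP"
```

If your own sentences yield clean triples like this, there is a good chance the search engine can parse them just as easily.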
This leads us to the golden recommendation:
Make a list of n-grams and separate them by user intent
For example, the queries [iPhones vs. Android battery life] or [compare Apple and Samsung phones] both clearly fall under the intent of [compare smartphones].
Once you understand searcher intent, start creating content that directly addresses their intent instead of creating content around individual keywords or broad topics.
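To illustrate, here is a deliberately simple, rule-based sketch of bucketing queries by intent. The buckets and trigger words are made-up assumptions, not Google's taxonomy, and a production workflow would rely on richer signals.

```python
# Minimal sketch: rule-based grouping of queries into intent buckets.
import re

# Illustrative intent buckets and trigger words (assumptions for the sake of the example)
INTENT_RULES = {
    "compare smartphones": {"vs", "versus", "compare"},
    "buy a smartphone": {"buy", "price", "deal"},
}

def classify(query: str) -> str:
    """Return the first intent whose trigger words appear in the query."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    for intent, triggers in INTENT_RULES.items():
        if tokens & triggers:
            return intent
    return "unclassified"

for q in ["iPhones vs. Android battery life", "compare Apple and Samsung phones"]:
    print(q, "->", classify(q))  # both map to "compare smartphones"
```

Once queries are grouped this way, each bucket becomes a candidate piece of content that answers the shared intent rather than a single keyword.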
Learn how to define search intent for GSC queries with this free Python script.
What are NLP and NLU
To wrap up this journey through the inner workings of search engine machine learning models, we can finally provide a set of definitions that recap the main subject of this post.
In a nutshell…
NLP (Natural Language Processing) focuses on understanding natural language in terms of patterns detected in large amounts of unstructured data (a.k.a. entities).
To do that, NLP breaks down and processes language using a particular strategy called tokenization, which aims to normalize a text by reducing patterns either to their stem (e.g. "carry"/"carries" → "carri") or to an always recognizable pattern (e.g. "carry"/"carries" → "carry").
These granular techniques are called stemming and lemmatization.
NLU (Natural Language Understanding) focuses on understanding the meaning of a whole sentence to provide language comprehension. In fact, it emphasizes sentiment analysis tasks to determine the emotional tone of a text. NLU enables computer programs to deduce purpose from language, even if the written or spoken language is flawed.
To put these features to work in SEO, learn how to identify entities and audit sentiment with NLP using this free Python script, which you can apply to landing pages or product pages.
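For a rough idea of what such an audit might involve, here is a minimal sketch that extracts entities with spaCy and scores sentiment with NLTK's VADER. This is an assumption about the stack; the script linked above may well use a different one (for instance, Google's Natural Language API).

```python
# Minimal sketch: named entities + sentiment score for a snippet of page copy.
# Requires: pip install spacy nltk && python -m spacy download en_core_web_sm
import spacy
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # lexicon used by the VADER sentiment analyzer

nlp = spacy.load("en_core_web_sm")
sia = SentimentIntensityAnalyzer()

copy = "Matthew McConaughey delivers a stunning performance in this Netflix thriller."

# Named entities recognized in the copy
for ent in nlp(copy).ents:
    print(ent.text, ent.label_)  # e.g. "Matthew McConaughey PERSON", "Netflix ORG"

# Compound sentiment score between -1 (negative) and +1 (positive)
print(sia.polarity_scores(copy)["compound"])
```

Running this over landing pages or product pages gives you a quick read on which entities your copy actually surfaces and what emotional tone it carries.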
Further Readings
Lots of credit goes to the deep dive into semantic search conducted by the amazing Olaf Kopp in his post What is semantic search: A deep dive into entity-based search on Search Engine Land.