What are NLP & NLU and How They Can Affect SEO

Reading time: 8 Minutes

We have all tripped up on a query typo at some point. Yet we almost always end up receiving search results that match what we intended to type.

The reason users get away with typos in the search bar is a complex combination of machine learning models working together to ensure the search results are the best possible ones.

Thankfully, those models are all well trained. I don’t blame users for mistyping complex names such as surnames, especially when they come from the other side of the world.

Search results correctly resolving a query typo on Google

Before jumping the gun, we need to roll back a few years and learn more about the earlier search algorithm advancements that undoubtedly paved the way for the recent progress in NLP and NLU.

In this post, I will talk you through the meaning of NLP and NLU and how to make the most of them to enhance your SEO efforts.

Google Hummingbird

Google is a semantic search engine and has been working to consolidate a strong position in the market since the early 2010s. In fact, one algorithm more than any other played a crucial role in Google's ascent as a semantic search engine.

Hummingbird

Google Hummingbird is a search algorithm released in 2013 to return improved semantic results. It emphasized the meaning of search queries over individual keywords, so that the search engine could better understand conversational queries. For this reason, Hummingbird marked a milestone for entity research.

The algorithm makes use of NLP to ensure that pages matching the meaning or context of a search do better in SERPs than pages matching just a few stuffed keywords.

RankBrain

Unlike Hummingbird, RankBrain is an AI-based system that enables the search engine to return relevant pages even if they don’t contain the exact words used in a search.

Since its official release in 2015, Google has been able to handle the large share of never-seen-before searches it collects every day. RankBrain helped Google relate web pages to concepts, thereby shifting the lens from a world predominantly made of strings to one made of real things.

Entities vs Keywords = 3-0

BERT

Similarly to Hummingbird, BERT is a search algorithm introduced in 2019 and designed to improve the natural understanding of intent and conversational search context with the help of NLU machine learning models. In other words, BERT made it easier for users to find accurate information in the SERPs.

To expand a bit, we could say that BERT built on the progress made by Hummingbird and RankBrain, as it helped reinforce the authoritativeness of entities in the search results.

MUM

MUM (Multitask Unified Model) is a multimodal search algorithm released in May 2021 with the aim of understanding different content formats such as text, images, and video. This gives it the power to gather information from multiple modalities, as well as respond suitably.

MUM is far more powerful than BERT, as it comes with greater multitasking capabilities and therefore delivers more nuanced results.

An important note on MUM is that the model not only understands content but can also produce it. So rather than passively sending a user a result, it can gather data from multiple sources and deliver a response through a diversified range of media formats (page, voice, etc.).

How search engines process text and natural language 

Having collected further information on the historical background of the NLP and NLU machine learning models, we can now break down their functions.

Both Natural Language Processing and Natural Language Understanding rely on supervised machine learning models, meaning their algorithms are trained on precompiled labelled data.

To dive into the roots of the NLP and NLU process, we can identify a few steps, each characterized by specific data science techniques.

Tokenization

As shown above in the query typo example (using Mr Matthew McConaughey's surname as our guinea pig), a search engine returns results that it deems as close as possible to the query intent. The entire process is what is known as text normalization in NLP.

During this process, search engines use tokenization to break down both the text the searcher typed into the search bar and the text of the documents that could be returned.

Breaking text into sequences of words, or n-grams, is necessary because word order does not need to match exactly between the query and the document text, except when a searcher wraps the query in quotes.

🛑 In SEO, n-grams are understood as sequences of n words from a given sample of keywords.
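To make this concrete, here is a minimal sketch of tokenization and bigram extraction in plain Python. Both the query and the whitespace tokenizer are my own simplifications, far more naive than what search engines actually run, but they illustrate the idea of breaking text into tokens and n-grams.

```python
# A deliberately naive tokenizer: lowercase the text and split on whitespace.
# Real search engines use far more sophisticated tokenization pipelines.
def tokenize(text: str) -> list[str]:
    return text.lower().split()

# Return every overlapping sequence of n consecutive tokens (the n-grams).
def extract_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return list(zip(*(tokens[i:] for i in range(n))))

query = "Compare Apple and Samsung phones"  # hypothetical query
tokens = tokenize(query)
print(tokens)
# ['compare', 'apple', 'and', 'samsung', 'phones']
print(extract_ngrams(tokens, 2))
# [('compare', 'apple'), ('apple', 'and'), ('and', 'samsung'), ('samsung', 'phones')]
```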

The resulting tokens can then be normalized through two different methods: stemming and lemmatization. Each breaks tokens down into a different form for comparison.

  • Stemming is a data science technique that reduces a word to its “stem,” the root that its variants are based on (e.g. “carry” and “carries” both reduce to the stem “carri”).
  • Lemmatization is a data science technique that reduces a token to its lemma, which is always a recognizable word (e.g. both “carry” and “carries” lemmatize to “carry”); see the sketch just below.
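If you want to see the difference in practice, here is a small sketch using NLTK's Porter stemmer and WordNet lemmatizer. The choice of NLTK is just my assumption for illustration; any equivalent library would do, and it requires installing the package and its WordNet data.

```python
# Comparing stemming and lemmatization with NLTK.
# Assumes: pip install nltk, then nltk.download("wordnet") for the lemmatizer.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["carry", "carries"]:
    print(word, "-> stem:", stemmer.stem(word))
    print(word, "-> lemma:", lemmatizer.lemmatize(word, pos="v"))

# Typical output:
#   carry -> stem: carri
#   carries -> stem: carri
#   carry -> lemma: carry
#   carries -> lemma: carry
```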

When it comes to SEO, you may want to opt for lemmatization to fill your copy with entities expressed in plain English. Conversely, if you want to reverse-engineer a given word back to its stem, you are more than welcome to opt for stemming.

Personally, I usually prefer lemmatization because it readily puts clues about recognizable entities at my fingertips.

How Semantic Search Impacts SEO

After navigating the depths of the machine learning processes affecting search engines, let's extract a few takeaways to make the most of semantic search and the related advancements in NLP and NLU.

Trigger the Switch from Keywords to Topics

Now more than ever, it is imperative to stop creating content that revolves around individual keywords. Instead, you should research n-grams reflecting suitable topics that you can cover in depth to serve your niche.

How to optimize for semantic search?

Optimizing for semantic search requires carefully considering how NLP processes your content.

Because Google uses NLP to identify the context of your writing, you should make sure to write short, clean, grammatical sentences.

To put it in more technical terms, you ought to make the most of triples and tuples, i.e. sentences built around a subject, a verb, and an object.
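As a hedged illustration of what such triples look like to a parser, here is a rough sketch using spaCy's dependency labels to pull naive subject-verb-object triples out of a sentence. The library, the small English model, and the example sentence are all assumptions on my side, not anything Google has published.

```python
# Rough subject-verb-object extraction with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Collect naive (subject, verb, object) triples from the parsed text."""
    triples = []
    for token in nlp(text):
        if token.pos_ != "VERB":
            continue
        subjects = [c.text for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c.text for c in token.children if c.dep_ in ("dobj", "attr")]
        triples += [(s, token.lemma_, o) for s in subjects for o in objects]
    return triples

print(extract_triples("Google uses NLP to identify the context of your writing."))
# Expected something like: [('Google', 'use', 'NLP')]
```

The cleaner your sentences are, the easier it is for a parser (and, by extension, a search engine) to extract unambiguous triples like this.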

This leads us to the golden recommendation:

Make a list of n-grams and separate them by user intent

For example, the queries [iPhones vs. Android battery life] or [compare Apple and Samsung phones] both clearly fall under the intent of [compare smartphones].

Once you understand searcher intent, start creating content that directly addresses their intent instead of creating content around individual keywords or broad topics.
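As a purely hypothetical illustration, here is how such a grouping might look once you move it into code; the intent labels and queries below are invented and would come from your own keyword research.

```python
# Hypothetical mapping of query n-grams to the shared intent they fall under.
queries_by_intent = {
    "compare smartphones": [
        "iphone vs android battery life",
        "compare apple and samsung phones",
    ],
    "buy a smartphone": [
        "best iphone deals",
        "samsung galaxy price",
    ],
}

# Each intent becomes one in-depth piece of content covering all of its queries.
for intent, queries in queries_by_intent.items():
    print(f"{intent}: one page answering {len(queries)} related queries -> {queries}")
```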


What are NLP and NLU

To wrap up this journey through the inner workings of search engine machine learning models, we can finally provide a set of definitions recapping the main subject of this post.

In a nutshell…

NLP (Natural Language Processing) focuses on understanding natural language in terms of patterns (such as entities) detected from large amounts of unstructured data.

To do that, NLP breaks down and processes language using a strategy called tokenization, which aims to normalize a text by reducing words either to their stem (e.g. “carry”/”carries” → “carri”) or to an always recognizable form (e.g. “carry”/”carries” → “carry”).

These granular techniques are called stemming and lemmatization, respectively.

NLU (Natural Language Understanding) focuses on understanding the meaning of a whole sentence to provide language comprehension. For instance, it underpins sentiment analysis tasks that determine the emotional tone of a text. NLU enables computer programs to deduce intent from language, even if the written or spoken language is flawed.
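To give a tiny taste of that kind of task, here is a minimal sentiment-analysis sketch using NLTK's VADER analyzer. This is only meant to show what determining the emotional tone of a text can look like; it is my own illustrative choice, not the tooling search engines actually use.

```python
# Minimal sentiment analysis with NLTK's VADER.
# Assumes: pip install nltk, then nltk.download("vader_lexicon").
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
scores = sia.polarity_scores("I absolutely love how fast this phone is!")
print(scores)
# The 'compound' score will be clearly positive for a sentence like this.
```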

Further Readings

Lots of credit goes to the deep dive on semantic search conducted by the amazing Olaf Kopp in his post “What is semantic search: A deep dive into entity-based search” on Search Engine Land.
