Identify Entities and Audit Sentiment in NLP with Python

Reading time: 12 Minutes

The common image we associate with the term “entity” in the context of a Google search is the long bar appearing underneath the image search bar.

A screenshot of the image search bar representing a long deck of scrolling entities

These clustered boxes of “things” are better known as entities, and they are usually identified through a process called Named Entity Recognition (NER).

Fair enough, entities not only rule visual search but are also making their way into the traditional search journey.

Following the ongoing progress of AI and machine learning models, Google is learning to read the web with increasing sophistication. In fact, Google is now able to decipher the Internet more as things rather than strings.

What can we do as SEOs?

Google is learning to parse plain text copy and break down sentences in a bid to build its own entity network across the World Wide Web. Hence, SEOs should follow suit and master a few methods to automate the analysis and parsing of the sheer amount of daily SEO data.

In this post, I am going to walk you through a method for entity optimization, and then we will move on to a Python framework designed to extract entities and return a small audit of the sentiment emerging from a product page copy.

How does Entity Optimization work?

Simply put, entity optimization brings together three concepts to provide the most holistic results:

  • What: the central topic of the query and what the searcher expects in the content when they search for that topic. Different queries may use distinct keywords, yet they represent a single topic when they mean the same thing. 
  • Why: The intent behind the query. Are they seeking information? Are they considering and evaluating options? Are they looking to transact and make a decision?
  • How: how your content is delivered is just as important. If your audience expects a video and you deliver text, it may not have the desired impact. These are the various elements we put together to create the most holistic semantic SEO strategy. Therefore, page and content layout becomes a critical part of a semantic search strategy.

Search algorithms have moved beyond keywords and now determine the context and intent behind queries by understanding the existing relationships between entities.

To put this into context, let’s get operational with the following instructions on how to extract entities and sentiment from a product page copy.

Requirements and Assumptions

For the purpose of this coding script, there are a few requirements to satisfy before setting up the environment: a Python 3 environment (ideally a Jupyter notebook, given the shell commands used below), the google-cloud-language (1.x), NLTK, spaCy, NumPy and Matplotlib libraries, and a Google Cloud Natural Language API key saved as a JSON file.
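As a minimal setup sketch, assuming you are working in a Jupyter notebook (the same convention used later to install spaCy), the dependencies could be installed as follows. Note the version pin on google-cloud-language: the enums and types modules imported below were removed in the 2.0 release of the client library, so this script needs a 1.x version.

import sys
!{sys.executable} -m pip install "google-cloud-language<2" nltk spacy numpy matplotlib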

Import the Packages

In the first part of this script, we are going to leverage NLP to tokenize the text of a product page, from which we will then extract the entities.

To kick off, we need to import a few Python libraries.

import os  # lets us set the API credentials as an environment variable
from google.cloud import language_v1
from google.cloud.language_v1 import enums  # enums only exist in the 1.x client library

from google.cloud import language
from google.cloud.language import types  # Document type used later for sentiment analysis

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize, pos_tag  # word_tokenize will come in handy for stemming

We import the os module because it provides a way of interacting with your operating system; we will use it in a moment to point the client library at our API credentials.

Upload your NLP API Key

Once we have imported our modules, we need to make our NLP API key available to the client library. For this purpose, we take advantage of the os module and set it as an environment variable.

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "YOUR_API_KEY.json"
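As an optional guard of my own (not part of the original workflow), you can fail early if the path to the key file is wrong:

# hypothetical sanity check: stop early if the JSON key file cannot be found
assert os.path.exists(os.environ['GOOGLE_APPLICATION_CREDENTIALS']), "API key file not found"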

Text Tokenization

We finally enter the core of the NLP process as we confront the very first task: tokenization.

For this specific project, I wanted to avoid any fuss and hassle with Python. Hence, the first thing we are going to do is simply assign our product page text to a variable.

text = "In a Fastest Thinker First Frenzy, players take turns to deal their playing cards: a combination of Letter Cards and Action Cards, down on three piles of Letter Cards. As your playing cards are laid down in turn, one by one, on the equal three piles, an Action Card will appear, leaving the two Letter Cards visible."
print(text)

To process the text, we can choose whether to leverage the stemming or the lemmatization technique to normalize the tokens.

Stemming the Text

To cover both of the options, let’s first get started with the Stemming technique.

As you will notice, we need to import an additional class from NLTK called PorterStemmer. Then, we include within the word_list variable every single word from the sentence that we imported a while ago.

from nltk.stem import PorterStemmer
porter = PorterStemmer()
word_list = ['In',
 'a',
 'Fastest',
 'Thinker',
 'First',
 'Frenzy',
 'players',
 'take',
 'turns',
 'to',
 'deal',
 'their',
 'playing',
 'cards,',
 'a',
 'combination',
 'of',
 'Letter',
 'Cards',
 'and',
 'Action',
 'Cards,',
 'down',
 'on',
 'three',
 'piles',
 'of',
 'Letter',
 'Cards',
 'As',
 'your',
 'playing',
 'cards',
 'are',
 'laid',
 'down',
 'in',
 'turn',
 'one',
 'by',
 'one',
 'on',
 'the',
 'equal',
 'three',
 'piles',
 'an',
 'Action',
 'Card',
 'will',
 'appear',
 'leaving',
 'the',
 'two',
 'Letter',
 'Cards',
 'visible',
 ]
print("{0:20}{1:20}".format("Word","Porter Stemmer"))
for word in word_list:
    print("{0:20}{1:20}".format(word,porter.stem(word)))

Here is what you will get:

Porter Stemmer Output
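Typing the word list out by hand is tedious and error-prone. Since we already imported word_tokenize from NLTK at the top of the script, here is a sketch of how the same table could be built automatically, assuming the punkt tokenizer data has been downloaded first:

import nltk
nltk.download('punkt')  # one-off download of the tokenizer models

word_list = word_tokenize(text)  # splits the product page copy into tokens
print("{0:20}{1:20}".format("Word", "Porter Stemmer"))
for word in word_list:
    print("{0:20}{1:20}".format(word, porter.stem(word)))

Bear in mind that word_tokenize treats punctuation marks as tokens of their own, so the output will differ slightly from the hand-built list.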

Lemmatization of the Text

The other opportunity you have to execute the tokenization is to apply the Lemmatization technique.

First, we need to install the spaCy library on the fly and download its small English model, “en_core_web_sm”. Then, we import spaCy and load the model, whose tagger component provides what we need for lemmatization.

Secondly, we paste the original sentence from our copy text into the sentence variable and parse it with the loaded model object, called “nlp”.

Finally, we extract the lemma for each token and join them back into a single string.

import sys
!{sys.executable} -m pip install spacy
!{sys.executable} -m spacy download en_core_web_sm

import spacy

nlp = spacy.load("en_core_web_sm")

sentence = "In a Fastest Thinker First Frenzy, players take turns to deal their playing cards: a combination of Letter Cards and Action Cards, down on three piles of Letter Cards. As your playing cards are laid down in turn, one by one, on the equal three piles, an Action Card will appear, leaving the two Letter Cards visible."

doc = nlp(sentence)

" ".join([token.lemma_ for token in doc])

# OUTPUT
in a Fastest Thinker First Frenzy , player take turn to deal their playing card :   a combination of Letter Cards and Action Cards , down on three pile of Letter Cards . as your playing card be lay down in turn , one by one , on the equal three pile , an Action Card will appear , leave the two Letter Cards visible .
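If you want to inspect what spaCy actually did, a quick sketch like this prints each token next to its lemma and part-of-speech tag:

# token-by-token view of the lemmatization
for token in doc:
    print("{0:15}{1:15}{2:10}".format(token.text, token.lemma_, token.pos_))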

Identify Entities from Lemmatized Text

Since I reckon lemmatization is probably the best normalization technique for extracting entities from a text, I am going to show you how to identify entities from a copy that has been lemmatized.

It’s not even that difficult, given that we now have the fully lemmatized text at our fingertips.

All we need to do now is paste the reviewed sentence into the “text_content” variable.

text_content = "in a Fastest Thinker First Frenzy , player take turn to deal their playing card :   a combination of Letter Cards and Action Cards , down on three pile of Letter Cards . as your playing card be lay down in turn , one by one , on the equal three pile , an Action Card will appear , leave the two Letter Cards visible ."
text_content = text_content[0:1000]  # cap the copy at the first 1,000 characters

client = language_v1.LanguageServiceClient()

# the copy is plain text, in English
type_ = enums.Document.Type.PLAIN_TEXT
language = "en"
document = {"content": text_content, "type": type_, "language": language}

encoding_type = enums.EncodingType.UTF8

# call the Natural Language API to extract the entities
response = client.analyze_entities(document, encoding_type=encoding_type)

for entity in response.entities:
    print(u"Entity Name: {}".format(entity.name))

    print(u"Entity type: {}".format(enums.Entity.Type(entity.type).name))

    print(u"Salience score: {}".format(round(entity.salience,3)))

    for metadata_name, metadata_value in entity.metadata.items():
        print(u"{}: {}".format(metadata_name, metadata_value))

    print('\n')

Once you execute the above lines of code, you will get something similar to the following output.

Entity Output from a product page copy in Python

As you may note, the output comes with a “salience score”, a metric measuring the calculated importance of each entity in relation to the rest of the text.

While this is not necessarily how Google’s own ranking algorithms calculate importance, make sure to take it with a grain of salt: it is only the output of the Natural Language API’s elaboration on the submitted text.

Despite not being the case for this project, the above output may sometimes return an additional piece of metadata called a MID (Machine ID). This indicates that Google has strong confidence in understanding the entity it refers to, as the entity likely owns a spot in the Knowledge Graph.
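As a sketch of how you might audit this in practice, the snippet below re-uses the response object from above to rank entities by salience and surface the “mid” metadata key whenever the API returns one (the formatting is my own addition):

# rank entities by salience and flag those carrying a Knowledge Graph MID
for entity in sorted(response.entities, key=lambda e: e.salience, reverse=True):
    mid = entity.metadata.get("mid", "-")  # "-" when no MID is returned
    print("{0:25}{1:<10}{2}".format(entity.name, round(entity.salience, 3), mid))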

Text-mine Sentiment Analysis with NLU

In the second part of this framework, we are going to leverage NLU to carry out a quick sentiment analysis of the submitted product page text.

First, we plot an overview of the sentiment attitude emerging from the tone adopted on the product page, setting up the sentiment analysis environment and using NumPy and Matplotlib to plot the outcome.

document = types.Document(
    content=text_content,
    type=enums.Document.Type.PLAIN_TEXT)

sentiment = client.analyze_sentiment(document=document).document_sentiment
sscore = round(sentiment.score, 4)    # polarity, from -1 (negative) to 1 (positive)
smag = round(sentiment.magnitude, 4)  # overall strength of emotion, from 0 upwards

# bucket the score into a label; the boundaries are now inclusive so every score gets one
if sscore <= -0.5:
  sent_label = "Very Negative"
elif sscore < 0:
  sent_label = "Negative"
elif sscore == 0:
  sent_label = "Neutral"
elif sscore >= 0.5:
  sent_label = "Very Positive"
else:
  sent_label = "Positive"

print('Sentiment Score: {} is {}'.format(sscore,sent_label))

predictedY = [sscore]  # the single document-level score we are plotting

if sscore < 0:
    plotcolor = 'red'
else:
    plotcolor = 'green'

plt.scatter(predictedY, np.zeros_like(predictedY),color=plotcolor,s=100)

plt.yticks([])
plt.subplots_adjust(top=0.9,bottom=0.8)
plt.xlim(-1,1)
plt.xlabel('Negative                                                            Positive')
plt.title("Sentiment Attitude Analysis")
plt.show()

This is what you might obtain.

Output of sentiment attitude analysis from a product page in Python
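The document-level score can hide mixed signals, as the same API response also carries a sentiment score for each individual sentence. A small sketch re-using the document object:

# per-sentence sentiment alongside the document-level score
annotations = client.analyze_sentiment(document=document)
for sentence in annotations.sentences:
    print(round(sentence.sentiment.score, 2), "-", sentence.text.content)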

Next, we narrow down a bit and calculate the perceived amount of emotion in the text by means of the sentiment magnitude, which grows with the overall strength of emotion expressed in the copy.

# bucket the magnitude so every value, including the boundaries, gets a label
if smag < 1:
  sent_m_label = "No Emotion"
elif smag > 2:
  sent_m_label = "High Emotion"
else:
  sent_m_label = "Low Emotion"

print('Sentiment Magnitude: {} is {}'.format(smag,sent_m_label))

predictedY = [smag]  # the single magnitude value we are plotting

if smag < 2:  # low emotion plotted in red, high emotion in green
    plotcolor = 'red'
else:
    plotcolor = 'green'

plt.scatter(predictedY, np.zeros_like(predictedY),color=plotcolor,s=100)

plt.yticks([])
plt.subplots_adjust(top=0.9,bottom=0.8)
plt.xlim(0,5)
plt.xlabel('Low Emotion                                                          High Emotion')
plt.title("Sentiment Magnitiude Analysis")
plt.show()

Output of sentiment magnitude analysis from a product page in Python

As a bonus, we can also try to predict a suitable categorization for our product page based on the sentiment emerging from the copy.

The estimation comes fully equipped with a confidence level, which hints at how reliable the outcome is.

response = client.classify_text(document)

# the loop body must be indented to run correctly
for category in response.categories:
    print(u"Category name: {}".format(category.name))
    print(u"Confidence: {}%".format(int(round(category.confidence, 3) * 100)))

Output of potential categorization for a product page in Python

Even though the outcome does not come with a statistically strong confidence, the suggested category for our Noggin Board Game product page seems to be “/Adults”.

Conclusion

In this framework, we tokenized a product page copy, lemmatized it with spaCy, extracted the resulting entities along with their salience scores, and audited the sentiment attitude, magnitude, and likely categorization of the page with Google’s NLP API. Adapting the framework to your own pages should only require swapping the text assigned to the text_content variable.

Further Readings

This post was directly inspired by the comprehensive walkthrough on entities and sentiment analysis conducted by Greg Bernhardt.

Please check this out for further reference:

Getting started with Google NLP API using Python
