Search Intent Flow Audit with a Sankey Chart

Have you ever struggled to identify where a process is going wrong or which stage needs improvement? If yes, then a Sankey diagram could be the solution you need.

Not only can it present a marketing funnel flow to stakeholders, but it can also help define the multiple shades of search intent for SEO.

Sankey charts produce high-quality output but aren’t easy to prepare. Still, the appetite for cool data visualization sometimes takes over.

In this post, I am going to walk you through a method you can use to plot a Sankey diagram to diagnose search intent with Python.

Quick Intro to Sankey Diagrams

A Sankey diagram is a visualization tool that displays the flow of information or resources through a process. It can help you understand the changes that occur between each stage of a process, making it easier to identify where issues are occurring.

Sankey diagrams are perfect for identifying issues in complex processes, even when there are hundreds of entries. They allow you to spot inconsistencies visually, making it easier to optimize the process flow and improve performance.

When you use these diagrams, it’s important to know a few key terms.

source: the starting node (Query, e.g.)
target: the node the source connects to (Page, e.g.)
value: the connection flow volume (Clicks, e.g.)
label: the descriptive text for each node (Search Intent, e.g.)

💡 I recommend reading more about crafting a perfect Sankey diagram in this blog post on Medium.

TL;DR – Requirements & Process

To create a Sankey diagram, you need to install and import pandas and Plotly, plus webcolors to work with HTML/CSS color definitions.

🔦 I strongly recommend following my guide to determine search intent, up to the section on determining search intent for each query.

Once you get there, you can return to this post to start coding the diagram.

The diagram will help us explore the search intent flow for a sample of queries collected from a medium-to-large eCommerce site over the last 3 months.

%%capture
!pip install plotly pandas webcolors

import pandas as pd
import plotly.graph_objects as go
from webcolors import hex_to_rgb

Fine-Tuning Search Intent Labels

Once the dependencies are installed, you should have reached the point in the process where search intent has been classified with labels.

Pop back here and crack on with the chunk of code where you left off in my search intent blog post.

df_intents = pd.concat([info_filter,trans_filter,comm_filter,navigational_filter]).sort_values('Clicks', ascending=False)
df_intents = df_intents.drop_duplicates(subset='Query', keep="first")
df_intents = df_intents[['Query', 'Page', 'Clicks', 'Impressions', 'Intent', 'CTR', 'Position']]
df_intents.head(10)

Sweet.

Time to remove the columns that won’t be included in our diagram and then reset the column order to get the dataframe ready for the plotting phase.

# Drop unnecessary columns and Reindex columns
df_intents = df_intents.drop(['Impressions','CTR','Position'], axis=1) 
df_intents = df_intents[['Intent','Query', 'Page', 'Clicks']]

#tell me how many queries
print(len(df_intents))

#print an overview of the dataset
df_intents.head() 
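
If you want to test the rest of the post without the classified dataset at hand, a stand-in dataframe with the same columns can be sketched as follows. Every row here is hypothetical and invented for illustration; only the column names come from the code above.

```python
import pandas as pd

# Hypothetical rows standing in for the classified query dataset
df_demo = pd.DataFrame({
    "Query": ["buy red shoes", "red shoes sale", "how to clean shoes"],
    "Page": ["/shoes/red", "/shoes/red", "/blog/clean-shoes"],
    "Clicks": [120, 80, 40],
    "Impressions": [2400, 2000, 1600],
    "Intent": ["Transactional", "Transactional", "Informational"],
    "CTR": [0.05, 0.04, 0.025],
    "Position": [2.1, 3.4, 5.0],
})

# Keep only the four columns the diagram needs, in plotting order
df_demo = df_demo[["Intent", "Query", "Page", "Clicks"]]
print(df_demo.shape)  # (3, 4)
```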

Preparation of the Sankey Diagram

Here comes the fun part.

The following script sets up a list of colors that will be used to define the edges of the diagrams.

In addition, it defines a function to collect data from a dataframe to populate a Sankey diagram, and then calls that function on a specific dataframe to create the diagram.

💡 I limited the search intent dataset to 50 rows with df_intents.head(50), but feel free to remove the .head() call at your convenience.

I limited the output to keep the diagram at a reasonable size and prevent it from becoming overwhelming.

#Setup our colours
color_link = ['#000000', '#FFFF00', '#1CE6FF', '#FF34FF', '#FF4A46',
             '#008941', '#006FA6', '#A30059','#FFDBE5', '#7A4900', 
             '#0000A6', '#63FFAC', '#B79762', '#004D43', '#8FB0FF',
             '#997D87', '#5A0007', '#809693', '#FEFFE6', '#1B4400', 
             '#4FC601', '#3B5DFF', '#4A3B53', '#FF2F80', '#61615A',
             '#BA0900', '#6B7900', '#00C2A0', '#FFAA92', '#FF90C9',
             '#B903AA', '#D16100', '#DDEFFF', '#000035', '#7B4F4B',                
             '#A1C299', '#300018', '#0AA6D8', '#013349', '#00846F',
             '#372101', '#FFB500', '#C2FFED', '#A079BF', '#CC0744',
             '#C0B9B2', '#C2FF99', '#001E09', '#00489C', '#6F0062', 
             '#0CBD66', '#EEC3FF', '#456D75', '#B77B68', '#7A87A1',
             '#788D66', '#885578', '#FAD09F', '#FF8A9A', '#D157A0',
             '#BEC459', '#456648', '#0086ED', '#886F4C', '#34362D', 
             '#B4A8BD', '#00A6AA', '#452C2C', '#636375', '#A3C8C9', 
             '#FF913F', '#938A81', '#575329', '#00FECF', '#B05B6F',
             '#8CD0FF', '#3B9700', '#04F757', '#C8A1A1', '#1E6E00',
             '#7900D7', '#A77500', '#6367A9', '#A05837', '#6B002C',
             '#772600', '#D790FF', '#9B9700', '#549E79', '#FFF69F', 
             '#201625', '#72418F', '#BC23FF', '#99ADC0', '#3A2465',
             '#922329', '#5B4534', '#FDE8DC', '#404E55', '#0089A3',
             '#CB7E98', '#A4E804', '#324E72', '#6A3A4C'
             ]
# Collect the data we need from a dataframe to populate our Sankey data - label, source, target, and value
def get_sankey_data(data,cols,values):
    # Empty lists to hold our data
    sankey_data = {
    'label':[],
    'source': [],
    'target' : [],
    'value' : []
    }
    # Set our counter to zero
    cnt = 0
    # Start loop to retrieve data from our dataframe
    while (cnt < len(cols) - 1):
        for parent in data[cols[cnt]].unique():
            # Register each label only once so node indices stay unique
            if parent not in sankey_data['label']:
                sankey_data['label'].append(parent)
            # Rows that belong to this parent node
            children = data[data[cols[cnt]] == parent]
            for sub in children[cols[cnt+1]].unique():
                if sub not in sankey_data['label']:
                    sankey_data['label'].append(sub)
                sankey_data['source'].append(sankey_data['label'].index(parent))
                sankey_data['target'].append(sankey_data['label'].index(sub))
                # Sum the value only over rows linking this parent to this child
                sankey_data['value'].append(children[children[cols[cnt+1]] == sub][values].sum())

        cnt +=1
    return sankey_data
# We use this to create RGBA colours for our links.
# This gives us semi-transparent links, which in turn
# lets us see overlapping flows without them being obscured by solid colours
rgb_link_color = ['rgba({},{},{}, 0.4)'.format(*hex_to_rgb(x)) for x in color_link]
    
# Call our get_sankey_data function - dataframe, columns, values
sankey_chart = get_sankey_data(df_intents.head(50),['Intent','Query', 'Page'],'Clicks')
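
To see how labels turn into node indices, here is a small pure-Python sketch of the same idea on toy data. The build_links helper and the rows are hypothetical and written only for illustration; they are not the function used in this post.

```python
from collections import defaultdict

# Hypothetical sample rows (invented for illustration, not real data)
rows = [
    {"Intent": "Transactional", "Query": "buy red shoes", "Page": "/shoes/red", "Clicks": 120},
    {"Intent": "Transactional", "Query": "red shoes sale", "Page": "/shoes/red", "Clicks": 80},
    {"Intent": "Informational", "Query": "how to clean shoes", "Page": "/blog/clean-shoes", "Clicks": 40},
]

def build_links(rows, cols, value_key):
    """Map labels to node indices and sum values per (parent, child) pair."""
    labels = []                  # node names; list position = node index
    totals = defaultdict(int)    # (source_idx, target_idx) -> summed value

    def node_index(name):
        # Register a label the first time it is seen, then reuse its index
        if name not in labels:
            labels.append(name)
        return labels.index(name)

    # Walk each pair of adjacent columns: Intent -> Query, then Query -> Page
    for parent_col, child_col in zip(cols, cols[1:]):
        for row in rows:
            pair = (node_index(row[parent_col]), node_index(row[child_col]))
            totals[pair] += row[value_key]

    return {
        "label": labels,
        "source": [s for s, _ in totals],
        "target": [t for _, t in totals],
        "value": list(totals.values()),
    }

toy = build_links(rows, ["Intent", "Query", "Page"], "Clicks")
print(toy["label"])
print(toy["source"], toy["target"], toy["value"])
```

Note that the two queries landing on /shoes/red share a single page node (index 5), while each link keeps only the clicks of its own query.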

Plotting the Sankey Diagram

You are almost ready to view a search intent flow diagram at your fingertips.

Before displaying the output, you need to define the nodes and links of the diagram and customize the layout.

Finally, the script generates a customized flow diagram that portrays the search intent flow for our example website.

# Style our initial Sankey chart
data = go.Sankey(
    node = dict(
      pad = 30,
      thickness = 15,
      line = dict(color = "black", width = 0.5),
      label = sankey_chart['label'],
      color = "goldenrod"
    ),
    link = dict(
      source = sankey_chart['source'],
      target = sankey_chart['target'],
      value = sankey_chart['value'],
      color = rgb_link_color  # semi-transparent RGBA colours built earlier
    ))

# Optional named colours for nodes, one entry per node.
# They are not applied to the chart unless passed as the node color.
color_for_nodes =['steelblue', 'steelblue', 'steelblue', 'gold', 'gold',
                  'gold', 'steelblue','steelblue','green', 'maroon',
                  'green', 'maroon', 'maroon','maroon', 'green', 'maroon',
                  'maroon', 'maroon', 'purple', 'purple', 'purple']
# Optional named colours for links, one entry per link.
# They are not applied to the chart unless passed as the link color.
color_for_links =['LightSkyBlue', 'LightSkyBlue', 'LightSkyBlue', 'goldenrod',
                  'goldenrod', 'goldenrod', 'LightSkyBlue', 'lightgreen',
                  'indianred', 'lightgreen', 'indianred', 'indianred', 
                  'pink', 'lightgreen', 'pink', 'indianred', 
                  'pink', 'indianred', 'pink']


# Prepare our chart
fig = go.Figure(data)
# Update chart with some customisations
fig.update_layout(
    hovermode='x',
    title="<span style='font-size:36px;color:white;'><b>Dodo Search Intent Flow</b></span>",
    font=dict(size=10, color='white'),
    paper_bgcolor='#51504f',
    # Set an explicit height because the diagram is large
    height=700,
    margin={'t':100,'b':20} # adjust the top margin to move the title down
)

# display chart
fig.show()

Figure: Search intent flow represented on a Sankey diagram

Depending on the quality of the search intent classification you performed separately, the intent labels will appear on the left-hand side of the diagram, as shown above.

The colorful edges connect each search intent label with its related search query, which, in turn, points to the destination URL.

If I were to summarize the main findings of the data, I would say that the Sankey diagram indicates a dominance of transactional terms, with branded queries unsurprisingly accounting for a significant portion of the search traffic.
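
A visual reading like this can be cross-checked numerically. The sketch below computes the share of clicks per intent on a hypothetical dataframe; the rows and the resulting shares are invented for illustration and do not reflect the dataset behind the chart.

```python
import pandas as pd

# Hypothetical rows, invented for illustration
df_demo = pd.DataFrame({
    "Intent": ["Transactional", "Transactional", "Informational", "Navigational"],
    "Query": ["buy red shoes", "red shoes sale", "how to clean shoes", "brand login"],
    "Clicks": [120, 80, 40, 60],
})

# Total clicks per intent, largest first
clicks_by_intent = (
    df_demo.groupby("Intent")["Clicks"].sum().sort_values(ascending=False)
)

# Convert totals into a share of all clicks
share_by_intent = (clicks_by_intent / clicks_by_intent.sum()).round(2)
print(share_by_intent)
```

On your own data, you would swap df_demo for df_intents to see whether the dominant intent in the table matches the widest flows in the diagram.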

Conclusion

Creating a Sankey diagram can be a valuable tool for identifying issues in complex processes. By visualizing the flow of information or resources, you can quickly pinpoint areas that need improvement and optimize your process flow for better performance.

Whether you’re working on a marketing campaign or just weighing up your website’s search intent equity, you can use a Sankey diagram to enhance your problem-solving capabilities and improve your overall productivity.
