Advanced SEO Tracking Framework for Zero-Click and AI Visibility

As AI platforms emerge as new traffic channels, search behaviour is becoming more fluid and nuanced. Unlike rankings in traditional search engines, AI visibility is harder to measure because responses are generated in real time.

Prompt tracking is often mentioned, but solid guidance on attribution is scarce.

The key point is that prompts are synthetic projections of actual search queries, floating around closed systems.

LLMs generate responses with a degree of randomness (i.e. temperature) and tailor them to users' location and past conversations.

Since prompt tracking is bound to be far-fetched, analysing the behavioural patterns behind prompts proves more factual than focusing on directional, synthetic prompts themselves.

AI search beyond the prompts

In this article, I'll show you how to build an advanced framework to tackle zero-click searches and guide your decisions on AI visibility by integrating Google Search Console and Bing data.

This is not a prompt tracking framework

The only thing you should measure is how often your brand appears in LLM responses for your most representative topics.

  1. Does the LLM correctly associate your brand with its marketed categories?
  2. Granted that tracking tools provide inconsistent output, does the model consistently place your brand in the right ‘bucket’?
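
As a rough sketch of what this measurement looks like in practice, brand visibility can be reduced to a share-of-voice calculation over a sample of LLM responses per topic. The topic, responses and brand name below are illustrative assumptions, not data from the article:

```javascript
// Sketch: share of sampled LLM responses that mention the brand, per topic.
// Topic names, responses and the brand are illustrative assumptions.
function brandShareOfVoice(responsesByTopic, brand) {
  const pattern = new RegExp(`\\b${brand}\\b`, 'i');
  const result = {};
  for (const [topic, responses] of Object.entries(responsesByTopic)) {
    const hits = responses.filter(r => pattern.test(r)).length;
    result[topic] = responses.length ? hits / responses.length : 0;
  }
  return result;
}

// Two of the three sampled responses mention the brand.
const share = brandShareOfVoice({
  'crm software': [
    'Popular CRMs include Acme and HubSpot.',
    'Acme is a solid choice for small teams.',
    'Salesforce dominates the enterprise segment.'
  ]
}, 'Acme');
```

Tracked over time and per topic, this ratio tells you whether the model consistently places your brand in the right bucket.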

What You Need to Get Started

The main requirements to replicate this framework are:

  • Access to BigQuery. You can get started with this guide from DataCamp.
  • Google Search Console connected natively to BigQuery. If you're in doubt, Google's support can help you get started.
  • The Bing API connected to BigQuery via Cloud Scheduler. This guide can help you kick off, but I highly recommend having it set up by a proper engineer at your company*
  • Familiarity with SQL, SQLX and Dataform – a piece of cake in the era of LLMs

* This step is very advanced – it requires in-depth knowledge of Google Cloud and service accounts, as well as scheduling cron jobs in Python or JavaScript. Leave it to a specialist, trust me.

Google Search Console, Bing & BigQuery

Google Search Console provides valuable first-party insights into how your site appears in Google Search, including queries, impressions, clicks, and page performance.

However, its limitations around data sampling, thresholding and anonymised queries are notorious.

Still, this can be solved by streaming your data straight into BigQuery.

Image credit: Seotistics by Marco Giordano

As Marco Giordano highlights in his work on web analytics, relying on BigQuery exports rather than the Google Search Console interface unlocks several key benefits:

Main advantages:

  • Access to data beyond the 16-month limit (finally, historical data)
  • Recovery of visibility lost to anonymised queries
  • Significantly higher volumes of clicks and impressions data

A similar story applies to your Bing Webmaster Tools.

While it's best to pull Bing data through BigQuery for large-scale analysis, it remains a valuable source to complement our initial dataset with a comprehensive view of your website's search demand.

Here are a few reasons why:

Grounded queries used by LLMs are tracked only via the Bing API, not in the Bing Webmaster Tools interface.

Building a BigQuery table based on Google Search Console and Bing API

Goal: Build unified search intelligence by consolidating Google and Bing data into incremental marts, and join datasets to concatenate queries from both sources.
Objective: Identify zero-click queries that signal high-intent research behaviour across the customer journey.
Leading Indicator (i.e. Metric): Impressions by query length – highlighting high-visibility searches that generate demand without driving clicks.
Touchpoints: Google AI Mode, AI Overviews and conversational platforms such as ChatGPT.
Requirements: First-party search data from Google Search Console and Bing, integrated via BigQuery.

Pre-processing considerations

When building your table, Ramon Eijkemans suggests preprocessing your query data.

This is not only about reducing processing costs, but also excluding irrelevant terms, such as queries containing "login".

The same consideration applies to branded queries. Depending on how well your brand is recognised by search engines (and, by extension, by LLMs), you may or may not want to include them in your dataset.

Labelling queries as branded or non-branded from the start provides a more balanced approach, but it requires additional engineering work in BigQuery and Google Cloud. 

Conversely, prioritising non-branded queries can help surface high-intent queries with untapped SEO potential.

For this blog post, I'll focus on capturing non-branded terms using a JavaScript function.

Paste the following into a new tab in BigQuery and make sure to reference your own project ID.

JavaScript
CREATE OR REPLACE FUNCTION `your_project-140515.search_console.clean_text`(input STRING)
RETURNS STRING
LANGUAGE js AS r"""
if (input === null) return null;
// map accented characters to ASCII equivalents
const accentsMap = {
 'ç':'c','æ':'ae','œ':'oe','á':'a','é':'e','í':'i','ó':'o','ú':'u',
 'à':'a','è':'e','ì':'i','ò':'o','ù':'u','ä':'a','ë':'e','ï':'i',
 'ö':'o','ü':'u','ÿ':'y','â':'a','ê':'e','î':'i','ô':'o','û':'u',
 'å':'a','ø':'o','ñ':'n',
 'Ç':'C','Æ':'AE','Œ':'OE','Ñ':'N'
};
let normalizedText = input
 .split('')
 .map(c => accentsMap[c] || c)
 .join('');
let processedText = normalizedText.toLowerCase();
// remove non-ascii
processedText = processedText.replace(/[^\x00-\x7F]/g, '');
// remove phone-like numbers
processedText = processedText.replace(/\+?\b(?:\d[ -]?){9,}\b/g, ' ');
// remove branded terms (swap the placeholder for your own brand regex)
processedText = processedText.replace(/\b(your|brand|regex)\b/g, ' ');
// clean up extra spaces
processedText = processedText.replace(/\s+/g, ' ').trim();
return processedText;
""";

That would return the following example under Routines.
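
As a sanity check, the same logic can be run locally in Node.js before deploying the UDF. This replica trims the accent map and keeps the brand pattern as a placeholder, so treat it as a sketch rather than the deployed routine:

```javascript
// Local Node.js replica of the clean_text UDF logic, for sanity checks only.
// The accent map is trimmed and the brand pattern is a placeholder assumption.
function cleanText(input) {
  if (input === null) return null;
  const accentsMap = { 'é': 'e', 'ç': 'c', 'ü': 'u', 'ñ': 'n' };
  let text = input.split('').map(c => accentsMap[c] || c).join('').toLowerCase();
  text = text.replace(/[^\x00-\x7F]/g, '');            // strip non-ASCII leftovers
  text = text.replace(/\+?\b(?:\d[ -]?){9,}\b/g, ' '); // strip phone-like numbers
  text = text.replace(/\b(yourbrand)\b/g, ' ');        // placeholder brand filter
  return text.replace(/\s+/g, ' ').trim();
}

console.log(cleanText('Café YourBrand +44 7911 123456 pricing')); // → 'cafe pricing'
```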

Preparing Search Queries from Google Search Console

Head to Dataform and create incremental marts to merge queries from the GSC searchdata_url_impression dataset with your Bing API dataset.

We're going to create a project in Dataform and generate an incremental view using SQLX.

First, create a repository

Next, initialise a workspace

Now we’ll create a mart.

Create a file named definitions/incremental_ai_search.sqlx

Paste in the following SQLX query

SQL
config {
  type: "incremental",
  uniqueKey: ["date", "query"],
  bigquery: {
    partitionBy: "date",
    clusterBy: ["query"]
  }
}
WITH source AS (
  SELECT
    data_date AS date,
    `your_project-140515.search_console.clean_text`(query) AS query,
    ARRAY_LENGTH(
      SPLIT(
        TRIM(`your_project-140515.search_console.clean_text`(query)),
        ' '
      )
    ) AS word_count,
    impressions,
    clicks,
    sum_position,
    country
  FROM `your_project-140515.searchconsole_uk.searchdata_url_impression`
  WHERE
    is_anonymized_query = FALSE
    AND country = 'gbr'
),
aggregated AS (
  SELECT
    date,
    query,
    word_count,
    country,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    SUM(sum_position) / SUM(impressions) + 1 AS avg_position
  FROM source
  GROUP BY
    date,
    query,
    word_count,
    country
)
SELECT *
FROM aggregated
WHERE
  word_count BETWEEN 10 AND 20
  AND avg_position < 3
  AND total_clicks <= 10

If you run the file now, it will output the following example with your Search Console queries.

Clicking on Start execution will run the following workflow. 

If you click on the link next to Action destination, you will be able to locate the resulting dataset. 

Congrats! This is your GSC incremental mart in Dataform.

Preparing Search Queries from Bing

If you managed to schedule your Bing data flow through the API correctly into BigQuery, you can move on to this step.

Replicate the pre-processing code above, but be sure to use your Bing webmaster project ID as the point of reference.

Next up, create the incremental view:

SQL
config {
    type: "incremental",
    uniqueKey: ["date", "query"],
    bigquery: {
        partitionBy: "date",
        clusterBy: ["query"]
    }
}
WITH
  SOURCE AS (
  SELECT
    Date AS date,
    `your_project-140515.bing_webmaster.clean_text`(query) AS query,
    ARRAY_LENGTH(SPLIT(TRIM(`your_project-140515.bing_webmaster.clean_text`(query)), ' ')) AS word_count,
    impressions,
    clicks
  FROM
    `your_project-140515.bing_webmaster.bing_query_page`
  WHERE
    query IS NOT NULL
    AND query != '' ),
  aggregated AS (
  SELECT
    date,
    query,
    word_count,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS avg_ctr,
    TIMESTAMP(date) AS last_seen
  FROM
    SOURCE
  GROUP BY
    date,
    query,
    word_count )
SELECT
  *
FROM
  aggregated
WHERE
  word_count BETWEEN 10 AND 20
  AND total_clicks <= 10

You will end up with something similar from your Dataform view.

Clicking on Start execution will run the following workflow. 

If you click on Action destination, you will be able to locate your dataset, which we'll use later to join with the Search Console dataset.

Concatenate Google Search Console to Bing API Queries

At this point, you will combine Google and Bing search data into a single table.

The new table will contain a 'source' column to identify which search engine each search query came from.

Create another incremental mart in Dataform based on a UNION ALL statement, called definitions/ai_bing_gsc_merge.sqlx


SQL
config {
  type: "incremental",
  schema: "dataform",
  tags: ["search", "incremental"],
  uniqueKey: ["query", "source", "date"],
  bigquery: {
    partitionBy: "date",
    clusterBy: ["source", "query"]
  }
}
SELECT
  date,
  query,
  word_count,
  total_impressions,
  total_clicks,
  'google' AS source
FROM ${ref("incremental_ai_search")}
UNION ALL
SELECT
  date,
  query,
  word_count,
  total_impressions,
  total_clicks,
  'bing' AS source
FROM ${ref("incremental_ai_search_bing")}

The table updates incrementally (only new data) and is partitioned by date for faster queries.

Clicking on Start Execution will trigger the pipeline with the details view confirming the job was successful.

Following the link next to Action Destination, you will see the resulting dataset with queries divided by source (i.e. Bing and Google).

Bing API & GSC data merged into an incremental mart – only queries of 10 to 20 words

💡 Pro Tip – Not all long-form queries are factual searches


The merged dataset may contain terms inflated by synthetic queries generated by third-party prompt tracking tools.

Many AI platforms require users to select a country and language for tracking. Some even automatically suggest related terms.

By pre-configuring prompts, these tools may append artificial modifiers to the original queries, trigger requests from their own IP addresses, and then extrapolate the results.

The main reason these synthetic query clusters appear in your data is that some tools rely on unreliable proxy networks, which fail to mask their traffic.

Common examples are country and language specifications (e.g. "in the United Kingdom", "reply in English").
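
Such tool-injected suffixes can be flagged before analysis. The sketch below uses only the two modifiers mentioned above; the list is an illustrative assumption to extend for your own markets:

```javascript
// Sketch: flag queries carrying tool-injected country/language modifiers.
// The modifier list is an illustrative assumption; extend it for your market.
const SYNTHETIC_MODIFIERS = [
  /\bin the united kingdom\b/i,
  /\breply in english\b/i
];

function isLikelySynthetic(query) {
  return SYNTHETIC_MODIFIERS.some(rx => rx.test(query));
}
```

Flagged rows can then be excluded from the merged dataset, or kept in a separate bucket to estimate how much synthetic volume your data carries.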

Cluster by Funnel Stage and Export

You could store the outcome in your Google Cloud project by clicking the Export button.

However, it’s best to query the dataset on demand to break down costs.

A smart approach is to map out search queries by funnel stage.

The following query maps out transactional search terms.

SQL
SELECT
  query,
  ANY_VALUE(word_count) AS word_count,
  SUM(total_clicks) AS total_clicks,
  SUM(total_impressions) AS total_impressions,
  ANY_VALUE(source) AS source
FROM `your_project.dataform.ai_bing_gsc_merge`
WHERE
-- Partition pruning (adjust dates as needed)
date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
--Filter by intent
AND REGEXP_CONTAINS(
    LOWER(query),
    r'^(give me|step by step|buy|purchase|pricing|price|cost|deal|discount|cheap|last minute|checkout|book me|book|check)\b'
  )
 --Filter out brand terms (swap the placeholder for your own brand regex)
  AND NOT REGEXP_CONTAINS(
    LOWER(query),
    r'\b(brand)\b'
  )
GROUP BY
  query
ORDER BY
  total_impressions DESC;
--- Information ^(what|what is|who|why|when|how does|how do|how much|how long|explain|guide|provide a)\b.*
--- Consideration ^(show me|find me|help me find|best|list|review|ratings|top|leading|recommended|alternatives to|competitors of)\b.*
--- Transaction ^(give me|step by step|buy|purchase|pricing|price|cost|deal|discount|cheap|last minute|checkout|book me|book|check)\b.*

Despite being relatively cheap, querying is not free, but it allows us to export smaller files for free – often under 1 GB.

Within a CSV export, all you have to do is remove duplicates and create the Funnel Stage column.
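
The Funnel Stage column can also be derived programmatically by reusing the intent regexes from the SQL comments above. A sketch, with the patterns copied across and an "Unlabelled" fallback as an assumption:

```javascript
// Sketch: label a query with a funnel stage, mirroring the SQL intent regexes.
const FUNNEL_PATTERNS = {
  Information: /^(what|what is|who|why|when|how does|how do|how much|how long|explain|guide|provide a)\b/i,
  Consideration: /^(show me|find me|help me find|best|list|review|ratings|top|leading|recommended|alternatives to|competitors of)\b/i,
  Transaction: /^(give me|step by step|buy|purchase|pricing|price|cost|deal|discount|cheap|last minute|checkout|book me|book|check)\b/i
};

function funnelStage(query) {
  for (const [stage, rx] of Object.entries(FUNNEL_PATTERNS)) {
    if (rx.test(query.trim())) return stage;
  }
  return 'Unlabelled'; // fallback for queries matching no pattern
}
```

Running every exported query through this function fills the Funnel Stage column in one pass instead of three separate filtered exports.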

Use Cases for Long-Form Queries from Your Google Search Console and Bing Dataset

This framework provides a solid understanding of your zero-click searches and how they impact your organic performance across Google and Bing.

Specific use cases include:

  • Create detailed FAQs – Identify the questions users ask when they want more than the quick answers shown in AI Overviews or AI Mode.
  • Tap into new channels – Google is no longer the only destination. Share these insights with relevant teams to support stronger, more diverse content strategies.
  • Work with Digital PR – Brand mentions and earned media are especially important for zero-click searches. Share this data with the PR team, grouped by search intent, to encourage stronger coverage and quality citations.
  • Support PPC efforts – Provide the paid team with search-intent insights so they can focus spend on high-intent keywords, test long-tail opportunities, and use remarketing to re-engage users who didn't convert.

Prompt Emulation (for Survival)

While this framework isn't meant for prompt tracking, you might have to respond to such requests from clients or the C-suite, and sometimes it's better to keep your head down and bring bread to the table.

Once the labelling on the CSV above is complete, you can simply use =CONCAT to turn queries into custom prompts, divided by funnel stage.
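
The same concatenation can be scripted outside the spreadsheet. In this sketch, the prefix-per-stage mapping is a hypothetical assumption you would tune to your own prompt style:

```javascript
// Sketch: build emulated prompts from labelled queries.
// The prefix-per-stage mapping is a hypothetical assumption.
const STAGE_PREFIX = {
  Information: 'Explain',
  Consideration: 'Find me',
  Transaction: 'Help me buy'
};

function toPrompts(rows) {
  return rows.map(({ query, stage }) =>
    `${STAGE_PREFIX[stage] || ''} ${query}`.trim()
  );
}

const prompts = toPrompts([
  { query: 'best running shoes for flat feet', stage: 'Consideration' }
]);
// prompts[0] === 'Find me best running shoes for flat feet'
```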

"Find me" is a commercially driven prefix added to queries in the Consideration bucket (Funnel Stage header).

If you made it to this point, you may want to consider submitting these prompts to tracking platforms such as Peec AI – this is not a sponsored link!

Proprietary data may not portray the full picture

Using first-party search data is the only way to analyse your search queries and use the resulting insights as a baseline to draw a solid list of prompts. However, this is not without limitations.

The number of preprocessing steps involved can introduce confirmation bias, and prompts derived from query data rarely map cleanly to real ICPs or user personas.

Search data tells us what users ask, not how AI systems actually interact with our websites. To understand that layer, we need to move closer to the infrastructure itself.

In the next article, I'll explore another approach based on server logs to improve AI search attribution in ways that search data alone cannot reveal.
