As AI platforms emerge as new traffic channels, search behaviour is becoming more fluid and nuanced. Unlike rankings in traditional search engines, AI visibility is hard to measure because AI responses are generated in real time.
Prompt tracking is often mentioned, but solid guidance on attribution is scarce.
The key point is that prompts are synthetic data projections built from actual search queries, floating around closed systems.
LLMs generate responses with a degree of randomness (i.e., temperature) and tailor them to a user's location and past conversations.
If precise tracking is bound to be far-fetched, then rather than focusing on directional, synthetic prompts, analysing the behavioural patterns behind them is more grounded in fact.
In this article, I'll show you how to build an advanced framework to tackle zero-click searches and guide your decisions on AI visibility by integrating Google Search Console and Bing data.
This is not a prompt tracking framework
The only thing you should measure is how often your brand appears in LLM responses for your most representative topics.
- Does the LLM model correctly associate your brand with its marketed categories?
- Granted that tracking tools provide inconsistent output, does the model consistently place your brand in the right ‘bucket’?
What You Need to Get Through
The main requirements to replicate this framework are:
- Access to BigQuery. You can get started with this guide from DataCamp.
- Google Search Console connected natively to BigQuery. If you're in doubt, Google's support documentation can help you get started.
- The Bing API connected to BigQuery via Cloud Scheduler. This guide can help you kick off, but I highly recommend having it set up by a proper engineer at your company*
- Familiarity with SQL, SQLX and Dataform: a piece of cake in the era of LLMs.
* This step is very advanced: it requires in-depth knowledge of Google Cloud and service accounts, and scheduling cron jobs in Python or JavaScript. Leave it to a specialist, trust me.
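To give a flavour of what that scheduled job does, here is a minimal Python sketch that reshapes a Bing-style query-stats payload into rows ready for a BigQuery insert. The payload shape and field names (Query, Impressions, Clicks) are assumptions for illustration; check them against the actual API response. The real job would also handle authentication, paging, and the BigQuery client itself.

```python
from datetime import date

def to_bq_rows(payload, snapshot_date):
    """Reshape a Bing query-stats payload (hypothetical field names)
    into flat dicts matching a BigQuery table schema."""
    rows = []
    for item in payload.get("d", []):
        rows.append({
            "date": snapshot_date.isoformat(),
            "query": item["Query"].strip().lower(),
            "impressions": int(item["Impressions"]),
            "clicks": int(item["Clicks"]),
        })
    return rows

# In the real pipeline, these rows would be passed to
# google.cloud.bigquery.Client().insert_rows_json(table, rows).
sample = {"d": [{"Query": " Running Shoes ", "Impressions": 120, "Clicks": 3}]}
rows = to_bq_rows(sample, date(2024, 1, 1))
```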
Google Search Console, Bing & BigQuery
Google Search Console provides valuable first-party insights into how your site appears in Google Search, including queries, impressions, clicks, and page performance.
However, its limitations around data sampling, thresholding and anonymised queries are notorious.
Still, this can be solved by sending your data flow straight to BigQuery.
As Marco Giordano highlights in his work on web analytics, relying on BigQuery exports rather than the Google Search Console interface unlocks several key benefits:
Main advantages:
- Access to data beyond the 16-month limit (finally, historical data)
- Recovery of visibility lost to anonymised queries
- Significantly higher volumes of clicks and impressions data
A similar story applies to Bing Webmaster Tools.
While it's best to pull Bing data through BigQuery for large-scale analysis, it remains a valuable source that complements our initial dataset with a comprehensive view of your website's search demand.
Here are a few reasons why:
- ChatGPT is closely tied to Bing through Microsoft's partnership with OpenAI (hello, Microsoft!), and ChatGPT's web search draws heavily on Bing's index.
- Bing provides visibility into queries that Google anonymises due to privacy thresholds.
- Feeding ChatGPT agents: recent evidence suggests agents use Bing's API for results 92% of the time.
- Web searches performed by LLMs via the Bing API may not be included in the Bing Webmaster Tools interface.
Building a BigQuery table based on Google Search Console and Bing API
| Goal | Build unified search intelligence by consolidating Google and Bing data into incremental marts, and join datasets to concatenate queries from both sources |
| Objective | Identify zero-click queries that signal high-intent research behaviour across the customer journey. |
| Leading Indicator (i.e., Metric) | Impressions by query length, highlighting high-visibility searches that generate demand without driving clicks. |
| Touchpoints | Google AI Mode, AI Overviews, and conversational platforms such as ChatGPT |
| Requirements | First-party search data from Google Search Console and Bing, integrated via BigQuery. |
Pre-processing considerations
When building your table, Ramon Eijkemans suggests preprocessing your query data.
This is not only about reducing processing costs, but also about excluding irrelevant terms, such as queries containing 'login'.
The same consideration applies to branded queries. Depending on how well your brand is recognised by search engines (and, by extension, by LLMs), you may or may not want to include them in your dataset.
Labelling queries as branded or non-branded from the start provides a more balanced approach, but it requires additional engineering work in BigQuery and Google Cloud.
Conversely, prioritising non-branded queries can help surface high-intent queries with untapped SEO potential.
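If you do go down the labelling route, the classification logic itself is trivial; the engineering effort lies in wiring it into BigQuery and Google Cloud. Here is a plain-Python sketch of the idea, where the brand patterns ('acme' and so on) are placeholders you would replace with your own brand's variants:

```python
import re

# Placeholder brand variants; swap in your own brand's spellings.
BRAND_PATTERN = re.compile(r"\b(acme|acme corp)\b", re.IGNORECASE)

def label_query(query):
    """Label a search query as 'branded' or 'non-branded'."""
    return "branded" if BRAND_PATTERN.search(query) else "non-branded"

labels = [label_query(q) for q in ["acme pricing", "best crm software"]]
```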
For this blog post, I'll focus on capturing non-branded terms using a JavaScript function.
Paste the following into a new tab in BigQuery and make sure to reference your project ID.
-- Create this in a dataset located in your region (e.g. europe-west3).
CREATE OR REPLACE FUNCTION `your_project-140515.search_console.clean_text`(input STRING)
RETURNS STRING
LANGUAGE js AS r"""
if (input === null) return null;
const accentsMap = {
  'ç':'c','æ':'ae','œ':'oe','á':'a','é':'e','í':'i','ó':'o','ú':'u',
  'à':'a','è':'e','ì':'i','ò':'o','ù':'u','ä':'a','ë':'e','ï':'i',
  'ö':'o','ü':'u','ÿ':'y','â':'a','ê':'e','î':'i','ô':'o','û':'u',
  'å':'a','ø':'o','ñ':'n','Ç':'C','Æ':'AE','Œ':'OE','Ñ':'N'
};
let normalizedText = input
  .split('')
  .map(c => accentsMap[c] || c)
  .join('');
let processedText = normalizedText.toLowerCase();
// remove non-ASCII characters
processedText = processedText.replace(/[^\x00-\x7F]/g, '');
// remove phone-like numbers
processedText = processedText.replace(/\+?\b(?:\d[ -]?){9,}\b/g, ' ');
// remove branded terms (swap in your own brand regex)
processedText = processedText.replace(/\b(yourbrand)\b/g, '');
// clean up extra spaces
processedText = processedText.replace(/\s+/g, ' ').trim();
return processedText;
""";

That would return the following example under Routines.
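Before deploying the UDF, it can help to sanity-check the cleaning logic locally. The following Python function mirrors the JavaScript above, with a shortened accent map and the same placeholder brand regex; it is a convenience sketch for testing, not part of the pipeline.

```python
import re

# Shortened accent map; extend it as in the SQL UDF.
ACCENTS = {"ç": "c", "æ": "ae", "œ": "oe", "é": "e", "è": "e",
           "á": "a", "à": "a", "ä": "a", "ö": "o", "ü": "u", "ñ": "n"}

def clean_text(text):
    """Local mirror of the BigQuery clean_text UDF."""
    if text is None:
        return None
    text = "".join(ACCENTS.get(c, c) for c in text)
    text = text.lower()
    text = re.sub(r"[^\x00-\x7F]", "", text)             # drop non-ASCII
    text = re.sub(r"\+?\b(?:\d[ -]?){9,}\b", " ", text)  # phone-like numbers
    text = re.sub(r"\byourbrand\b", " ", text)           # placeholder brand term
    return re.sub(r"\s+", " ", text).strip()
```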
Preparing Search Queries from Google Search Console
Head to Dataform and create incremental marts to merge queries from the GSC searchdata_url_impression dataset with your Bing API dataset.
Weโre going to create a project in Dataform and generate an incremental view using SQLX.
First, create a repository
Next, initialise a workspace
Now we’ll create a mart.
Create a file named definitions/incremental_ai_search.sqlx
Paste in the following SQLX query
config {
  type: "incremental",
  uniqueKey: ["date", "query"],
  bigquery: {
    partitionBy: "date",
    clusterBy: ["query"]
  }
}

WITH source AS (
  SELECT
    data_date AS date,
    `your_project-140515.searchconsole_uk.basic_text_preprocess`(query) AS query,
    ARRAY_LENGTH(
      SPLIT(
        TRIM(`your_project-140515.searchconsole_uk.basic_text_preprocess`(query)),
        ' '
      )
    ) AS word_count,
    impressions,
    clicks,
    sum_position,
    country
  FROM `your_project-140515.searchconsole_uk.searchdata_url_impression`
  WHERE is_anonymized_query = FALSE
    AND country = 'gbr'
),

aggregated AS (
  SELECT
    date,
    query,
    word_count,
    country,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    SUM(sum_position) / SUM(impressions) + 1 AS avg_position
  FROM source
  GROUP BY date, query, word_count, country
)

SELECT *
FROM aggregated
WHERE word_count BETWEEN 10 AND 20
  AND avg_position < 3
  AND total_clicks <= 10

If you run it, this will output something like the following with your Search Console queries.
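Two details in this mart are worth calling out: the GSC export stores sum_position zero-based, hence the + 1 when deriving an impression-weighted average position, and the word-count filter keeps only long, conversational queries. A quick Python sketch of the position calculation, using made-up rows:

```python
def avg_position(rows):
    """Impression-weighted average position. The GSC export stores
    sum_position zero-based, hence the +1 to recover 1-based ranking."""
    impressions = sum(r["impressions"] for r in rows)
    return sum(r["sum_position"] for r in rows) / impressions + 1

# Made-up rows: 40 impressions with a zero-based position sum of 40.
rows = [
    {"impressions": 10, "sum_position": 10},
    {"impressions": 30, "sum_position": 30},
]
position = avg_position(rows)  # (40 / 40) + 1 = 2.0
```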
Clicking on Start execution will run the following workflow.
If you click on the link next to Action destination, you will be able to locate the resulting dataset.
Congrats! This is your GSC incremental mart in Dataform.
Preparing Search Queries from Bing
If you managed to schedule your Bing data flow into BigQuery through the API correctly, you can move on to this step.
Replicate the pre-processing code above, but be sure to use your Bing webmaster project ID as the point of reference.
Next up, create the incremental view:
config {
  type: "incremental",
  uniqueKey: ["date", "query"],
  bigquery: {
    partitionBy: "date",
    clusterBy: ["query"]
  }
}

WITH source AS (
  SELECT
    Date AS date,
    `your_project-140515.bing_webmaster.clean_text`(query) AS query,
    ARRAY_LENGTH(
      SPLIT(TRIM(`your_project-140515.bing_webmaster.clean_text`(query)), ' ')
    ) AS word_count,
    impressions,
    clicks
  FROM `your_project-140515.bing_webmaster.bing_query_page`
  WHERE query IS NOT NULL
    AND query != ''
),

aggregated AS (
  SELECT
    date,
    query,
    word_count,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    SAFE_DIVIDE(SUM(clicks), SUM(impressions)) AS avg_ctr,
    TIMESTAMP(date) AS last_seen
  FROM source
  GROUP BY date, query, word_count
)

SELECT *
FROM aggregated
WHERE word_count BETWEEN 10 AND 20
  AND total_clicks <= 10

You will end up with something similar in your Dataform view.
Clicking on Start execution will run the following workflow.
If you click on Action destination, you will be able to locate your dataset, which we'll use later to join the Search Console dataset.
Concatenate Google Search Console to Bing API Queries
At this point, you will combine Google and Bing search data into a single table.
The new table will contain a 'source' column to identify which search engine each query came from.
Create another incremental mart in Dataform, based on a UNION ALL, called definitions/ai_bing_gsc_merge.sqlx
config {
type: "incremental",
schema: "dataform",
tags: ["search", "incremental"],
uniqueKey: ["query", "source", "date"],
bigquery: {
partitionBy: "date",
clusterBy: ["source", "query"]
}
}
SELECT
date,
query,
word_count,
total_impressions,
total_clicks,
'google' AS source
FROM ${ref("incremental_ai_search")}
UNION ALL
SELECT
date,
query,
word_count,
total_impressions,
total_clicks,
'bing' AS source
FROM ${ref("incremental_ai_search_bing")}

The table updates incrementally (only new data is processed) and is partitioned by date for faster queries.
Clicking on Start execution will trigger the pipeline, with the details view confirming the job was successful.
Visiting the new dataset from the link next to Action destination, you will see the resulting table with queries divided by source (i.e., Google and Bing).
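Conceptually, the incremental table behaves like an upsert keyed on (query, source, date): new rows are appended, and rows already present for a given key are replaced rather than duplicated. A minimal Python sketch of that semantics, using made-up rows:

```python
def incremental_merge(existing, incoming):
    """Upsert rows keyed on (query, source, date), mimicking the
    uniqueKey behaviour of a Dataform incremental table."""
    merged = {(r["query"], r["source"], r["date"]): r for r in existing}
    for r in incoming:
        merged[(r["query"], r["source"], r["date"])] = r
    return list(merged.values())

existing = [{"query": "best running shoes for marathons", "source": "google",
             "date": "2024-01-01", "total_clicks": 2}]
incoming = [{"query": "best running shoes for marathons", "source": "google",
             "date": "2024-01-01", "total_clicks": 5},
            {"query": "best running shoes for marathons", "source": "bing",
             "date": "2024-01-01", "total_clicks": 1}]
merged = incremental_merge(existing, incoming)
```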
💡 Pro Tip: Not all long-form queries are genuine searches
The merged dataset may contain terms inflated by synthetic queries generated by third-party prompt tracking tools.
Many prompt-tracking platforms require users to select a country and language for tracking. Some even automatically suggest related terms.
By pre-configuring prompts, these tools may append artificial modifiers to the original queries, trigger requests from their own IP addresses, and then extrapolate the results.
The main reason these synthetic query clusters appear in your data is that some tools rely on unreliable proxy networks, which fail to mask their traffic.
Common examples are country and language specifications (e.g., 'in the United Kingdom', 'reply in English').
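A lightweight defence is to flag queries carrying these tool-appended modifiers before they skew your analysis. Here is a Python sketch; the modifier list is a placeholder you would extend with whatever patterns surface in your own data:

```python
import re

# Placeholder list of tool-appended modifiers; extend with what you
# actually see in your own query data.
SYNTHETIC_MODIFIERS = [
    r"\bin the united kingdom\b",
    r"\breply in english\b",
]
PATTERN = re.compile("|".join(SYNTHETIC_MODIFIERS), re.IGNORECASE)

def looks_synthetic(query):
    """Heuristic: True when a query carries a known tool-appended modifier."""
    return bool(PATTERN.search(query))
```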
Cluster by Funnel Stage and Export
You could store the outcome in Google Cloud Storage by clicking the Export button.
However, it's best to query the dataset on demand to keep costs down.
A smart approach is to map out search queries by funnel stage.
The following query maps out transactional search terms.
SELECT
  query,
  ANY_VALUE(word_count) AS word_count,
  SUM(total_clicks) AS total_clicks,
  SUM(total_impressions) AS total_impressions,
  ANY_VALUE(source) AS source
FROM `your_project.dataform.ai_bing_gsc_merge`
WHERE
  -- Partition pruning (adjust dates as needed)
  date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  -- Filter by intent
  AND REGEXP_CONTAINS(
    LOWER(query),
    r'^(give me|step by step|buy|purchase|pricing|price|cost|deal|discount|cheap|last minute|checkout|book me|book|check)\b'
  )
  -- Filter out brand terms (swap in your own brand regex)
  AND NOT REGEXP_CONTAINS(
    LOWER(query),
    r'\b(yourbrand)\b'
  )
GROUP BY query
ORDER BY total_impressions DESC;

-- Information
-- ^(what|what is|who|why|when|how does|how do|how much|how long|explain|guide|provide a)\b.*
-- Consideration
-- ^(show me|find me|help me find|best|list|review|ratings|top|leading|recommended|alternatives to|competitors of)\b.*
-- Transaction
-- ^(give me|step by step|buy|purchase|pricing|price|cost|deal|discount|cheap|last minute|checkout|book me|book|check)\b.*

Although relatively cheap, querying is not free, but it lets us export smaller files for free, often under 1 GB.
Within a CSV export, all you have to do is remove duplicates and create the Funnel Stage column.
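The same three intent regexes can drive the Funnel Stage column. Here is a Python sketch that assigns a query to information, consideration, or transaction, falling back to 'other'; the prefix lists are trimmed versions of the SQL patterns above:

```python
import re

# Trimmed versions of the intent prefixes used in the SQL filters.
FUNNEL_STAGES = {
    "information": r"^(what|what is|who|why|when|how does|how much|explain)\b",
    "consideration": r"^(show me|find me|best|list|review|top|alternatives to)\b",
    "transaction": r"^(give me|buy|purchase|pricing|price|cost|deal|book|check)\b",
}

def funnel_stage(query):
    """Return the first matching funnel stage for a query, else 'other'."""
    q = query.lower().strip()
    for stage, pattern in FUNNEL_STAGES.items():
        if re.search(pattern, q):
            return stage
    return "other"
```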
Use Cases for Long-Form Queries from Your Google Search Console and Bing Dataset
This framework provides a solid understanding of the zero-click searches that impact your organic performance across Google Search Console and Bing.
Specific use cases include:
- Create detailed FAQs: identify the questions users ask when they want more than the quick answers shown in AI Overviews or AI Mode.
- Tap into new channels: Google is no longer the only destination. Share these insights with relevant teams to support stronger, more diverse content strategies.
- Work with Digital PR: brand mentions and earned media are especially important for zero-click searches. Share this data with the PR team, grouped by search intent, to encourage stronger coverage and quality citations.
- Support PPC efforts: provide the paid team with search-intent insights so they can focus spend on high-intent keywords, test long-tail opportunities, and use remarketing to re-engage users who didn't convert.
Prompt Emulation (for Survival)
While this framework isn't meant for prompt tracking, you might still have to respond to such requests from clients or the C-suite, and sometimes it's better to keep your head down and bring bread to the table.
Once the labelling on the CSV above is complete, you can simply use =CONCAT to turn queries into custom prompts divided by funnel stage.
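The spreadsheet step can equally be scripted. Here is a Python sketch that emulates the =CONCAT step, turning a labelled query into a funnel-aware prompt; the template wording is my own assumption, so adapt it to your brand's voice:

```python
def to_prompt(query, stage):
    """Turn a labelled query into a funnel-aware prompt, like the
    =CONCAT step in the spreadsheet. Template wording is illustrative."""
    templates = {
        "information": "Explain: {q}",
        "consideration": "Compare the options for: {q}",
        "transaction": "Recommend where to {q}",
    }
    return templates.get(stage, "{q}").format(q=query)
```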
If you made it this far, you may want to consider submitting these prompts to tracking platforms such as Peec AI (this is not a sponsored link!).
Proprietary data may not portray the full picture
Using first-party search data is the only way to analyse your search queries and use the resulting insights as a baseline to draw a solid list of prompts. However, this is not without limitations.
The number of preprocessing steps involved can introduce confirmation bias, and prompts derived from query data rarely map cleanly to real ICPs or user personas.
Search data tells us what users ask, not how AI systems actually interact with our websites. To understand that layer, we need to move closer to the infrastructure itself.
In the next article, I'll explore another approach, based on server logs, to improve AI search attribution in ways that search data alone cannot reveal.