Relying solely on search queries grounded in Search Console and Bing can highlight intent, but it may not fully capture broader patterns of engagement. To understand how AI systems access and display content, we need to look beyond queries and move closer to the infrastructure itself.
Traditionally, log files have been difficult for SEO teams to work with – often locked behind technical barriers and owned by infrastructure or security teams. It’s probably quicker to find a needle in a haystack than to earn access to server logs in an SEO role.
Fortunately, the rapid growth of AI platforms has changed this dynamic. As AI crawlers and agents increasingly interact with websites, their impact on infrastructure and cybersecurity has pushed log providers to make this data far more accessible and usable.
As ChatGPT dominates AI referrals, collecting proprietary server logs tied to the ChatGPT-User agent, filtered to valid status codes, provides a strong starting point for shifting behavioural analysis from what users might ask to how AI systems actually access your site.
In this article, I'll show how server logs can reveal AI access patterns, measure zero-click interactions, and provide actionable insights for improving visibility across both search engines and AI-driven platforms.
⚠️ Disclaimer
This is part 2 of a series about improving AI search tracking.
Catch up with the previous article on how your Google Search Console and Bing data can track AI search behavioural trails.
This is not a prompt tracking framework.
The only thing you should measure is how often your brand appears in LLM responses for your most representative topics:
- Does the LLM correctly associate your brand with its marketed categories?
- Given that tracking tools produce inconsistent output, does the model consistently place your brand in the right 'bucket'?
What You Need to Get Started
The main requirements to replicate this framework are:
- Access to server logs – via your web server, a CDN (e.g. Akamai) or a third-party tool (e.g. Botify log analyzer) – I strongly suggest you get access to your web server or CDN.
- Familiarity with Python and SQL – a piece of cake in the era of LLMs.
What is ChatGPT-User?
This user agent is only used when ChatGPT answers a user's question and decides to fetch your page to support its response.
So every hit from ChatGPT-User is a direct signal of user intent – it reflects real prompts happening inside ChatGPT.
It tends to retrieve only the raw HTML and skip JavaScript altogether. So if key content appears only after JS execution or if metadata is injected dynamically, ChatGPT-User is unlikely to see it or use it in its output.
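You can test this yourself by fetching a page the same way a non-JS client would and checking whether your key content survives. Below is a minimal sketch; the URL and the snippets are placeholders, not anything from a real site.

```python
# Minimal sketch: fetch the raw HTML (no JS execution, just like ChatGPT-User)
# and check whether key content is present. URL and snippets are placeholders.
import requests

URL = "https://www.yoursite.com/some-page/"  # replace with a page you care about
KEY_SNIPPETS = ["Your key heading", "Your product name"]  # content that must be visible

html = requests.get(URL, timeout=10).text  # requests never executes JavaScript

for snippet in KEY_SNIPPETS:
    status = "present in raw HTML" if snippet in html else "MISSING - likely injected by JS"
    print(f"{snippet!r}: {status}")
```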
Allowing or disallowing ChatGPT-User on your website is not as straightforward as it sounds. Recent updates to the OpenAI documentation confirm that ChatGPT-User may no longer follow robots.txt rules, as it can operate in tandem with GPTBot or even be triggered by custom GPTs.
If you really want to block it from accessing your website, to prevent your content from furnishing answers in chats, restrict its IP ranges with your hosting provider or at the CDN level with aggressive WAF rules.
Query Your Log Files
Compose a SQL query retrieving your server logs for:
- HTTP status `200` and `304` (not modified)
- Response content type `text/html`, to capture only HTML requests
- User agent `ChatGPT-User`
- `cliIP`, the IP address used by ChatGPT-User to hit your endpoints
- The last 7 days – a longer lookback may exhaust bandwidth, depending on the data retention configured with your provider (usually the last 30 days)
💡 A Few Heads-Ups
1. Use a DNS checker to run IP lookups and identify potential bot impersonators. You can use this insight to limit the output of the following query to genuine ChatGPT-User IPs – see the sketch after this list.
2. Bear in mind that you don't need perfect accuracy – a good sample is enough.
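One way to act on the first heads-up is to check log IPs against the CIDR ranges OpenAI publishes for its bots. A minimal sketch follows; the JSON URL and its schema are assumptions based on OpenAI's bot documentation, so verify both before trusting the output.

```python
# Minimal sketch: flag impersonators by checking cliIP values against the
# CIDR ranges OpenAI publishes for ChatGPT-User. The URL and the JSON schema
# ("prefixes" -> "ipv4Prefix") are assumptions - confirm them in OpenAI's docs.
import ipaddress
import requests

RANGES_URL = "https://openai.com/chatgpt-user.json"  # assumed endpoint

payload = requests.get(RANGES_URL, timeout=10).json()
networks = [
    ipaddress.ip_network(p["ipv4Prefix"])
    for p in payload.get("prefixes", [])
    if "ipv4Prefix" in p
]

def is_genuine_chatgpt_user(ip: str) -> bool:
    """True if the IP falls inside a published ChatGPT-User range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

print(is_genuine_chatgpt_user("203.0.113.7"))  # illustrative IP (TEST-NET-3)
```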
Run this beautiful query:
```sql
WITH base AS (
    SELECT
        reqPath,
        cliIP,
        reqTimeSec,
        transferTimeMSec,
        downloadTime,
        toUnixTimestamp64Milli(reqTimeSec)
            - toUnixTimestamp64Milli(
                lagInFrame(reqTimeSec, 1) OVER (PARTITION BY cliIP ORDER BY reqTimeSec ASC)
            ) AS time_diff
    FROM akamai.logs
    WHERE $__timeFilter(reqTimeSec)
      AND statusCode IN ('200', '304')
      AND reqHost = 'www.yoursite.com'
      AND rspContentType = 'text/html'
      AND UA LIKE '%ChatGPT-User%'
),
sessions AS (
    SELECT
        *,
        sum(
            CASE
                WHEN time_diff IS NULL THEN 1
                WHEN time_diff > 100 THEN 1 -- 100 ms threshold to capture the full round trip of LLM requests
                ELSE 0
            END
        ) OVER (
            PARTITION BY cliIP
            ORDER BY reqTimeSec ASC
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS session_id
    FROM base
)
SELECT
    cliIP,
    session_id,
    min(reqTimeSec) AS session_start_time,
    toUnixTimestamp64Milli(max(reqTimeSec))
        - toUnixTimestamp64Milli(min(reqTimeSec)) AS duration_ms,
    COUNT(*) AS count_requests,
    COUNT(DISTINCT reqPath) AS unique_urls,
    groupArray(reqPath) AS reqpath
FROM sessions
GROUP BY cliIP, session_id
ORDER BY session_start_time DESC
```

This query is quite advanced, and I was able to put it together off the back of recent insightful findings from Ruben Remy.
This is the expected output, exported as a CSV file:
What do you have in there?
- session_id: a unique code that bundles near-simultaneous requests from ChatGPT-User into one event. A new session starts whenever the gap between requests from the same IP exceeds 100 ms, isolating the "parallel bursts" that define a query fan-out event, according to Ruben's research.
- session_start_time: the exact moment ChatGPT-User began its parallel crawl for that specific question.
- duration_ms: the session length, reflecting how quickly the AI "fanned out" across your site. The longer a session, the greater the chance that ChatGPT-User fetched distinct URLs.
- count_requests: the total number of times ChatGPT-User hit your server in a single session.
- unique_urls: the number of distinct pages fetched by the bot in one session. A high number signals topical authority: ChatGPT-User returned an answer based on a sequence of topically related pages on your site.
- reqPath: the list of pages that ChatGPT-User considers related to a single topic.
This is already an advanced deep dive into server logs.
From it alone, you can estimate how many query fan-out cycles occurred (the total number of unique session_id values) and get a rough indication of ChatGPT topical authority, based on the total number of distinct URLs fetched within one session – see the sketch below.
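Those two readouts take only a few lines of pandas over the exported CSV. A minimal sketch, assuming the column names from the query output and a hypothetical filename:

```python
# Minimal sketch: fan-out metrics from the exported CSV. Each row is one
# (cliIP, session_id) group, per the GROUP BY in the query above.
import pandas as pd

df = pd.read_csv("chatgpt_user_sessions.csv")  # hypothetical export path

fan_out_cycles = len(df)  # one row per session = one fan-out cycle
avg_unique_urls = df["unique_urls"].mean()
multi_url_share = (df["unique_urls"] > 1).mean()

print(f"Fan-out cycles: {fan_out_cycles}")
print(f"Avg distinct URLs per session: {avg_unique_urls:.1f}")
print(f"Share of sessions touching 2+ URLs: {multi_url_share:.0%}")
```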
URL Paths Pre-Processing & Clustering
But the raw output still needs surgery.
The next step is pre-processing the URL paths so we can convert them into traditional search queries and group them into clean clusters.
You can crack on with the Google Colab notebook, which covers this section.
The script will:
- Normalise paths into search-query-like strings.
- Perform HDBSCAN clustering and assign a membership probability to each URL.
The number of clusters you end up with is somewhat arbitrary, but it depends on how many rows you managed to export from the query above.
If you end up with around 1,000 rows, aim for 15-30 clusters by adjusting MIN_CLUSTER_SIZE (20 is a good start) – a sketch of this step follows.
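For orientation, here is a minimal sketch of what those two steps can look like. The TF-IDF featurisation, filenames and exact parameters are my assumptions; the actual Colab may differ.

```python
# Minimal sketch of the pre-processing + clustering steps. TF-IDF character
# n-grams and the filenames are assumptions; the Colab notebook may differ.
import re

import hdbscan
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

MIN_CLUSTER_SIZE = 20  # the knob discussed above

def normalise_path(path: str) -> str:
    # "/blog/ai-search-tracking/" -> "blog ai search tracking"
    return re.sub(r"[-_/]+", " ", path.strip("/")).lower()

# Hypothetical input: one URL path per row (explode the groupArray column first).
df = pd.read_csv("chatgpt_user_paths.csv")
df["normalised_path"] = df["reqPath"].map(normalise_path)

vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(df["normalised_path"])

clusterer = hdbscan.HDBSCAN(min_cluster_size=MIN_CLUSTER_SIZE)
df["cluster"] = clusterer.fit_predict(vectors.toarray())
df["cluster_probability"] = clusterer.probabilities_

df.to_csv("clustered_paths.csv", index=False)
```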
Once fine-tuned to your liking, run the script. A similar output will be returned.
The output will be saved as a CSV file.
Make sure to open it and remove all rows where cluster_probability < 1 – these may be outliers from irrelevant cluster assignments.
Also, remove all columns before normalised_path – from now on, we won't need the raw server-log fields. A pandas version of this cleanup is sketched below.
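A minimal sketch of that cleanup, assuming the column names produced above and placeholder filenames:

```python
# Minimal sketch of the cleanup: drop probable outliers, then drop every
# column that precedes normalised_path. Filenames are placeholders.
import pandas as pd

df = pd.read_csv("clustered_paths.csv")
df = df[df["cluster_probability"] >= 1]  # remove rows flagged as likely outliers

first_kept = df.columns.get_loc("normalised_path")
df = df.iloc[:, first_kept:]  # keep normalised_path and everything after it

df.to_csv("clustered_paths_clean.csv", index=False)
```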
Rename Clusters with an LLM
Upload the document to Gemini, Claude or ChatGPT and ask it to perform a simple NLP task: translating your HDBSCAN cluster scores into human-readable topic labels.
I'm not going to recommend one LLM over another – Gemini 3 is now the default model for AI Overviews, and ChatGPT may handle content better than Claude.
However, turning numbers into words is a basic NLP operation that shouldn't differ significantly between models.
Prompt example:
Give me a non-generic content topic that represents all pages in each cluster. Populate the cluster_label column using the table copied from my clipboard.
As an example, I used Gemini, then copied and pasted its suggestions into the CSV file to run a VLOOKUP.
That's how I mapped cluster numbers to their corresponding topic labels – the pandas equivalent is sketched below.
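If you'd rather stay in Python than Excel, the same mapping is a one-line merge. A sketch, assuming the LLM's suggestions were saved to a hypothetical CSV with cluster and cluster_label columns:

```python
# Minimal sketch: the pandas equivalent of the VLOOKUP step. Assumes the
# LLM's labels were saved to a CSV with "cluster" and "cluster_label" columns.
import pandas as pd

paths = pd.read_csv("clustered_paths_clean.csv")
labels = pd.read_csv("cluster_labels.csv")  # hypothetical file of LLM suggestions

paths = paths.merge(labels, on="cluster", how="left")
paths.to_csv("clustered_paths_labelled.csv", index=False)
```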
Now that you've got clusters, let's tie them to reality.
- Concatenate normalised_path with your site's domain.
- Run a list crawl with the GSC API to gather Impressions and Clicks for the concatenated URLs.
- VLOOKUP the findings against your original export (i.e. the last image above) to retain only Impressions (optionally Clicks too) – see the sketch after this list.
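If you prefer querying the Search Analytics API directly instead of a list crawl, the sketch below pulls page-level Impressions and Clicks and merges them in (a pandas VLOOKUP). The service account file, dates, filenames and the URL reconstruction are placeholders to adapt.

```python
# Minimal sketch: page-level Impressions/Clicks from the GSC Search Analytics
# API, merged against the labelled export. Credentials, dates and filenames
# are placeholders; the property must be verified for the service account.
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build

SITE = "https://www.yoursite.com/"

creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
gsc = build("searchconsole", "v1", credentials=creds)

resp = gsc.searchanalytics().query(
    siteUrl=SITE,
    body={
        "startDate": "2025-01-01",
        "endDate": "2025-01-28",
        "dimensions": ["page"],
        "rowLimit": 25000,
    },
).execute()

gsc_df = pd.DataFrame(
    [
        {"url": row["keys"][0], "impressions": row["impressions"], "clicks": row["clicks"]}
        for row in resp.get("rows", [])
    ]
)

paths = pd.read_csv("clustered_paths_labelled.csv")
# Reconstruct URLs from paths; adjust this line to undo your own normalisation.
paths["url"] = SITE.rstrip("/") + "/" + paths["normalised_path"].str.replace(" ", "-")
merged = paths.merge(gsc_df, on="url", how="left")  # pandas equivalent of a VLOOKUP
```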
Now, you have two options based on the strategic driver:
| Strategy | Tactic | Target | Assignment |
|---|---|---|---|
| Knowledge Protection | Build or enhance reporting around zero-click searches and ChatGPT-User access | Internal team (SEO, Paid media) | Dump the keywords in Ahrefs to expand the range of similar long tail queries |
| Offensive Actions | Prompt building and tracking by funnel stages to intercept gaps in ChatGPT retrieval | Clients, company stakeholders and C-Suite | Review the list of paths and select a keyword modifier that specifies search intent for custom prompts |
The second option is the most requested in SEO services. For the life of me, I wouldn't subscribe to it myself, but I have to get paid somehow, so I do!
Below is an example of how that tactic will render in practice based on the process detailed in this article.
At this point, you might consider submitting these prompts to tracking platforms such as Peec AI – again, this is not a sponsored link!
Server Logs are not without Limitations
In the previous article, we noted that proprietary data may not reflect the full picture, as the multiple preprocessing steps involved can introduce confirmation bias.
Similarly, this approach also comes with some caveats.
- The link between the generated response and the referenced source is not always factual; in some cases, the citation is produced by template logic rather than true source attribution.
- LLMs tend to provide sources or citations, even when they aren't fully accurate. Tracking citations shows how confident the model is, not necessarily what's correct.
- This method doesn't scale well unless you run it on your own server. Pulling more than 30 days of logs can use a lot of resources and may trigger limits from your CDN – this happened to me when I overqueried.
AI Search Tracking moves from Attribution to Understanding
Tracking AI visibility is not about chasing perfect attribution. It is about building directional clarity in an ecosystem where traditional analytics no longer apply.
By combining first-party search data with server-side signals, you can move from speculation to structured observation – identifying patterns that explain how AI systems surface your brand.
This framework will not tell you exactly which prompt, topic, query or keyword triggered an answer, but it will help you understand why your content is being selected.