Tracking ChatGPT Backlinks and Traffic Referrals with Python

I’ve been quiet in the public space for some time. There’s been loads of client work to deliver, plus some genuine deep dives into the weeds of how LLMs do Information Retrieval (IR) and how their RAG systems work.

So I’ve done my homework and found a way to potentially scale ~~keyword~~ “prompt research” by turning raw JSON outputs from LLM conversations into actionable information such as:

Brand mentions (i.e., backlinks; should we reword this?) in the LLM space. This works for as many prompts as you want, provided you do some super-duper data preprocessing.
Traffic referrals: the channels that the URLs cited in the output of one or more prompts belong to, as reported by the LLM.
Other metadata, including title and favicon URL. You will get this info plain and simple from Perplexity and Claude.
Related queries: suggested follow-up questions after the main prompt has been grounded in a searchable query. In this space, you will find this treat limited to Perplexity.

⚠️ On related queries from ChatGPT: a whole crop of bookmarklets and Chrome extensions has sprung up to do just that, including ChatGPT Path from Ayima, which I’ll use in this tutorial to get the raw structured data out of the LLM.

You can try the app here:
👉🏻 ChatGPT Mentions & Traffic Referrals Analysis (Streamlit)

And now that you’re here, why don’t you browse my toolstation for more SEO related tools?


Getting Started: How to Prepare Your Data

Here’s how to prep your file before uploading it to the app.

First, run a search on ChatGPT using the Search Mode:

Then, open the ChatGPT Path Chrome extension:

Export the conversation:

Now clean the resulting CSV or Excel file:

Remove the top rows (up to the first “Source” line).
Delete columns like Index and Timestamp.
Rename any column named Type to Prompt: this is the actual question you asked ChatGPT.

Then save the cleaned file as .xlsx.
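If you would rather script the cleanup than do it by hand, the three steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: it treats the first row containing a cell equal to "Source" as the header row, and the sample column names (Source, URL) and values are made up for illustration, so check them against your own export.

```python
import csv
import io

DROP_COLUMNS = {"Index", "Timestamp"}  # columns the steps above say to delete
RENAME_COLUMNS = {"Type": "Prompt"}    # "Type" holds the question you asked


def clean_export(rows):
    """Apply the three cleaning steps to rows read with csv.reader.

    Assumption: the header is the first row with a cell equal to "Source".
    Returns (header, data_rows).
    """
    # 1. Skip everything above the first row that contains a "Source" cell.
    start = next(
        (i for i, row in enumerate(rows) if "Source" in (c.strip() for c in row)),
        None,
    )
    if start is None:
        raise ValueError('no row containing a "Source" cell was found')
    header, body = rows[start], rows[start + 1:]

    # 2. Keep only the column positions that are not in DROP_COLUMNS.
    keep = [i for i, name in enumerate(header) if name.strip() not in DROP_COLUMNS]

    # 3. Rename "Type" to "Prompt" in the header; pad short rows with "".
    new_header = [RENAME_COLUMNS.get(header[i].strip(), header[i].strip()) for i in keep]
    new_body = [[row[i] if i < len(row) else "" for i in keep] for row in body]
    return new_header, new_body


# Tiny in-memory stand-in for a real export (hypothetical values).
sample = io.StringIO(
    "Exported conversation\n"
    "\n"
    "Index,Timestamp,Type,Source,URL\n"
    "1,2025-01-01,best running shoes,Search,https://example.com/shoes\n"
)
header, body = clean_export(list(csv.reader(sample)))
print(header)  # ['Prompt', 'Source', 'URL']
print(body)
```

To get the .xlsx the app expects, one option is pandas with openpyxl installed: `pandas.DataFrame(body, columns=header).to_excel("cleaned.xlsx", index=False)`.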

How to Use the App

Once you have your XLSX file, upload it to the Streamlit app.

Leave the search box empty to get an overview of the number of domains mentioned and a breakdown of traffic referrals by source.

Next, type your brand name or domain in the search box. The app will filter the view to show you only the citations where your brand was mentioned:

Scroll down and you’ll get summary stats:

And if you keep scrolling, you’ll find a full breakdown of the raw data table:
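For readers who want to reproduce the overview and the brand filter outside the app, here is a minimal sketch. It assumes you already have a plain list of cited URLs, and it assumes a brand "match" means the search term appears in the cited domain; the app's own matching logic may differ, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def domain_counts(urls):
    """Count how often each domain is cited (the 'empty search box' view)."""
    return Counter(domain_of(u) for u in urls)


def filter_by_brand(urls, brand):
    """Keep URLs whose domain contains the search term.

    An empty term keeps everything, mirroring an empty search box.
    """
    needle = brand.strip().lower()
    return [u for u in urls if needle in domain_of(u)]


# Hypothetical citations for illustration.
urls = [
    "https://www.example.com/a",
    "https://example.com/b",
    "https://news.example.org/c",
]
print(domain_counts(urls).most_common())
print(filter_by_brand(urls, "example.org"))
```

Substring matching is deliberately loose, so a short brand name can match unrelated domains; tighten the rule if that becomes noisy.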

Caveats & Things to Note

Traffic source classification is rule-based. This means that if ChatGPT cites a domain not covered by the rules, it might not be classified correctly. I’m providing the GitHub repository with the original Python code if you’d like to tweak it to your advantage.
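To make the "rule-based" caveat concrete, here is a minimal sketch of what such a classifier can look like. The labels and domain lists are hypothetical placeholders, not the rules from the repository; the point is only that any domain missing from the rules falls through to a default bucket.

```python
from urllib.parse import urlparse

# Hypothetical rules: (label, domain suffixes). NOT the app's actual rule set.
RULES = [
    ("Social", ("reddit.com", "linkedin.com", "youtube.com")),
    ("Reference", ("wikipedia.org",)),
]


def classify_source(url, rules=RULES, default="Unclassified"):
    """Return the label of the first rule whose domain matches the URL's host."""
    host = urlparse(url).netloc.lower()
    for label, suffixes in rules:
        for suffix in suffixes:
            # Match the domain itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                return label
    return default


print(classify_source("https://www.reddit.com/r/seo"))       # Social
print(classify_source("https://en.wikipedia.org/wiki/SEO"))  # Reference
print(classify_source("https://example.com/"))               # Unclassified
```

Extending coverage is then a matter of adding a suffix to an existing rule or appending a new (label, suffixes) pair.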

Live updates are supported. You can change the domain in the search bar and see the data update in real time.

A heads up on attribution: analytics platforms like GA4 might not report ChatGPT traffic accurately. It looks like this is where we’re at in the industry at the moment, so you’d better make peace with the fact that this sort of chaos can’t be handled. Just not today.

ChatGPT and other LLM search engines may generate non-existent URLs. This is due to what people inaccurately refer to as “hallucinations”, as if these models were conscious or somehow human-like.

The point is that the probability distribution of these beasts is prone to getting sidetracked the longer the prompt is. A fair share of error is also ever-present in their RAG (retrieval-augmented generation) pipeline.

All this boring stuff is to invite you to make sure a cited URL actually exists before taking action.
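One way to run that check in bulk is a small helper like the sketch below, which uses only the Python standard library. The function name, the User-Agent string and the `opener` parameter (there so the network call can be swapped out in tests) are my own assumptions, not part of the app.

```python
import urllib.error
import urllib.request


def url_exists(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a non-error status, else False."""
    try:
        # HEAD asks for headers only, so no page body is downloaded.
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
        )
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (4xx, 5xx), timeouts and
        # connection failures; ValueError covers malformed URLs.
        return False
```

Calling `url_exists("https://example.com/")` performs a live request. Some servers reject HEAD requests or block scripted clients, so treat a False as "check by hand" rather than proof that the URL was made up.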

What About Perplexity and Claude?

You can also use this same method to track how your brand is being cited in other LLMs like Perplexity and Claude.

They operate a little differently from ChatGPT but share the same probabilistic and RAG architecture. The main difference is how they structure their JSON output — which is exactly what we tap into.

For example, in Claude all the structured information useful for SEO is nested at the second node level, after chat_messages and content.

In Perplexity, by contrast, the raw JSON is easy to read straight from the node names, with the actual SEO information sitting under entries > blocks > plan_block.
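Because the two exports nest their citations at different depths, a generic recursive walk is one way to handle both without hard-coding either path. The sketch below assumes each citation appears as a JSON object with a "url" field and, optionally, a "title" field; those key names and the "items" key in the sample are assumptions, so inspect your own export before relying on them.

```python
import json


def collect_citations(node, found=None):
    """Recursively collect {'url', 'title'} pairs from any nested JSON value."""
    if found is None:
        found = []
    if isinstance(node, dict):
        url = node.get("url")
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            found.append({"url": url, "title": node.get("title")})
        # Keep walking: citations can sit at any depth.
        for value in node.values():
            collect_citations(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_citations(item, found)
    return found


# Made-up miniature export, loosely shaped like the Claude path described above.
sample = json.loads("""
{"chat_messages": [{"content": [{"items": [
    {"url": "https://example.com/a", "title": "Example A"},
    {"url": "https://example.org/b", "title": "Example B"}
]}]}]}
""")
citations = collect_citations(sample)
print([c["url"] for c in citations])
```

For a real file, load it with `json.load(open(path))` and pass the result to `collect_citations`. The same URL may be collected more than once if the export repeats it, so deduplicate if you only want unique citations.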

Luckily, both platforms let you extract structured data using browser bookmarklets.

💡 And by the way, you can also fetch structured data in bulk using Python.

Here’s a bookmarklet to extract JSON data from Perplexity:

javascript:(async()=>{const s=(location.pathname.match(/\/search\/([^/?#]+)/)||[])[1];if(s){const t=Date.now();const q=`with_parent_info=1&with_schematized_response=1&from_first=1&version=2.18&source=default&limit=100&offset=0&supported_block_use_cases=answer_modes&supported_block_use_cases=media_items&supported_block_use_cases=knowledge_cards&supported_block_use_cases=inline_knowledge_cards&_t=${t}`;const r=await fetch(`/rest/thread/${s}?${q}`,{credentials:'include',cache:'no-cache'});if(r.ok){const d=await r.json(),u=URL.createObjectURL(new Blob([JSON.stringify(d,null,2)]));Object.assign(document.createElement('a'),{href:u,download:`perplexity-${s}.json`}).click();setTimeout(()=>URL.revokeObjectURL(u),2e3);}}})();

And here’s one for Claude:

javascript:(async()=>{try{const c=location.pathname.match(/\/chat\/([^/]+)/)?.[1];if(!c){alert('Open%20a%20Claude%20chat%20first');return;}const t=Date.now();const o=(await(await fetch(`/api/organizations?_t=${t}`,{credentials:'include',cache:'no-cache'})).json())[0].uuid;const j=await(await fetch(`/api/organizations/${o}/chat_conversations/${c}?tree=true&rendering_mode=messages&render_all_tools=true&_t=${t}`,{credentials:'include',cache:'no-cache'})).json();const u=URL.createObjectURL(new Blob([JSON.stringify(j,null,2)],{type:'application/json'}));Object.assign(document.createElement('a'),{href:u,download:`claude-${c}-rich.json`}).click();setTimeout(()=>URL.revokeObjectURL(u),2000);}catch(e){alert('Could%20not%20fetch%20rich%20conversation%20JSON');console.error(e);}})();

I promise I will add links crediting the original sources of the bookmarklets above. So far I haven’t been able to trace them back.

I’m so sorry; get in touch if you created them.

Once you’ve fired up the bookmarklets in either Perplexity or Claude, load the JSON export into one of the following:

And here are the code repositories on GitHub:

Hope this is helpful. If you have any questions or complaints, feel free to get in touch!

Simone De Palma

Technical SEO Specialist
Simone De Palma is an SEO Specialist at Omnicom and the founder of SEO Depths.

He graduated in Marketing and Management from Università IULM before completing a degree in Digital Marketing and Data Science at Leeds Beckett University.
Simone has worked as an SEO Specialist in digital agencies in Italy and the United Kingdom, and he’s a contributor to Search Engine Land.

When he’s away from his double screens, he enjoys cooling down with a refreshing swim at the pool. You could find him exploring art museums or enjoying the company of a classic romanc

