What Server Logs Tell You About LLMS.TXT

Over the last few months, llms.txt has been positioned as the next must-have file for AI visibility. While Googlers and SEOs broadly agree there is little harm in testing llms.txt, the open question is whether it actually affects crawler behaviour.

Based on first-hand server log analysis across a very small site (~100 pages) and a large travel site (100K+ URLs), the evidence so far suggests no measurable impact.

AI bots continue to prioritise robots.txt, HTML pages, and XML sitemaps, while requests for llms.txt remain sporadic and often driven by headless browsers rather than model-training agents.

This post breaks down what the logs show, why linking llms.txt in the <head> may offer limited upside, and where the file could introduce unwanted SEO and security risks at scale.

LLMS.TXT Evidence from Small Websites

At present, there is no evidence that llms.txt has any measurable impact on small websites.

On seodepths.com (just about 100 pages indexed by Google), server logs show that before and after adding llms.txt, the most frequently requested assets remain:

  • robots.txt
  • HTML pages
  • XML sitemaps

I’ve been monitoring logs from my WP server via the PHP admin for the last six months, and I can confidently report that they have continued to show predictable patterns.

Even in the last 30 days:

  • Googlebot remains dominant
  • ChatGPT-User requests content sparingly
  • ClaudeBot, Perplexity, and Bingbot appear only occasionally

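The bot mix above can be tallied straight from raw access logs with a short script. This is a minimal sketch: the bot list and the sample log lines are illustrative, not taken from the logs discussed in this post.

```python
from collections import Counter

# Illustrative list of crawler tokens to look for in the user-agent field.
BOTS = ["Googlebot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "bingbot"]

def tally_bots(lines):
    """Count requests per bot across access-log lines (case-insensitive)."""
    counts = Counter()
    for line in lines:
        for bot in BOTS:
            if bot.lower() in line.lower():
                counts[bot] += 1
    return counts

# Hypothetical combined-log lines, for demonstration only.
sample = [
    '1.2.3.4 - - [01/Dec/2025] "GET /robots.txt HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [01/Dec/2025] "GET /llms.txt HTTP/1.1" 200 "-" "ChatGPT-User/1.0"',
]
print(tally_bots(sample))
```

Pointed at a real access log (one line per request), the same function gives you a quick before/after comparison around the date you deployed llms.txt.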
Adding a <link> to the llms.txt in the <head>

A few in the industry have been advocating for this approach, claiming it leads AI bots to request the file more frequently.

And yet, despite adding a <link> reference to llms.txt in the <head> three months ago, the request patterns have not changed. In December, requests actually zeroed out.
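For reference, the pattern advocates suggest looks roughly like this. There is no standardised rel value for llms.txt, so the attribute values below are assumptions, not an established convention:

```html
<!-- Hypothetical: advertising llms.txt from the <head>.
     "llms-txt" is not a registered rel value; treat this as a sketch. -->
<link rel="llms-txt" type="text/markdown" href="https://www.example.com/llms.txt">
```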

A single request for the llms.txt file – December 2025

I’m a bit on the fence with this approach.

It looks like a recipe for first-mover slop, with limited upside and the risk of diluting brand positioning or exposing additional signals to competitors.

If logic serves me well, adding a <link> in the <head> could expose the llms.txt file until it becomes indexed. This means the link might surface in the SERPs and be linked to from other websites, which is exactly what happened to TUI with an llms.txt link aggregator.

In layman’s terms, users might stumble upon a meaningless wall of text, and you make it even easier for the new arsenal of AI scrapers to probe your content.

The Role of the Robots.txt

On large websites, I noticed the robots.txt file is far more relevant in orchestrating AI model training and discovery.

New user agents from OpenAI and Common Crawl (CCBot) are designed to gather data for training machine learning models, including large language models.

Log analysis from a travel site with 100K+ URLs confirmed that AI-oriented user agents consistently used robots.txt as their primary entry point.
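For comparison, this is the kind of robots.txt policy those agents read first. GPTBot and CCBot are real user agents; the paths and sitemap URL below are placeholders, and the policy itself is only an example:

```
# Illustrative robots.txt rules for AI crawlers (paths are placeholders)
User-agent: GPTBot
Disallow: /account/

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```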

Previous and last 30 days of log files – travel company – more than 100K pages

Does LLMS.TXT Impact AI Visibility?

Based on log data for my samples, the answer is currently no.

On a large travel site (100K+ pages indexed), llms.txt was requested only 10 times in 30 days.

The requesting agents were mostly headless Chrome instances, commonly associated with scraping rather than model training.

Server logs limited by a 30-day data retention window – 100K+ URLs, travel site

After all, user-agent strings are easy to fake. Scrapers, competitors, and security scanners often impersonate Googlebot to bypass rate limits or ignore robots.txt restrictions.

Looking at their client IPs over the last 30 days, I found most were indeed impersonators running headless Chrome instances.

To verify a crawler’s identity, you can use the CLI or a reverse DNS lookup tool.
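As a sketch, the usual two-step verification (reverse DNS on the client IP, then a forward lookup to confirm the hostname maps back to the same IP) looks like this in Python. The Google hostname suffixes are the documented ones; the example hostnames are illustrative:

```python
import socket

# Documented Googlebot reverse-DNS suffixes.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def hostname_is_google(hostname: str) -> bool:
    """Check whether a reverse-DNS hostname belongs to Google."""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, check the suffix, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)   # reverse lookup
        if not hostname_is_google(hostname):
            return False
        forward = socket.gethostbyname(hostname)    # forward-confirm
        return forward == ip
    except OSError:
        return False

# Illustrative hostnames (no network needed for the suffix check):
print(hostname_is_google("crawl-66-249-66-1.googlebot.com"))  # True
print(hostname_is_google("scraper.example.com"))              # False
```

The forward-confirm step matters: an attacker can control the reverse DNS of their own IP range, but cannot make Google’s forward DNS point back at it.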

Here is an example of a headless Chromium user-agent:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.6367.207 Safari/537.36
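One quick heuristic when triaging log lines: genuine headless Chrome builds usually announce a HeadlessChrome token in the user agent, while impersonators often copy a normal Chrome string like the one above. A minimal sketch, with an illustrative token list:

```python
# Tokens that commonly betray automated browsers (illustrative list).
HEADLESS_TOKENS = ("HeadlessChrome", "PhantomJS", "Electron")

def classify_user_agent(ua: str) -> str:
    """Rough triage of a user-agent string from server logs."""
    if any(tok in ua for tok in HEADLESS_TOKENS):
        return "headless"
    if "Mozilla/5.0" in ua and "Chrome/" in ua:
        return "browser-like"  # could still be an impersonator; verify the IP
    return "other"

ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/124.0.6367.207 Safari/537.36")
print(classify_user_agent(ua))  # browser-like
```

A "browser-like" result is inconclusive on its own, which is why the reverse DNS check remains the decisive step.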

LLMS.TXT & Cybersecurity

How you configure the implementation ultimately defines your SEO outcome with the file, as well as your cyber protection.

I’ve seen examples where the /llms.txt URL path was dynamically looped through various category pages.

You might argue that this is not a big deal; these pages would default to HTTP 404 and would rarely be requested by actual web crawlers over time (in the screenshot, you can still glimpse the AI scrapers’ obsession with this file).

100K+ URLs – travel site

In fact, exposing or dynamically routing /llms.txt through multiple page templates may create attractive entry points for malicious AI agents or bots.

At scale, this behaviour can lead to downstream security incidents and increase the risk of resource exhaustion on your server.
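One way to contain this, assuming an nginx front end, is to serve only the root copy and hard-404 the path everywhere else, so category templates cannot spawn crawlable duplicates. A sketch, not a drop-in config:

```nginx
# Serve the single root /llms.txt if it exists on disk.
location = /llms.txt {
    try_files /llms.txt =404;
}

# Refuse llms.txt under any other path prefix
# (nginx checks exact-match locations before regex ones).
location ~ /llms\.txt$ {
    return 404;
}
```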

Test with the /llms.txt file, but do it carefully

You can experiment with llms.txt, but it should be configured with cybersecurity in mind.

I agree with the view from SEOs and Googlers: rather than over-discussing it with stakeholders, opting for a controlled deployment is usually enough.

The best way to implement your llms.txt file is:

  1. Generate the file using a dedicated tool such as Wordlift
  2. Manually place it in the root of the website
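For illustration, a minimal llms.txt following the llmstxt.org proposal (an H1 title, a blockquote summary, then H2 sections of curated links). All names and URLs below are placeholders:

```markdown
# Example Travel Site

> A short summary of what the site offers, written for LLM consumption.

## Guides

- [Destination guides](https://www.example.com/guides/): curated travel guides

## Policies

- [Booking terms](https://www.example.com/terms/): conditions and refunds
```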

I would generally avoid plug-ins unless the setup is extremely simple (e.g. basic WordPress).

For more complex back-ends, a manual and deliberate implementation is the safer and more appropriate option.
