AI bot blocking builds momentum

More sites are getting on board the AI bot blocking train, with larger sites going first.

by Rob Corbidge

Published: 14:54, 04 September 2023

Close to 20% of the world's top 1000 websites are now blocking AI training crawlers from their content, new research has shown.

Last week it was revealed that comparatively few of the most popular news sites were blocking such crawlers, among them OpenAI's GPTBot, yet the new data shows that momentum is building and more sites across categories are implementing blocks.

The data, from AI detection company Originality.ai, shows that:

GPTBot blocking increased from 9.1% on August 22 to 12% just one week later, on August 29
The Common Crawl Bot (CCBot) is being blocked 6.77% of the time
CCBot is blocked about half as often as GPTBot
No website in the Top 1000 is blocking Anthropic AI
18.6% of the Top 1000 websites are blocking at least one AI crawler
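For readers wondering what "blocking" looks like in practice, the usual mechanism is a robots.txt rule aimed at a crawler's user-agent token. The sketch below is illustrative only: the research summary does not say how Originality.ai measured blocking, so the robots.txt approach, the sample file, the helper name `blocked_crawlers`, and the `anthropic-ai` token are all assumptions rather than details from the study.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site's file will differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# User-agent tokens to test. "anthropic-ai" is an assumed token,
# not one confirmed by the article.
AI_CRAWLERS = ("GPTBot", "CCBot", "anthropic-ai")


def blocked_crawlers(robots_txt: str, agents=AI_CRAWLERS, path: str = "/") -> list[str]:
    """Return the agents that the given robots.txt text disallows from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


print(blocked_crawlers(SAMPLE_ROBOTS_TXT))  # ['GPTBot', 'CCBot']
```

A robots.txt rule is a request rather than an enforcement mechanism, so a check like this only shows what a site has asked crawlers to do, not whether a given crawler complies.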

Publishers typically see general search engine crawling as desirable, since it lets their content be found and rewards those who take good care of their site maps. AI crawling, by contrast, is a new phenomenon that most are having to consider for the first time.

The industry viewpoint seems to be hardening against it. There is no true understanding yet of how AI systems will use or credit crawled content in the answers they generate, nor, more generally, of how much they leverage free publisher-created content and data to train and improve those systems and, it follows, to benefit financially from that content.

A new relationship is being formed between those who create content and those who wish to create content on the back of the original content. It would seem obvious that the former party would have the most control over the content, yet it seems that is precisely what is being fought over.

Rob Corbidge • Head of Content Intelligence

Rob Corbidge is Head of Content Intelligence at Glide Publishing Platform, applying the latest knowledge about advances and ideas in the publishing industry to our own product and helping clients get the most from their content.
