AI bot blocking builds momentum

By: Rob Corbidge, 04 September 2023


More sites get on board the AI bot blocking train - a trend becoming more widespread, with larger sites going first.

Close to 20% of the world's top 1000 websites are now blocking AI training crawlers from their content, new research has shown.

Last week it was revealed that comparatively few of the most popular news sites were blocking such crawlers, including OpenAI's GPTBot. The new data shows that momentum is building, with more sites across categories implementing blocks.

The data, from AI detection company Originality.ai, shows that:

  • GPTBot blocking increased from 9.1% on August 22 to 12% just one week later on August 29
  • The Common Crawl Bot (CCBot) is being blocked 6.77% of the time
  • CCBot is blocked about half as often as GPTBot
  • No website in the Top 1000 is blocking Anthropic AI
  • 18.6% of the Top 1000 websites are blocking at least one AI crawler
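
The article does not describe how these figures were gathered. As a rough illustration only, and assuming a block is declared in a site's robots.txt file, a share like the 18.6% above could be computed along the following lines. The sample sites and their robots.txt contents are hypothetical, and the "anthropic-ai" user-agent token is an assumed spelling.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the crawlers named in the article.
# "anthropic-ai" is an assumed spelling, not confirmed by the article.
AI_CRAWLERS = ["GPTBot", "CCBot", "anthropic-ai"]


def blocked_crawlers(robots_txt: str, agents=AI_CRAWLERS) -> list:
    """Return the agents that this robots.txt bars from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, "/")]


# Hypothetical robots.txt bodies, not taken from any real site.
SAMPLE_SITES = {
    "news.example": "User-agent: GPTBot\nDisallow: /\n",
    "shop.example": (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: CCBot\nDisallow: /\n"
    ),
    "blog.example": "User-agent: *\nAllow: /\n",
    "wiki.example": "",  # an empty file blocks nothing
}

results = {site: blocked_crawlers(text) for site, text in SAMPLE_SITES.items()}

# Share of sites that block at least one of the listed crawlers.
share_blocking_any = sum(1 for hits in results.values() if hits) / len(results)

print(results)
print(f"{share_blocking_any:.1%} of sample sites block at least one AI crawler")
```

A real survey would fetch each site's robots.txt over the network and would need to handle missing files, redirects and timeouts, all of which this sketch leaves out.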

While publishers typically see general search engine crawling as desirable, since it lets their content be found and rewards those who take good care of their site maps, AI crawling is a new phenomenon that most are having to consider for the first time.

The industry viewpoint seems to be hardening against it. There is no true understanding yet of how AI systems will use or credit crawled content in the answers they generate. More generally, it is unclear how much those systems rely on free publisher-created content and data for training and to boost their effectiveness and, it follows, how much their operators benefit financially from that content.

A new relationship is being formed between those who create content and those who wish to create content on the back of the original content. It would seem obvious that the former party would have the most control over the content, yet it seems that is precisely what is being fought over.