World's leading news sites yet to block AI crawlers

By: Rob Corbidge, 30 August 2023

Study of the world's 50 leading news sites shows that the vast majority may not have taken action on blocking AI crawlers

Most leading news sites have yet to take action on blocking AI crawlers, according to a new and ongoing study.

Of 50 major international sites, only 11 have actively blocked GPTBot at the robots.txt level. In total, 14 sites had a "block any bot" rule in place.

The picture is little different when drilling down to regions, with three of 21 leading UK sites enacting a block on the OpenAI crawler. Eleven of the top 50 US sites have introduced the block.

OpenAI detailed earlier this month how sites could block its crawler if they wished, while adding: "Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety."
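
For readers who want to check a site themselves, the following is a minimal sketch of how a robots.txt block can be tested using Python's standard library. It is an illustration only, not the method used in the study. The helper name `is_blocked` and the three sample robots.txt texts are assumptions made for this example; the `GPTBot` user-agent token is the one OpenAI documents for its crawler.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `path` for `user_agent`.

    Hypothetical helper for illustration; it only inspects robots.txt rules
    and knows nothing about blocks applied by other means.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# Sample robots.txt contents (made up for this example).
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

BLOCK_ANY_BOT = """\
User-agent: *
Disallow: /
"""

NO_BLOCK = """\
User-agent: *
Disallow:
"""

if __name__ == "__main__":
    samples = [
        ("GPTBot only", GPTBOT_ONLY),
        ("block any bot", BLOCK_ANY_BOT),
        ("no block", NO_BLOCK),
    ]
    for label, text in samples:
        print(f"{label}: GPTBot blocked = {is_blocked(text, 'GPTBot')}")
```

A targeted rule stops only the named crawler, whereas a "block any bot" rule also catches GPTBot, which may be one reason the two counts in the study differ. To test a live site, the text of its robots.txt file would be fetched first and passed to the same helper.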

Gary Kirwan of Kirwan Digital, who undertook the study, has placed all the data in a publicly available spreadsheet. He notes that, as the study is ongoing, he has seen more sites introduce a block in recent days.

He has also tracked blocks placed on CCBot and Anthropic AI's crawler.

"Publishers are now having to deal with these new types of bots scraping their data for large language models that a publisher gains no direct benefit from that I'm aware of," Kirwan said, speaking to GPP.

Kirwan stresses that it is possible to block such AI crawlers by other methods, and he has no data for those. However, given the technical resources of those that have introduced a block, it is safe to assume that they know what they are doing.