More sites get on board the AI bot blocking train - a trend becoming more widespread, with larger sites going first.
Close to 20% of the world's top 1000 websites are now blocking AI training crawlers from their content, new research has shown.
Last week it was revealed that comparatively few of the most popular news sites were blocking crawlers such as GPTBot, the crawler from ChatGPT maker OpenAI. Yet the new data shows that momentum is building and more sites across categories are implementing blocks.
The data, from AI detection company Originality.ai, shows that:
General search engine crawling is typically seen as desirable by publishers: it lets their content be found through search and rewards those who take good care of their site maps. AI crawling, by contrast, is a new phenomenon for most publishers to consider.
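The research summarised here does not say how each site implements its block. A common mechanism, and the one assumed in this sketch, is a robots.txt rule that names the AI crawler's user agent while leaving other crawlers alone. The sample rules, the example.com URL and the `is_allowed` helper below are illustrative assumptions rather than anything taken from the Originality.ai data; the check itself uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse OpenAI's GPTBot, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With these sample rules the AI crawler is refused while a search crawler is still allowed, which is the distinction publishers are drawing. Note that robots.txt is a request that well-behaved crawlers honour, not an enforcement mechanism.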
The industry viewpoint seems to be hardening against it. There is not yet a clear understanding of how AI systems will use or credit crawled content in the answers they generate. Nor, more generally, is it clear how much those systems draw on free, publisher-created content and data to train and improve themselves and, it follows, to benefit from that content financially.
A new relationship is being formed between those who create content and those who wish to build on it to create content of their own. It would seem obvious that the former should have the most control over that content, yet that is precisely what is being fought over.