
How to block GenAI crawlers such as Google's Bard or OpenAI's ChatGPT from your website

Keep unwanted bots at bay by using the robots.txt file to tell AI crawlers you don't want them to scrape your content.

by Rob Corbidge
Published: 18:44, 04 October 2023

Last updated: 11:29, 11 October 2023
Overview

With the emergence of sophisticated AI technologies such as OpenAI's ChatGPT and Google's Bard, and the web crawlers that feed them, the internet is awash with automated agents hoovering up your content.

While these bots can be innovative and beneficial, GenAI's "take and ask for permission later" approach should be nipped in the bud. This article provides a guide on how to block ChatGPT and other AI crawlers from accessing a website.

Understanding the Bots

Before moving towards blocking strategies, it is important to understand how AI bots and crawlers work and how to identify them. 

Bots like ChatGPT may interact with web content through APIs or web scraping, while other generic web crawlers scan websites to index them for search engines or data retrieval purposes. Identifying them typically involves analysing user agent strings, IP addresses, or behavioural patterns.
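
As a minimal sketch of the user agent approach, the Python snippet below checks a request's User-Agent header against the crawler tokens listed later in this article. The function name `is_ai_crawler` and the sample header strings are illustrative assumptions, not captured traffic. User agents can be spoofed, so treat a match as a first filter rather than proof of identity.

```python
# Crawler tokens taken from the robots.txt list in this article.
AI_CRAWLER_TOKENS = ("anthropic-ai", "CCBot", "ChatGPT-User", "cohere-ai", "GPTBot")


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


if __name__ == "__main__":
    # Sample headers are illustrative assumptions, not real log entries.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "CCBot/2.0 (https://commoncrawl.org/faq/)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/118.0",
    ]
    for header in samples:
        print(is_ai_crawler(header), header)
```

A check like this could feed a log report or a server-side block, which matters for bots that ignore robots.txt.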

Here we look mostly at the robots.txt file, which signals to crawlers that they should skip your website. The file is a request rather than an enforcement mechanism, but there's no playing dumb if they ignore it!

How to block GenAI crawlers using Robots.txt

A robots.txt file, served from the root of your site, tells bots how they should interact with it. To block all bots or specific ones from your entire site or from specific sections, add rules to this file.

Google, Bard, Google-Extended, Google-not-care

UPDATE: It looks like the blocker for Google's AI, Google-Extended, does NOT stop the new Google Search Generative Experience from scanning your content. The only way to fully block it is to go nuclear and fully block Google crawlers from your site - which means no search juice. Decisions decisions... or something the competition authorities should look at.


The Google-Extended instruction is below for completeness. 

User-agent: Google-Extended
Disallow: /
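
To see what this rule does and does not cover, here is a small sketch using Python's standard-library `urllib.robotparser`. The helper name `allowed` and the example.com URL are placeholders. It shows how Python's parser reads the rule, and real crawlers apply their own matching logic. Under this parser, the rule applies only to the Google-Extended token and leaves Googlebot free to crawl.

```python
from urllib.robotparser import RobotFileParser

# The Google-Extended rule from this article.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        verdict = "allowed" if allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(agent, verdict)
```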

Robots.txt

A current selection of bots to block is as follows (please let us know of any additions via the Contact Us page!)

--------------------------------------------

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: GPTBot
Disallow: /

--------------------------------------------
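
Because this list is likely to grow, it may be easier to keep the bot names in one place and generate the rules from them. The sketch below is one way to do that in Python. The function name `build_block_rules` is an assumption for illustration, and the output still needs to be merged into your site's existing robots.txt by hand or by your deployment process.

```python
from typing import Iterable

# Bot tokens from the list above; extend as new crawlers appear.
AI_CRAWLERS = ["anthropic-ai", "CCBot", "ChatGPT-User", "cohere-ai", "GPTBot"]


def build_block_rules(agents: Iterable[str]) -> str:
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    # A blank line between groups keeps each User-agent rule separate.
    return "\n".join(groups)


if __name__ == "__main__":
    print(build_block_rules(AI_CRAWLERS))
```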