
Will you stop AI's content harvest?

OpenAI have now specified how to prevent ChatGPT's bot from crawling your site. Google shows no inclination to do the same.

by Rob Corbidge
Published: 12:37, 10 August 2023

Rob Corbidge is Head of Content Intelligence at Glide Publishing Platform, applying the latest knowledge about advances and ideas in the publishing industry to our own product and helping clients get the most from their content.

Harvesting by Stable Diffusion

A unified approach from publishers to the bots that crawl our content in order to feed the immense data appetite of generative AI is yet to be settled upon.

Some moves are afoot to establish "a position" the industry can get behind. Yet I can tell you that when I searched for news about such moves, Google's second result was an article ripped from the Australian site Crikey and poorly rewritten by AI.

Second result. With Crikey's own editorial mentions of its name still present in the text. Whoever https://fagenwasanni.com/ are, they've gamed the hell outta Google, for now.

And that's what we're up against. And publishers are still moving slowly.

So how to prevent such free content harvest?

This week, OpenAI once again demonstrated the technology "moat" it likely still has over its rivals, with ChatGPT being quite some distance ahead in the development cycle.

It did this by explaining how to prevent the ChatGPT crawler, GPTBot, from accessing the content of a site, either wholly or partially.
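OpenAI's guidance comes down to robots.txt rules addressed to the GPTBot user agent. As a rough sketch of what "wholly or partially" can look like, the Python below holds a sample robots.txt and checks it with the standard library's robots.txt parser. The paths (/press/, /news/) and the example.com domain are hypothetical, chosen only for illustration, and robots.txt remains a voluntary convention that a crawler has to choose to honour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may read /press/ and nothing else,
# while every other crawler keeps full access.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /press/
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/news/story"),
        ("GPTBot", "https://example.com/press/launch"),
        ("Googlebot", "https://example.com/news/story"),
    ]:
        print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

To block GPTBot wholly, drop the Allow line and keep only "Disallow: /" under its user agent. The Allow line is placed first because Python's parser applies the first matching rule; other parsers may resolve overlapping rules differently, so it is worth testing against the crawler you care about.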

We would read this as OpenAI deciding they have quite enough data from what they've already gathered, largely for free of course, and can now concentrate on finessing their product. There's likely a point at which an established generative AI doesn't benefit from continued mass data digestion. That also points, I think, to the likelihood that publishers who choose to engage with generative AIs will set focused content access rules rather than throw open the gates.

Indeed, OpenAI have actually been reasonably careful in not making enemies, or not too many enemies, over content. Last month they suspended a feature that let ChatGPT search the web via Bing, after it allowed users to get around paywalls.

Google shows no such inclination at present. It may well be that Bard requires more data still, or that Google has set all its switches to 11 in order to catch up with OpenAI and they just don't care.

OpenAI's move does bring up a decision that must be made: to allow GPTBot, to block it entirely, or to allow it partially? By allowing any access at all, is a publisher making itself hostage to legislative fortune? By which we mean some kind of content use law eventually being passed in whatever jurisdiction you reside in.

Not a great gamble that, as we know.

My limitations as one who thinks from the content production side of the equation have been evident in some responses I've read from marketeer types, who, in their pace and fury, can see a use for a brand allowing generative AI content harvesting bots access to at least some content.

"You want to guarantee these systems say the right things about your brand" was a sample response. 

Here we are going into blurry territory. Publishers have learned to regard platforms with, if not hostility, certainly suspicion. If the thinking is that a brand, be it publisher or other, has to at least try and control what generative AI produces about it, that's understandable. 

However, if we regard such systems as being similar to the platforms, and therefore ultimately only self-interested and liable to change the rules at any time, then placing your fate in their hands in any way is a risky business.

Generative AI's effect in the sphere of visual media was felt and addressed much earlier. Court cases are already underway, such as Getty's claim that Stability AI misused more than 12 million images.

Of course those in the field of visual media have been swamped by an everything-is-free tsunami before and so have been quick to act this time. Media veterans will recall a period when it seemed that images that had taken time, money, and skill to produce were being copied and used left, right, and centre. Many good photographers, designers, and artists were badly burned by it. 

It's early days with generative AI. But we've learned that the early days are the ones where the tech giants grab all the best territory.
