
Could OpenAI be forced to wipe its training data?

The gloves could come off as the NYT prepares to get militant over its content being used by what it clearly sees as a competitor

by Rob Corbidge
Published: 11:45, 17 August 2023

Rob Corbidge is Head of Content Intelligence at Glide Publishing Platform, applying the latest knowledge about advances and ideas in the publishing industry to our own product and helping clients get the most from their content.

A robot made of newspaper by Stable Diffusion

Raising the prospect of OpenAI having to scrub its training data clean of content it has taken without permission, lawyers for the New York Times are exploring the possibility of taking the ChatGPT company to court to protect the NYT's intellectual property rights, NPR has reported.

The key thing that has occurred is that the NYT obviously considers OpenAI a competitor. They are not being sucked in by the "if you don't participate you won't benefit" talk. 

If you can recall back in the full-on blizzard-of-change days, when the future of publishing was what happened next week, then you'll remember the NYT adopting a maxim of "we will survive this if no one else does" at the boardroom level. It's served them well.

This NPR leak is likely a negotiating tactic of course. No one really wants to go to litigation, but if it's necessary then the NYT are showing OpenAI they have options in their armoury, and likely more friends in Washington too.

As the well-sourced NPR report says: "For weeks, The Times and the maker of ChatGPT have been locked in tense negotiations over reaching a licensing deal in which OpenAI would pay The Times for incorporating its stories in the tech company's AI tools, but the discussions have become so contentious that the paper is now considering legal action."

Will this be the seismic battle between those that create content, and those who are looking to create content for free on the back of someone else's content, and then sell it back to them? Can the Generative AI juggernaut be made to live inside boundaries that are beneficial to the content creators and owners that the technology requires to thrive?

Paraphrasing its source, NPR reported that chief among the concerns senior NYT figures have is that ChatGPT is becoming a direct competitor with them by generating text that "answers questions based on the original reporting and writing of the paper's staff".

The prospect of OpenAI having to delete its dataset is also raised, as if "OpenAI is found to have violated any copyrights in this process, federal law allows for the infringing articles to be destroyed at the end of the case".

What happens in the US is under US law of course, but any action by the NYT should have wider consequences, at the very least boosting the morale of any publisher and demonstrating that the correct way to think about the Generative AI businesses is that they are rivals.

It's worth noting that the current proposed EU legislation around Generative AI proposes "publishing summaries of copyrighted data used for training".

The legal threat from the NYT comes as this week Google has detailed more SGE features, essentially next-generation AI-assisted search. Key among those is an article summary feature that allows users, once on a page, to press a button to get a summary of the content. It's designed to work "only on articles that are freely available to the public on the web".

It's their summary. Not your summary. A publisher's summary is designed to complement the content; Google's looks like replacing it.

Is there some dwell time diminution competition I'm not aware of? I can't help but feel Google just want to be seen doing something. The example Google used was a key points summary of an article about the famed American highway, Route 66. SGE delivered a wonderfully short set of bullet points.

Yet Route 66 is now largely a tourist route, a route to be taken to enjoy the journey and scenery, and dwell on the automobile's powerful place in US history. Why would you want such a summary unless you're a professional bore, or are hoping to win the prize ham at next week's pub quiz?

By your theoretical use cases so shall you know your product?
