The giants of AI have all our attention when it comes to dealing with content theft, but it's the micro factories of plagiarism and theft that are arguably doing more damage today.
While the global focus of the publishing industry and some legislators is on the macro end of The Great Training Robbery, and on how to deal with the LLM juggernaut ploughing through entire cultural outputs of creativity while ignoring the concept that you pay for what you use, publishers are also under attack from AI at the micro end of the scale, which might be harder to legislate for.
For each supersized Big AI using copyrighted data to "train" its megasized models, there are possibly thousands or even tens of thousands of individual actors using AI tools to steal content with as much malign intent, if not more.
And what is occurring is now so technically straightforward that it's almost painful to consider as a publisher. Many publishers have seen whole sites pillaged, not for ingestion into the hard-to-track ether of an LLM training set, but to be reproduced like-for-like by sites and channels harvesting ad revenue from stolen work.
It has become routine for content on any platform you can think of - sites, apps, forums, social feeds, you name it - to be re-released elsewhere under someone else's banner using AI tools to repackage it almost instantaneously and - to add insult to burglary - perhaps reshape the message so it echoes an entirely different viewpoint and intent.
Press Gazette helped reveal one of many such examples this week, with a distressing tale that reminds you of the personal impact of such industrialised theft, where a highly personal article was taken in whole from behind a paywall and reproduced on YouTube using an AI voice and using images lifted from the original piece. It's a terrible tale in itself and it's worth reading the full outrage of it in Press Gazette.
It reminds us that the big tech firms most at the centre of stealing content for AIs have a long track record of doing the bare minimum to respect copyright norms. So why should we be surprised at how they themselves act?
To quote from the affected writer, freelance journalist Rob McGibbon: "Google [as owners of YouTube] are handling stolen goods in plain sight and governments must find the way to hold them responsible. In the real world selling on stolen stuff is as serious as the criminal who stole it in the first place. Why should it be different in the virtual world?"
YouTube's smoking guns
Trash channels already proliferate on YouTube, as they do on other platforms, and many of them use AI to simply rip off the work of others. One particular area of interest of mine, advanced military technology, is wide open to such awfulness because it features things that go bang, something always of interest to a wide audience.
For those of us seeking to understand how and why modern military technology is applied, there are some excellent content producers who inform the interested layman and expert journalist alike. Yet secreted among them, like a gang of IQ-siphoning content bandits, are numerous AI channels that repurpose both stock and stolen footage, add moronic or sensationalised AI voice-overs to it, and crank the advert frequency up to 11.
YouTube's algo has very little insight into quality, and so it eagerly serves up such intellectually damaging content without pause for thought because it matches the most basic "things that go bang" requirement.
It's the same story across virtually any area of interest. If you have any sort of viewing pattern, you'll have seen huge numbers of them already - the only hard part is working out who actually made them first before the copycats slunk into the suggestion list to gobble up the ad revenue and attention the original channel would have had.
Mercifully, such channels are fairly easy to spot by their names. By that, my fellow humans, I mean there's something "off" about them, a strange misleading ambiguity that's easier to spot than describe. That's the case in the egregious tale above, which appeared on a channel called "The World News". Such a channel name could be an indication of staggering publishing ambition. For the YouTube-initiated, however, it's more likely an indication of a different kind of ambition.
There is usually an attempt by such channels to at least partly conceal their dark deeds, for example using several sources for the content, but the example above didn't even bother to do that: it just lifted the original from behind the Daily Mail's paywall and ran with it.
Much as publishers are manning the copyright bulwarks against Big AI and its giant gaping data maw, we face more immediate threats from innumerable such small actors who require only an idea of what works on YouTube and where to source it - paywalls be damned.
Entering and breaking
The ability to get around paywalls is of course worthy of discussion.
It is possible some AI-enabled tool was used to do so, as intellectual property and tech lawyer Annalisa Checchi told Press Gazette. "AI systems don’t necessarily 'hack' paywalls in a traditional sense, but instead exploit common weaknesses," she said. "These include creating multiple fake accounts to use limited free access, using browser automation tools to mimic human behaviour, or abusing APIs designed for legitimate user access. In some cases, AI-powered scrapers can chain together snippets from various sources – including cached pages, RSS feeds and previews – to recreate full articles."
You haven't needed AI to do this sort of stuff before, but the tech now makes it much easier to do at enormous scale, performing the kind of crappy, time-consuming job the pirates of old would have had to do themselves.
There's a general theme here: while the bandstand AI performance is all about fancy stuff, the real difference it is making is in the boring, detailed tasks with very specific parameters. Ironically, that is much the same benefit it brings to professional publishers in workflows and tasks of tedium. It's also possible the process is still manual, and only the content reproduction part of it uses AI. Whatever the case, it's still a huge breach that no doubt others are pouring through right now.
Given the sheer volume of content continuously published to YouTube, it's no surprise that its response to copyright claims employs what we might call the "platform gambit". That is, you say you're "just a platform" and neutral in it all until a specific claim is lodged with you via a copyright takedown notice.
To quote their response to this particular incident: "It's not up to YouTube to decide who 'owns the rights' to content, which is why as an intermediary, YouTube gives copyright holders tools to make copyright claims and uploaders tools to dispute claims that are made incorrectly."
This contrasts with the situation only a few years back, when YouTube copyright strikes - the removal of content after a complaint - were rife, with almost an a priori assumption that a claim was valid.
And, on the other side of the coin, these strikes were also frequently unjust, or in fact aided the thieves, who used them to steal revenue from the legitimate creators.
It is all a complete mess, and you can understand the views which say YouTube is damned if they do, and damned if they don't.
While the video platform is a beneficiary - and probably quite a willing one - I am realistic in knowing this problem is neither as easy to solve, nor as easy for legislators to feel they can get a grip on, as Google's burglary of content for its own AI, which is pretty cut and dried.
The onus then is on publishers to protect themselves and their content. There is no help but ourselves. We are likely in an eternal game of cat and mouse, gamekeeper versus poacher, with such attempts to make a dishonest profit from our work.
It's not easy though, under attack as we are from both the AI behemoths above us and the small content farming outfits below us, based who knows where and under who knows what legal enforcement.
With the very notion of copyright on the chopping block, these can feel like confusing times. To my mind, if some argue that the current notion of copyright is dead and it's open season for LLM training on almost everything online, then by that measure "The World News" have done nothing wrong either.
We either defend all our content or we defend none.
I'm with the "all".