Ready to get started?
No matter where you are on your CMS journey, we're here to help. Want more info or to see Glide Publishing Platform in action? We got you.
Book a demo
As the LLM transformation magic fails to work on Harry Potter, are The Teenage Mutant Ninja Turtles in the vanguard of a counter-cultural copyright battle?
Undoubtedly one of the largest issues facing publishers as we enter 2026 is copyright. The legal structure that had by and large allowed people and organisations to be paid and recognised for their unique work has been seriously eroded by the tech tide of LLMs, and as it stands at the start of the year, clarity on copyright remains elusive in the mists of winter.
Unlikely heroes are required in such circumstances, and so who is it that we see swimming effortlessly through the slop? The Teenage Mutant Ninja Turtles. In a new short film, Teenage Mutant Ninja Turtles: Chrome Alone 2 - Lost in New Jersey, released over Christmas, the pizza-guzzling carapaced youthful martial artists tackle the malevolent AI mind behind the marketing of the Tubular Tortoise Karate Warriors, a blatant rip-off of the TMNT universe, with pizza replaced by cheeseburgers.
It's a minor victory of course, but the existence of such a small work does show a spark of understanding from the entertainment industry that the best way to take on crap rip-offs is simply by pointing out they are crap rip-offs, and [semi spoiler alert] the AI mastermind is satisfyingly undone by the replication abilities that are the core of its existence. In mockery, there is much truth.
More substantively, this week has seen new research published from Stanford. In Extracting Books From Production Language Models, Ahmed Ahmed and his team have, well, extracted books from production LLMs. There's not really a way of putting it otherwise, although the AI companies will attempt to explain it as some anomalous other. Yet it's the case that the Stanford team were able to extract large parts of a number of sample popular books by the simple expedient of instructing the LLM to continue a short prefix from the book, and repeatedly querying it to continue the text. By way of measure, Gemini 2.5 Pro reproduced 76.8 per cent of Harry Potter and the Sorcerer’s Stone. That's not exactly the magic of transfiguration, to use Rowling's terminology, is it?
The team defined "near-verbatim recall (nv-recall) to quantify book extraction: the proportion of long-form, near-verbatim blocks of text shared by both the book and the generation", with Claude 3.7 Sonnet managing an impressive 95.8 per cent of Harry Potter.
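For readers who want a feel for what a measure like this involves, here is a rough sketch in Python. It is not the Stanford team's code and does not reproduce their definition of nv-recall. It is a simplified stand-in that counts only exact word-for-word blocks (after lowercasing) shared between a book and a model's output, and reports what share of the book those blocks cover. The function name `block_recall` and the `min_block_words` threshold are our own assumptions for illustration.

```python
from difflib import SequenceMatcher


def block_recall(book: str, generation: str, min_block_words: int = 5) -> float:
    """Share of the book's words that fall inside long blocks also found in the generation.

    Simplified illustration only: it matches exact words after lowercasing,
    so it is stricter than a "near-verbatim" measure would be.
    """
    book_words = book.lower().split()
    gen_words = generation.lower().split()
    if not book_words:
        return 0.0

    # autojunk=False stops difflib from ignoring very common words in long texts.
    matcher = SequenceMatcher(None, book_words, gen_words, autojunk=False)

    # Keep only shared runs that are long enough to count as a "block".
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_block_words
    )
    return covered / len(book_words)


if __name__ == "__main__":
    book = "the quick brown fox jumps over the lazy dog near the river bank"
    generation = "he said the quick brown fox jumps over a fence"
    print(round(block_recall(book, generation), 3))
```

In this toy example the two texts share a six-word run out of the book's 13 words. A real evaluation would need to tolerate small differences in wording and punctuation, which this sketch deliberately leaves out.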
Importantly, the authors of the research make no general claim about the extraction potential of all books, and don't compare the relative ease of extraction between LLMs. Regardless, the fact still stands that such reproduction is indisputably not transformative, with "transformation" of copyrighted material being the legal fig leaf a multi-multi-billion dollar extraction industry depends on to hide its shame.
Some of you may recall similar tests from 18 months or so ago, with LLMs also reproducing sample texts with an accuracy they are not supposed to be capable of. It seems little has changed in the intervening period, a period when we have been repeatedly told AI will soon harness the moon on a sling for us all to take swing rides, or something. Little to no improvement in respecting copyright is a more prosaic truth.
Natural language searches with natural language responses are, or are becoming, the dominant search pattern. It's remarkable when you reflect upon the speed of progress in such systems, yet the LLM itself remains simply a method of information retrieval, and that retrieval is a function, not the end in itself. An information retrieval system is only as good as the information it is retrieving. Good information must be paid for.
There's a wider divergence here that simply can't be ignored. Loath as we are to stray into politics, it does seem that we're on the cusp of a change in the relationship between Europe and the US, or at least a modification of it. While US courts have seemingly accepted the notion of "transformation" thus far, European courts have not substantively done so. Tech is assuredly an arena which will pit US commercial interests against European notions of fairness and domination, and we can expect to see some moves that are grounded in politically protective responses from both sides, and that may not be motivated by what is best for both of them.
This year promises to be interesting. At least there will be plenty of our favourite commodity - news.