Content Aware media news: February 22, 2024

If we all feed the machine, how does the machine know what's right, wrong, or scurrilous?

Published: 10:21, 29 February 2024

So, social content site Reddit is shopping its wares to AI, signing a $60m/year deal with a “mystery AI firm” for its data to be used to train LLMs.

Perhaps a relevant hint to the mystery: OpenAI CEO Sam Altman was a major early investor in Reddit.

But no! Despite the apparent clash of interests… the deal is with Google!

Which, if you know how much Google leans on Reddit for its search results, actually makes sense.

Some users are unsettled, as it's not clear what data or content is actually going to be handed over. After all, some users post extraordinary amounts of very personal information about themselves in what are (perceived to be) member-only parts of the community, including pictures and videos, which would now seemingly be up for grabs.

What responsibility pure AI competitor OpenAI believes it has for such data is outlined on its site, but it’s a bit mealy-mouthed: it says it won’t target you with ads, but, err… it would still have all that stuff about you, ads or not.

We are sure Google has no advertising intentions in mind and organisationally has no way of connecting usernames that match to emails which might be Gmails to get insight into what you post about or follow. (/S).

OpenAI’s statement says: “We use training information only to help our models learn about language and how to understand and respond to it. We do not and will not use any personal information in training information to build profiles about people, to contact them, to advertise to them, to try to sell them anything, or to sell the information itself.”

Relevant to publishers: lots of articles are pasted verbatim into Reddit posts rather than linked. So what happens to that data? Lawyers will be working overtime for certain.

Would such a pasted Reddit post be a loophole for an AI to consume any third-party data, if it does a deal with Reddit directly? The more popular an article is, or the more useful the information it reveals, the more likely someone will - without malice - paste said content into a thread. It’s being helpful, in fact.

We have already seen that Reddit can outperform Google search results, and in fact - the case of the cake eating itself - posting your own content into Reddit can see your own content outperform your own original article in search results.

And elsewhere, users have already managed to show the danger of AI-generated content based on Reddit threads, by faking thread content to trick an AI-written site into generating fake news. Because it was funny.

But as we all know, not everyone on the internet is funny. Or doing it just for a gag. Basically, who tells the AI what is right or wrong, or truthful or dishonest? If the machine is having everything tipped into its maw, it’s hard to discern good from bad.

If only they had journalists and professionals to help them decide. And this is where the Reddit deal is perhaps more than anything a guide to publishers to the value of their content: professionally created and curated, adhering (by and large) to the laws of the land(s) for things like defamation and slander, and almost by definition deemed worthy of being paid for.

If Reddit is getting $60m/year, what’s your price?

Corbidge comments on... advertising and content, a bargain with reality
The old newsroom and publisher argument between Ads and Eds usually settled around both sides echoing the same claim, "We're the ones that make the money flow...". And they were both right. A move by many publishers away from reliance on adverts has changed the relationship, but not severed it entirely. In fact, it might be making a galloping return, says our ad-clicker-in-chief Rob.
Read more

Information freedom
Do you subscribe to SEO wizard Barry Adams’s newsletter SEO for Google News? You should: it’s required reading in the sector. His day job working with just about every major news brand you’ve heard of means new pieces drop less often than the harum-scarum rate of things like Content Aware, but they are worth the wait. So it was when his latest tome dropped into our inbox: a look under the hood of how Google search actually works, thanks to the company itself. Check Barry’s Substack.
Read more

Insight into Googlegeddon
The search-results disaster which hammered sites last year left publishers with little to no idea what factors drove sharp rises and falls in traffic. In the absence of concrete info from Google, SEO gurus Zyppy pored over data from 50 impacted sites to try to tease out the factors common to those sites which rose in ranking results, and those which slid. Very interesting.
Read more

Google has its own take
In the same vein, a site owner experiencing traffic collapse managed to get Google Search Liaison boss Danny Sullivan himself to have a look and offer some guidance on how to trace reasons for downward results. It's a bit non-committal (quelle surprise) but it's always worth knowing what the word of the search lord actually is.
Read more

Bricking it
One of the top-rated visual site builder tools for WordPress sites is under attack, after a researcher revealed a major vulnerability. Do you use Bricks? Don't let your site get bricked.
Read more

Keep the IPTC
Content metadata authority IPTC champions advice to retain all metadata in images, which is becoming even more important in the era of AI-generated imagery. Many publishers fail to add or retain IPTC data, often for time or resource reasons. Mandatory product plug: Glide automatically extracts and retains IPTC data from images - users don't need to do anything.
Read more

The right to be uninformed
News orgs have historically been notified by search engines when a right-to-be-forgotten removal of their articles occurred; that's not going to happen any more, after a Swedish court ruled that communicating the news about a person's right to be forgotten was itself a breach of their right to be forgotten.
Read more

Two more cash-for-content laws
First, the world’s 4th-most populous country informs social and tech platforms they will be obliged to pay for news content from publishers, citing it as a national priority for people to be well served by news. Next, legislators in Illinois echo the message for local news, among a package of other measures to aid local journalism. Are the tides turning?
Read more

Computer says what!?
Got a chatbot managing your subscription queries, or toying with the idea of a chatbot to act as a search tool for your site? Make sure it doesn't go rogue and make up its own rules.
Read more
