Content Aware media news: February 22, 2024

Published: 23 February 2024

If we all feed the machine, how does the machine know what's right, wrong, or scurrilous?

So, social content site Reddit is shopping its wares to AI, signing a $60m/year deal with a “mystery AI firm” for its data to be used to train LLMs.

Perhaps a relevant hint to the mystery: OpenAI CEO Sam Altman was a major early investor in Reddit.

But no! Despite the apparent clash of interests… the deal is with Google.

Which, if you know how much Google leans on Reddit for its search results, actually makes sense.

Some users are unsettled, as it's not clear what data or content is actually going to be handed over. After all, some users post extraordinary amounts of very personal information about themselves in what are (perceived to be) member-only parts of the community, including pictures and videos, which would now seemingly be up for grabs.

What responsibility pure AI competitor OpenAI believes it has for such data is outlined on its site, but it’s a bit mealy-mouthed: it says it won’t target you with ads, but, err… it would still have all that stuff about you, ads or not.

We are sure Google has no advertising intentions in mind and organisationally has no way of connecting usernames that match to emails which might be Gmails to get insight into what you post about or follow. (/S).

OpenAI’s policy says: We use training information only to help our models learn about language and how to understand and respond to it. We do not and will not use any personal information in training information to build profiles about people, to contact them, to advertise to them, to try to sell them anything, or to sell the information itself.

Relevant to publishers: lots of articles are pasted verbatim into Reddit posts rather than linked. So what happens to that data? Lawyers will be working overtime for certain.

Would such a pasted Reddit post be a loophole for an AI to consume any 3rd party data, if it does a deal with Reddit directly? The more popular an article is, or more useful in terms of the information it reveals, the more likely someone will - without malice - paste said content into a thread. It’s being helpful, in fact.

We have already seen that Reddit can outperform Google search results, and in fact - the case of the cake eating itself - posting your own content into Reddit can see your own content outperform your own original article in search results.

And elsewhere, users have already managed to show the danger of AI-generated content based on Reddit threads, by faking thread content to trick an AI-written site into generating fake news. Because it was funny.

But, as we all know, not everyone on the internet is funny. Or doing it just for a gag. Basically, who tells the AI what is right or wrong, or truthful or dishonest? If the machine is having everything tipped into its maw, it’s hard to discern good from bad.

If only they had journalists and professionals to help them decide. And this is where the Reddit deal is perhaps more than anything a guide to publishers to the value of their content: professionally created and curated, adhering (by and large) to the laws of the land(s) for things like defamation and slander, and almost by definition deemed worthy of being paid for.

If Reddit is getting $60m/year, what’s your price?

100 voices for change
Interested in joining the nascent coalition fighting for media's place in its own hazy future? Join an all-are-welcome video call today/tonight/this morning - depending on your location, it varies from midday San Francisco, 8pm UK, to 7am Australia - where 100 media names hosted by Future Media's Ricky Sutton will ask: who leads the fight for media to stop being eaten up and spat out by Big Tech? Check the link for global times. The first 100 get in. No charge.

Corbidge comments on... advertising and content, a bargain with reality
The old newsroom and publisher argument between Ads and Eds usually settled around both sides echoing the same claim, "We're the ones that make the money flow...". And they were both right. A move by many publishers away from reliance on adverts has changed the relationship, but not severed it entirely. In fact, it might be making a galloping return, says our ad-clicker-in-chief Rob.
https://www.gpp.io/comment/advertising-is-king-but-its-content-which-crowns-it-au3JQ1F36rKL

Information freedom
Do you subscribe to SEO wizard Barry Adams’s newsletter SEO for Google News? You should: it’s required reading in the sector, and with a publication rate naturally slowed by his day job working with just about every major news brand you’ve heard of, drop rates for new pieces are understandably less than the harum-scarum rate of things like Content Aware. But it’s worth the wait, as it was when we saw his latest tome drop into our inbox: a look under the hood of how Google search actually works, thanks to the company itself. Check Barry’s Substack.
https://www.seoforgooglenews.com/p/google-is-all-about-links-clicks-keywords

Insight into Googlegeddon
The search-results disaster which hammered sites last year left publishers with little to no idea what factors drove sharp rises and falls in traffic. In the absence of concrete info from Google, SEO gurus Zyppy pored over data from 50 impacted sites to try and tease out the factors common to those sites which rose in ranking results, and those which slid. Very interesting.
https://zyppy.com/seo/google-update-case-study/

Google has its own take
In the same vein, a site owner experiencing traffic collapse managed to get Google Search Liaison boss Danny Sullivan himself to have a look and offer some guidance on how to trace reasons for downward results. It's a bit non-committal (quelle surprise) but it's always worth knowing what the word of the search lord actually is.
https://www.searchenginejournal.com/googles-danny-sullivan-provides-5-step-plan-to-diagnose-ranking-drops/508383/

Bricking it
One of the top-rated visual site builder tools for WordPress sites is under attack, after a researcher revealed a major vulnerability. Do you use Bricks? Don't let your site get bricked.
https://snicco.io/vulnerability-disclosure/bricks/unauthenticated-rce-in-bricks-1-9-6

Keep the IPTC
Content metadata authority IPTC champions advice to retain all metadata in images, becoming even more important in the era of AI-generated imagery. Many publishers fail to add or retain IPTC data, often for time or resource reasons. Mandatory product plug: Glide automatically extracts and retains IPTC data from images - users don't need to do anything.
https://iptc.org/news/google-reminds-publishers-and-merchants-not-to-strip-metadata-from-images/

The right to be uninformed
News orgs have historically been notified by search engines when a right-to-be-forgotten removal of their articles occurred; that's not going to happen any more, after a Swedish court ruled that communicating the news about a person's right to be forgotten was itself a breach of their right to be forgotten.
https://www.theguardian.com/technology/2024/feb/15/google-stops-notifying-publishers-of-right-to-be-forgotten-removals-from-search-results

Two more cash-for-content laws
First, the world’s 4th-most populous country informs social and tech platforms they will be obliged to pay for news content from publishers, citing it as a national priority for people to be well served by news. Next, legislators in Illinois echo the message for local news, among a package of other measures to aid local journalism. Are the tides turning?
https://www.reuters.com/business/media-telecom/indonesia-issues-regulations-requiring-digital-platforms-pay-media-content-2024-02-20/

Computer says what!?
Got a chatbot managing your subscription queries, or toying with the idea of a chatbot to act as a search tool for your site? Make sure it doesn't go rogue and make up its own rules.
https://twitter.com/arstechnica/status/1758540835132494119

Read this on our Substack.