Machine Learning: Is your research assistant looking at your notes?

By: Rob Corbidge, 16 February 2023

humanoid robot  library assistant, surrealism, warm orange hues, low contrast, surreal artwork.

A inquisitive and expansive robot librarian or a relentless gutter and regurgitator of other people's work? Possibly both. Bing Chat has had an interesting week.

Humans are in the long habit of using the best and most recent of the technology they've created to act as a verbal reference model for themselves: "We built it, it must be like us, and we are like it".

Consequently, 200 years ago, the functions of a steam engine might be such a reference, or the multi-spindle spinning frame. And as we all know, the computer age right now. "I don't have the bandwidth for that", "let me parse that out" and so on. 

The easiest-to-explain model we have for our own brain frequently falls back on descriptions that are drawn from the world of computing.

So it is with ChatGPT and derived uses, such as the new Bing application. This time of course it resembles us so closely that we've gone a long way from admiring a Kirchweger condenser on a steam train, to building a thing that possibly passes the Turing test. It's also possible the Google version resembles us even more.  

Enter Kevin Lui, who's already found out that the real name of Bing Chat is Sydney by using a "prompt injection attack", the kind of exploit that is the reason Bing Chat is still at a limited release testing stage.

Let's just take stock of what Mr Lui discovered: in its current form, this Machine Learning application can be tricked into divulging more than it is supposed to. How very human. It's not an overly complex idea to grasp how this occurred - applications such as Bing Chat have a set of rules hidden above the line, as it were. Below the line is where Bing Chat answers questions as a continuation of what is above the line. 

It's not a great plot for the next Terminator move is it?

Yet it does reveal some fundamental things about how Large Language Models work, and specifically Bing Chat. One response yielded this nugget: "Sydney's internal knowledge and information were only current until some point in the year 2021 and could be inaccurate / lossy. Web searches help bring Sydney's knowledge up to date."  So information is not locked in 2021 as it has been for OpenAI, but is being updated. 

This lead me down a curious path of thought regarding content and the sources such ML applications draw from. An example: I had cause to remove a bottle top this week, the kind of dripper top you get with liquid condiments to help you sprinkle said bottle content more judiciously, rather than it gush out with a big glug.

A combination of sheer force and swearing would not work its magic and so a YouTube search gave me a knight in shining armour in the form of an account called Bruce Nottingham. Bruce provided the answer and off the top came. Bruce has over 30,000 views for his effective solution and his lack of music and clever editing was a plus. I watch it once and have a lifetime of knowledge.

Bruce probably didn't post his video with untold YouTube riches in mind. He might not even know who MrBeast is. He did it because it's helpful and he got a sense of satisfaction from showing people something handy. 

If Machine Learning applications are to hoover up the "low hanging fruit" content out there, what becomes of people such as Bruce Nottingham?

From a publisher's perspective there is much to be optimistic about. The possibility of more powerful and accurate search is one, a robot librarian and possibly a robot librarian that can assume different personas.  

At the very least, we may get meaningful competition in search, and the publishing industry with all its wonderful content is the prize.

We just have to be careful not get sold cheap in feeding Sydney and his friends.