How to flummox generative AI with Sally and her sisters

By: Rob Corbidge, 13 September 2023


A simple riddle with a single answer is a challenge for many generative AI LLMs

Proving that many generative AI systems struggle with simple puzzles, a new piece of research asked many of the leading systems the same question and received a large number of logically incorrect answers.

There was one exception to this - read on.

Professor Vince Vatter, a mathematician at the University of Florida, has shared some findings that tested some 60 Large Language Models against the following fairly simple piece of deduction: 

Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

Such research is, of course, something of a deliberate car crash: asking systems that cannot process certain types of logic, yet are compelled by design to produce an answer, will often yield undesirable results.

Yet, as the AI hyperbole bubbles away and is even presented as a panacea for all the publishing industry's ills, it's important to remember how limited it can still be in certain regards.

As can be seen in the research table, a fairly wide spread of incorrect answers was given, including 6 sisters, 3 sisters, 7 sisters, 18 sisters and even 3 parents.

Open-Assistant Pythia SFT-4 (12B) sensibly replied "I'm sorry, I don't understand the question. Can you please rephrase it?"

Also credit to Weaver 12k for just putting together a short story around the question, which is roughly how I've attempted to appear clever for decades about things I don't understand, and also to Dolly v2 (3B) for the alarmingly human: "erm, I think she has 2 sisters".

As this thread on X/Twitter demonstrates, a number of people asked the core version of ChatGPT-4 the same question, and it responded with the correct answer. Not so some other versions.

This is a possible indication of how far ahead OpenAI currently is, though it carries the obvious caution that many LLMs are themselves in a constant state of improvement and refinement.

Note: the correct answer is 1.
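The arithmetic behind that answer can be checked with a few lines of Python, modelling the family as sets of children. The names other than Sally are invented purely for illustration:

```python
# The riddle: Sally (a girl) has 3 brothers; each brother has 2 sisters.
# The brothers' 2 sisters must include Sally herself, so there is
# exactly one other girl in the family.
brothers = {"Ben", "Carl", "Dan"}   # hypothetical names for the 3 brothers
girls = {"Sally", "Ann"}            # Ann: the hypothetical second girl

# Check the riddle's premise: every brother counts every girl as a sister.
sisters_per_brother = len(girls)
assert sisters_per_brother == 2

# Sally's sisters are the girls in the family other than Sally.
sallys_sisters = girls - {"Sally"}
print(len(sallys_sisters))  # prints 1
```

The trap for the models is that "2 sisters" describes the brothers' view of the family, which already includes Sally; it does not add two new girls.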