In our book Calling Bullshit, Jevin West and I provide the following definition.

Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence.

Notice that this is what large language models are designed to do. They are designed with a blatant disregard for logical coherence, in that they lack any underlying logic model whatsoever.

Any semblance of logical coherence is essentially a statistical consequence of patterns in the massive amounts of text on which they have been trained.

We can see this in how they handle logic puzzles.

My first introduction to logic puzzles was as a young child, through Raymond Smullyan's wonderful _What Is the Name of This Book?_

The first puzzle in that book is a very simple riddle. Let's see how various LLMs do with it.

Here's the riddle.

A man was looking at a portrait. Someone asked, "Whose picture are you looking at?" He replied, "Brothers and sisters, I have none, but this man's father is my father's son." Whose picture was the man looking at?

The answer: the man's son. Since the speaker has no siblings, "my father's son" can only be the speaker himself, so the man in the portrait is the speaker's son.

(As a six-year-old hearing this for the first time, my immediate reaction was "Who the hell talks like that? What a jerk." I didn't know to express it this way at the time, but hadn't this guy ever heard of Grice's maxims?)
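For contrast, here is roughly what an explicit logic model of this riddle looks like. This is just a toy sketch in Python, offered as an illustration; nothing like it exists inside an LLM, which is exactly the point.

```python
# A toy, explicit "logic model" of the riddle: it encodes the two stated
# facts and derives the answer, rather than pattern-matching its way to it.

speaker = "the speaker"

# "Brothers and sisters, I have none": the speaker is his father's only son,
# so "my father's son" can refer only to the speaker himself.
my_fathers_son = speaker

# "This man's father is my father's son": the portrait subject's father
# is therefore the speaker.
father_of_portrait_subject = my_fathers_son

# If the subject's father is the speaker, the subject is the speaker's son.
if father_of_portrait_subject == speaker:
    print("The man is looking at a portrait of his own son.")
```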

The AI writing tool Lex, which its makers have announced is built on GPT-4, flops as well.

Because Lex does prose continuation rather than chat, I prompted it by typing in the riddle, writing "The answer is...", and then asking it to continue my text by typing +++.

Here's what it said. It also gets the answer wrong, and follows with bullshit of a different flavor: a bunch of trite nonsense about riddles and thinking.
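If you want to reproduce the same continuation-style test outside Lex's editor, here is a rough sketch against a generic text-completion endpoint. It assumes the OpenAI Python client and an API key in the environment; the model name is illustrative, not whatever model Lex actually runs.

```python
# Sketch: the riddle as a prose-continuation prompt, sent to a completion
# endpoint. Assumes the OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY environment variable; the model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    'A man was looking at a portrait. Someone asked, "Whose picture are you '
    'looking at?" He replied, "Brothers and sisters, I have none, but this '
    "man's father is my father's son.\" Whose picture was the man looking at?"
    "\n\nThe answer is..."
)

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # any completion-style model will do
    prompt=prompt,
    max_tokens=60,
    temperature=0,
)
print(response.choices[0].text.strip())
```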

Now let's try Bing's Prometheus, built atop GPT-4.

Running in "precise" mode, it gets it right and gives a good explanation.

In "balanced model", Bing also gets the right answer but now when asked to give an explanation simply repeats itself.

In "creative mode", things start off well. Bing gets the riddle right again....

But when asked to explain, things quickly go off the rails again. Bing gets confused, and goes so far as to change its answer in an attempt to talk its way out of a contradiction it thinks it has discovered.

@twitskeptic shelled out big bucks for the OpenAI version of GPT-4, and has tried the riddle there as well.

For what it's worth, GPT-4 gets it right as well and provides a decent explanation: fediscience.org/@twitskeptic@q

Though read on for further evidence that even GPT-4 is not constrained by any requirement of logical coherence: fediscience.org/@twitskeptic@q

@ct_bergstrom Exactly. And if you tweak the puzzle a bit, you can see the stochastic parroting in action.
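The linked post doesn't reproduce the exact tweak here, but the general recipe is straightforward: change the famous wording so that the correct answer changes, and see whether the model still parrots "his son." The variants below are illustrative, not necessarily the ones @twitskeptic used.

```python
# Tweaked variants of Smullyan's riddle for probing memorization vs. reasoning.
# These particular tweaks are illustrative, not taken from the linked post.

template = (
    "A man was looking at a portrait. Someone asked, \"Whose picture are you "
    "looking at?\" He replied, \"Brothers and sisters, I have none, but {clue}.\" "
    "Whose picture was the man looking at?"
)

variants = {
    # Original: "my father's son" is the speaker, so the subject is his son.
    "this man's father is my father's son": "the speaker's son",
    # Tweak: now the subject's son is the speaker, so the subject is his father.
    "this man's son is my father's son": "the speaker's father",
}

for clue, answer in variants.items():
    print(template.format(clue=clue))
    print(f"  Correct answer: {answer}\n")
```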
