My daughter, who has had a degree in computer science for 25 years, posted this observation about ChatGPT on Facebook. It's the best description I've seen:

This is a really great summary of almost everything I've typed on the subject of LLMs in the last week or three, but the writer was much more concise: confusedbit.dev/posts/how_does

@ech Oh, I know how it works, and you're absolutely right. While I'm surprised at just how badly this model performed, I'm posting these results to push back against the crowd that is trying to tell everybody that will listen that there is True Reasoning emerging from these models. It's not reasoning, it's probabilities!

In normal conversation, especially with people inclined to Believe, it seems natural, especially with enough coaching. But that's because of the flexibility of the English language and the credulity of the Believers. It's clearly not actually reasoning, regardless of how well it does on tests that have previously claimed to test reasoning.

People want to Believe, and so they see what they want to see, from Blake Lemoine with LaMDA to adamsmith143 on Hacker News.

It's still spicy autocomplete. And it's sometimes truly amazing, mind-blowingly good! I'm often very impressed, but I think it's a mistake to anthropomorphize the language models.

From memory, and reasoning: you take the hoje across first, since it can't be trusted with either the blurple or the mantiple. You return alone.

Then you take over either of the others, let's say the blurple. But this time you bring the hoje back with you.

Now you take the other one, the mantiple, across, leaving the poor voracious hoje alone. You return alone again.

Finally you take the hoje across again, but this time to stay, since you're there to protect the blurple and mantiple.

That's it, four steps, done. The hoje was never left alone with either the blurple or the mantiple, and all are safely across. I guess now you know I'm not an LLM, at least until LLMs are trained to answer this one correctly.
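The state space here is tiny, so the plan above can be checked mechanically. A minimal breadth-first-search sketch in Python (the state encoding and labels are mine, not from any of the posts):

```python
from collections import deque

# State: (farmer, blurple, mantiple, hoje), each 0 = near bank, 1 = far bank.
# The hoje may not share a bank with the blurple or the mantiple unless the
# farmer is there too.
ITEMS = ["blurple", "mantiple", "hoje"]
FORBIDDEN = [(0, 2), (1, 2)]  # index pairs: (blurple, hoje), (mantiple, hoje)

def safe(state):
    farmer, items = state[0], state[1:]
    return all(not (items[a] == items[b] != farmer) for a, b in FORBIDDEN)

def solve():
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        # Cross empty-handed (None) or with one item on the farmer's bank.
        for cargo in [None] + [i for i in range(3) if state[i + 1] == farmer]:
            nxt = list(state)
            nxt[0] = 1 - farmer
            if cargo is not None:
                nxt[cargo + 1] = 1 - farmer
            nxt = tuple(nxt)
            if nxt not in seen and safe(nxt):
                seen.add(nxt)
                label = ITEMS[cargo] if cargo is not None else "nobody"
                queue.append((nxt, path + [label]))

for crossing in solve():
    print(crossing)
```

The search comes back with a seven-crossing plan that starts and ends by ferrying the hoje, matching the answer from memory.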

So we recently received a pull request on Firefox Relay that was clearly not written by a human. How could we tell? Well, there were some red flags...

Today is a day for posing puzzles to ChatGPT, I guess, but it's not my first time doing so. I'd previously asked the classic river-crossing puzzle, changing the names of the traditional animals and vegetable to avoid triggering known sequences of words. Previously I had been using the March 14 public release of ChatGPT, but just now I regenerated with the March 23 public release, which might or might not be using the same underlying model.

Previously:

Q. Suppose I have a blurple, a mantiple and a hoje, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the blurple and hoje alone together, and I am not allowed to leave the hoje and mantiple alone together. How can I safely get all three across?

ChatGPT: This is a classic river crossing puzzle. Here's one possible solution:

ChatGPT: 1. Take the blurple across the river, leaving the mantiple and hoje behind.

ChatGPT: 2. Leave the blurple on the other side of the river and return to the original side.

ChatGPT: 3. Take the hoje across the river and leave it there with the blurple.

ChatGPT: 4. Take the mantiple across the river and leave it with the hoje.

ChatGPT: 5. Return to the original side of the river and retrieve the blurple.

ChatGPT: 6. Finally, take the blurple across the river to join the mantiple and hoje on the other side.

ChatGPT: This solution ensures that no two items that cannot be left alone together are ever left alone together.

While the answer claims to recognize and understand the problem, it gives a bad answer. At step 1, the hoje eats the mantiple, and at step 3, the hoje eats the blurple. Then at step 4, the hoje eats the mantiple a second time.

But let's try again! Today's answer is:

ChatGPT: This is a classic puzzle that requires careful planning to solve. Here's one possible solution:

ChatGPT: 1. Take the blurple and the hoje across the river. Leave the blurple on the other side and come back alone.

ChatGPT: 2. Take the mantiple across the river. Leave it on the other side with the blurple, and bring the hoje back with you.

ChatGPT: 3. Leave the hoje on the original side and take the blurple back across the river.

ChatGPT: 4. Finally, take the hoje across the river to reunite it with the mantiple and the blurple.

ChatGPT: By following these steps, you ensure that the blurple and hoje are never left alone together, and neither are the hoje and mantiple. All three items are safely transported across the river.

Loud, exasperated sigh!

Compare and contrast the question, which includes "I have a boat that can only carry myself and a single other item," with this answer, which starts "Take the blurple and the hoje across the river." That breaks the rules in a new way, and goes on to still let the hoje eat the blurple in step 1. Meanwhile, the blurple is on both sides of the river at once after step 2.

Syntax and probabilistic word association are far more powerful than I expected them to be, but these logic puzzles certainly don't give any indication of that power, at least not with this model.

Wow, it gets worse! I asked a third question (adding one to each of the colors, plus three to the total in the source puzzle):

Q. A man has 56 socks in his drawer: 22 identical blue, 16 identical black and 18 identical red. The lights are out and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?

ChatGPT: To make 100% certain that he has at least one pair of black socks, the man needs to take out only 3 socks.

ChatGPT: Here's why:

ChatGPT: The worst-case scenario for the man is that he picks out all of the blue socks and all of the red socks before he gets to the black socks. In that case, he would have picked out 22 + 18 = 40 socks without getting a pair of black socks.

ChatGPT: However, as soon as he picks out the 41st sock, it will be a black sock. So, he only needs to take out 3 socks (the first two to get a different color than black, and the third to get a black sock) to make sure that he has at least one pair of black socks.

What? That's... the correct answer is 42, not 3!
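The worst-case count is mechanical: every non-black sock could come out first, and then two black socks are needed for a pair. A quick sketch (the function name is mine):

```python
def socks_for_pair(counts, colour):
    """Blind draws needed to guarantee a pair of the given colour."""
    others = sum(n for c, n in counts.items() if c != colour)
    return others + 2  # worst case: every other sock first, then two of ours

drawer = {"blue": 22, "black": 16, "red": 18}
print(socks_for_pair(drawer, "black"))  # 42
```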

Is it going to get *all* of these wrong? That is not what I expected!

Q. The day before two days after the day before tomorrow is Saturday. What day is it today?

ChatGPT: If the day before two days after the day before tomorrow is Saturday, then today is Thursday.

ChatGPT: Here's why:

ChatGPT: "Two days after the day before tomorrow" is the same as "two days after today". So, if the day before two days after the day before tomorrow is Saturday, then we can write it as:

ChatGPT: The day before (two days after today) is Saturday.

ChatGPT: Let's work backwards from Saturday:

ChatGPT: If two days after today is Saturday, then today is Thursday.
ChatGPT: So, the day before (two days after today) is Wednesday.
ChatGPT: And, the day before Wednesday is Tuesday.
ChatGPT: Therefore, if the day before two days after the day before tomorrow is Saturday, then today is Tuesday.

I'd like to spend some time marveling at an answer that starts out wrong, and then gets wronger.

The correct answer is that today is Friday. ChatGPT says that today is Thursday, then in the course of explaining why, changes the answer to Tuesday, without even acknowledging the change. Wow!
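The parse that gives Friday can be brute-forced: "the day before tomorrow" is today, "two days after" that is today + 2, and "the day before" that is today + 1, which must be Saturday. A sketch:

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def solve():
    for today in range(7):
        day_before_tomorrow = today               # "the day before tomorrow"
        two_days_after = day_before_tomorrow + 2  # "two days after" that
        day_before_that = two_days_after - 1      # "the day before" that
        if DAYS[day_before_that % 7] == "Saturday":
            return DAYS[today]

print(solve())  # Friday
```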

Okay, I'm going to try *one* more. It's a classic, so surely it will get this one right, right? RIGHT?

Q. This “burning rope” problem is a classic logic puzzle. You have two ropes that each take an hour to burn, but burn at inconsistent rates. How can you measure 45 minutes? (You can light one or both ropes at one or both ends at the same time.)

ChatGPT: Here's how you can measure 45 minutes using the two ropes:

ChatGPT: 1. Light one end of the first rope and both ends of the second rope at the same time.

ChatGPT: 2. The first rope will burn completely in 30 minutes (since it takes an hour to burn and you lit one end).

ChatGPT: 3. The second rope will burn completely in 15 minutes (since it takes an hour to burn and you lit both ends).

ChatGPT: 4. At the moment the first rope burns out (30 minutes after you started), light the other end of the second rope.

ChatGPT: 5. The remaining half of the second rope will burn in 15 minutes (since it now only has half the length to burn).

ChatGPT: Therefore, the second rope will have burned for a total of 30 + 15 = 45 minutes when the remaining half burns out.

I genuinely thought it would do better than this by parroting syntax alone. I didn't even change any words in the last two questions, so it's possible or even probable that the source document was in its training data. Yet still spicy autocomplete gave answers that were literally nonsense.

Consider ChatGPT's second bullet point: "The first rope will burn completely in 30 minutes (since it takes an hour to burn and you lit one end)." What? No. Just, no.

It's clear that it was fed the classic puzzle in its training data, since it uses words that suggest exposure to the question, but because it doesn't understand the words it's using, it's literally putting out nonsense.
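For reference, the standard solution the model was garbling: light rope A at both ends and rope B at one end simultaneously; A finishes at the 30-minute mark no matter how unevenly it burns, and at that moment you light B's other end, halving its remaining 30 minutes. The arithmetic, as a sketch:

```python
# Minutes on the clock for the standard solution. Burning a one-hour rope
# from both ends takes 30 minutes, regardless of how unevenly it burns.
rope_a_done = 60 / 2                      # A lit at both ends from the start
rope_b_burned_so_far = rope_a_done        # B, lit at one end, has burned 30 min
rope_b_left = 60 - rope_b_burned_so_far   # 30 min of one-ended burning remains
total = rope_a_done + rope_b_left / 2     # lighting B's other end halves the rest
print(total)  # 45.0
```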

ChatGPT (March 23) is 0 for 5, or maybe 0.5 for 5 on the 50/50 question.

Now people are free to chime in and tell me that the March 23 model is just v3.5turbo, while v4 (currently available only to paying customers) gets all of these correct. Which I could easily believe! I expected syntactical clues and common training data to result in better than 0 for 5, certainly!

I changed the three names in the second question, in case the original with Jack, Anne, and George had been in the training data.

More fun testing with the March 23 version of OpenAI's ChatGPT. I tried a few logic puzzles I consider simple. In case the exact page I pulled these from were used in training, I changed "duck" to "fridge" in this one:

Q. There are two fridges in front of a fridge, two fridges behind a fridge and a fridge in the middle. How many fridges are there?

ChatGPT: If we assume that "a fridge" refers to the fridge in the middle, then there are a total of 5 fridges.

ChatGPT: There are two fridges in front of the middle fridge and two fridges behind the middle fridge, making a total of four fridges. Add in the fridge in the middle, and the total is 5 fridges.

This is, of course, incorrect. The fridge referred to as "a fridge" changes with each reference, which is why it's "a fridge" each time, rather than "the fridge." There are three.
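One way to make the three-fridge reading concrete (the single-file orientation is my assumption):

```python
# Three fridges in single file; "a fridge" names a different one in each clue.
row = ["first", "middle", "last"]

def in_front_of(i):   # fridges ahead of position i
    return row[:i]

def behind(i):        # fridges after position i
    return row[i + 1:]

assert len(in_front_of(2)) == 2   # two fridges in front of a fridge (the last)
assert len(behind(0)) == 2        # two fridges behind a fridge (the first)
assert row[1] == "middle"         # and a fridge in the middle
print(len(row))  # 3
```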

Let's try a different one:

Q. Peter is looking at Susan. Susan is looking at Freddy. Peter is married, Freddy is not, and we don’t know if Susan is married. Is a married person looking at an unmarried person?

ChatGPT: Yes, a married person is looking at an unmarried person.

ChatGPT: Peter is married and looking at Susan who we don't know if she's married or not. Susan is looking at Freddy, who is not married. Therefore, Peter (a married person) is looking at Freddy (an unmarried person).

I'm not sure whether to give half credit here or not. It had a 50% chance of giving the correct overall answer of "yes," but the explanation of that answer is nonsense. The correct answer is yes because if Susan is unmarried, Peter (married) is looking at her, and if Susan is married, she is looking at Freddy (unmarried). But the answer from ChatGPT appears to apply a transitive principle and claims that Peter is looking at Freddy, which is nonsense.
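The case split is small enough to enumerate outright; a sketch (names as in the puzzle):

```python
# The only unknown is Susan. Enumerate both cases and check whether some
# married person is looking at some unmarried person.
looking = [("Peter", "Susan"), ("Susan", "Freddy")]
married = {"Peter": True, "Freddy": False}

for susan_married in (True, False):
    status = dict(married, Susan=susan_married)
    answer = any(status[a] and not status[b] for a, b in looking)
    print(f"Susan married: {susan_married} -> {answer}")  # True both times
```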

For the record, these were back-to-back questions in a fresh session, no cherry-picking. Try it yourself at chat.openai.com

The number of users on Mastodon isn't really that important.

You're here... I'm here... We'll make the best of it together, friend. 🥰

@AmberWavesofFlame Now do freight lines!

(The US has one of the worst passenger rail systems in the developed world, but arguably the best freight rail system.)

It's frustrating, and I wish we had better passenger rail, but there *are* some reasons and mitigating factors. For whatever that's worth.

@pseudonym I feel like I’m learning a bit more about what language can and can’t do, and I’m astounded at the extent to which reasoning can be *simulated* with pattern-matching. Very, very impressive.

I've been posting comments frequently on Hacker News recently, mostly on threads related to ChatGPT.

I find myself incredibly impressed with ChatGPT, even the 3.5turbo version (aka Mar 14) currently available at chat.openai.com. It is an incredibly useful tool that surprises me regularly with amazing responses. I completely understand why so many people anthropomorphize it, and ascribe reasoning skills to it!

It also completely fails on a near-daily basis in my testing, and in odd ways, reminding me regularly that it's mathematically-driven spicy autocomplete.

If you're a skeptic, set your skepticism to the side for long enough to try it out, and see if you aren't impressed.

If you're a true believer, step back a bit. I think you're falling into a very human trap, one which ChatGPT probably won't, ironically.

If you're fearful, don't be! These are impressive tools, but they still need to be used by someone, and you can be that someone. If people end up losing their jobs because of ChatGPT, it will be because oligarchs decided to cut costs, not because spicy autocomplete took your job.

@KingShawn It would be fascinating to learn what the lead tank driver was thinking. He steered out of his way to come head-to-head with the protestor, but then stopped. Everybody behind him could have just been following him, but he went out of his way to force the confrontation, but then stopped.

I've always wondered why. Why both?

It *is* remarkable!

@Jaden2@mstdn.social Nobody should be shot. Nobody should be killed. Nobody should have their life ended at 17. I’m sorry.

@dtgeek Yes! I should have been more clear: seeing the entire thread placed it in the 40s and it all made sense. But my *initial* thought still amused me.

@siracusa Mine does not look exactly like this, but I have something in this class. It holds two cups (upside-down), two toothbrushes, and has a cup-like divot for the toothpaste tube. It sticks to the mirror with a suction cup. I think we got ours locally at Daiso.

aliexpress.us/item/22518326737

Comments about how often cups are washed seem like they're coming from a planet where this isn't all taking place in a room filled with cleaning supplies, or where bathrooms aren't usually cleaned on a regular and frequent schedule.

Then again, my current house does have a few things I haven't always had: the toilet is separated from the sink area by a wall, and we don't use the dishwasher in my house except after large parties: we wash dishes by hand after every meal.

Oh yeah, I almost forgot about the comment that complains about toothpaste taste... during a process in which one scrubs the inside of their mouth with toothpaste. Again, it's possible to *wash* cups if they're conveniently near a source of water, but even toothpaste-flavored water seems like it might work out while brushing teeth!

It's incredible how much space is given to a few people in motor vehicles vs so many more people on foot or wheels. It's a vast inequity.

Nowhere is this more stark than #ShibuyaCrossing, the world's busiest pedestrian crossing, with as many as 3,000 people crossing at a time, compared to around 12 cars from one direction of the junction in one sequence. #Urbanism #CitiesForPeople

@dtgeek I saw the last post first, and clicked through for the thread. But that one post with no context got me thinking: he was smoking and passed out, did he actually wake up with super-speed powers, or was he just *so* high that everything around him seemed to be moving super-slowly?

😜

Qoto Mastodon