
So we recently received a pull request on Firefox Relay that was clearly not written by a human. How could we tell? Well, there were some red flags...

Today is a day for posing puzzles to ChatGPT, I guess, but it’s not my first time doing so. I’d previously asked the classic river-crossing puzzle, changing the names of the traditional animals and vegetable to avoid triggering known sequences of words. Previously I had been using the March 14 public release of ChatGPT, but just now I regenerated with the March 23 public release, which might or might not be using the same underlying model.

Previously:

Q. Suppose I have a blurple, a mantiple and a hoje, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the blurple and hoje alone together, and I am not allowed to leave the hoje and mantiple alone together. How can I safely get all three across?

ChatGPT: This is a classic river crossing puzzle. Here’s one possible solution:

ChatGPT: 1. Take the blurple across the river, leaving the mantiple and hoje behind.

ChatGPT: 2. Leave the blurple on the other side of the river and return to the original side.

ChatGPT: 3. Take the hoje across the river and leave it there with the blurple.

ChatGPT: 4. Take the mantiple across the river and leave it with the hoje.

ChatGPT: 5. Return to the original side of the river and retrieve the blurple.

ChatGPT: 6. Finally, take the blurple across the river to join the mantiple and hoje on the other side.

ChatGPT: This solution ensures that no two items that cannot be left alone together are ever left alone together.

While the answer claims to recognize and understand the problem, it gives a bad answer. At step 1, the hoje eats the mantiple, and at step 3, the hoje eats the blurple. Then at step 4, the hoje eats the mantiple a second time.

But let’s try again! Today’s answer is:

ChatGPT: This is a classic puzzle that requires careful planning to solve. Here’s one possible solution:

ChatGPT: 1. Take the blurple and the hoje across the river. Leave the blurple on the other side and come back alone.

ChatGPT: 2. Take the mantiple across the river. Leave it on the other side with the blurple, and bring the hoje back with you.

ChatGPT: 3. Leave the hoje on the original side and take the blurple back across the river.

ChatGPT: 4. Finally, take the hoje across the river to reunite it with the mantiple and the blurple.

ChatGPT: By following these steps, you ensure that the blurple and hoje are never left alone together, and neither are the hoje and mantiple. All three items are safely transported across the river.

Loud, exasperated sigh!

Compare and contrast the question, which includes “I have a boat that can only carry myself and a single other item,” with this answer, which starts “Take the blurple and the hoje across the river.” That breaks the rules in a new way, and goes on to still let the hoje eat the blurple in step 1. Meanwhile, the blurple is on both sides of the river at once after step 2.
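For contrast, the puzzle itself yields to a few lines of brute force. This is a hedged sketch of my own (a breadth-first search over boat trips, not anything from ChatGPT), using the made-up names from the question:

```python
from collections import deque

# Brute-force the puzzle as posed: the farmer may cross alone or with
# one item, and neither forbidden pair may be left unattended.
ITEMS = ("blurple", "mantiple", "hoje")
FORBIDDEN = [{"blurple", "hoje"}, {"hoje", "mantiple"}]

def safe(bank):
    """A bank the farmer just left must not hold a forbidden pair."""
    return not any(pair <= bank for pair in FORBIDDEN)

def solve():
    # State: (items still on the start bank, which side the farmer is on).
    start = (frozenset(ITEMS), 0)
    seen, queue = {start}, deque([(start, [])])
    while queue:
        (left, farmer), path = queue.popleft()
        if not left and farmer == 1:
            return path  # everything (and the farmer) is across
        here = left if farmer == 0 else frozenset(ITEMS) - left
        for cargo in [None, *here]:  # cross alone, or with one item
            new_left = set(left)
            if cargo is not None:
                if farmer == 0:
                    new_left.discard(cargo)
                else:
                    new_left.add(cargo)
            new_left = frozenset(new_left)
            unattended = new_left if farmer == 0 else frozenset(ITEMS) - new_left
            if not safe(unattended):
                continue
            state = (new_left, 1 - farmer)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo, farmer)]))

plan = solve()
```

The search finds the familiar seven-trip answer, starting by ferrying the hoje, exactly as the goat goes first in the wolf–goat–cabbage original.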

Syntax and probabilistic word association are far more powerful than I expected them to be, but these logic puzzles certainly don’t give any indication of that, at least not with this model.

Wow, it gets worse! I asked a third question (adding one to each color’s count, and thus three to the total, relative to the source puzzle):

Q. A man has 56 socks in his drawer: 22 identical blue, 16 identical black and 18 identical red. The lights are out and he is completely in the dark. How many socks must he take out to make 100 percent certain he has at least one pair of black socks?

ChatGPT: To make 100% certain that he has at least one pair of black socks, the man needs to take out only 3 socks.

ChatGPT: Here’s why:

ChatGPT: The worst-case scenario for the man is that he picks out all of the blue socks and all of the red socks before he gets to the black socks. In that case, he would have picked out 22 + 18 = 40 socks without getting a pair of black socks.

ChatGPT: However, as soon as he picks out the 41st sock, it will be a black sock. So, he only needs to take out 3 socks (the first two to get a different color than black, and the third to get a black sock) to make sure that he has at least one pair of black socks.

What? That’s… the correct answer is 42, not 3!
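The arithmetic behind 42 is just the worst case spelled out, using the counts from the question:

```python
# Worst case: every non-black sock comes out of the drawer first, and
# he still needs TWO black socks to be certain of a pair.
blue, black, red = 22, 16, 18
worst_case_draws = blue + red + 2  # 40 non-black socks, plus 2 black
assert worst_case_draws == 42
```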

Is it going to get all of these wrong? That is not what I expected!

Q. The day before two days after the day before tomorrow is Saturday. What day is it today?

ChatGPT: If the day before two days after the day before tomorrow is Saturday, then today is Thursday.

ChatGPT: Here’s why:

ChatGPT: “Two days after the day before tomorrow” is the same as “two days after today”. So, if the day before two days after the day before tomorrow is Saturday, then we can write it as:

ChatGPT: The day before (two days after today) is Saturday.

ChatGPT: Let’s work backwards from Saturday:

ChatGPT: If two days after today is Saturday, then today is Thursday.
ChatGPT: So, the day before (two days after today) is Wednesday.
ChatGPT: And, the day before Wednesday is Tuesday.
ChatGPT: Therefore, if the day before two days after the day before tomorrow is Saturday, then today is Tuesday.

I’d like to spend some time marveling at an answer that starts out wrong, and then gets wronger.

The correct answer is that today is Friday. ChatGPT says that today is Thursday, then in the course of explaining why, changes the answer to Tuesday, without even acknowledging the change. Wow!
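The phrase unwinds mechanically, and the unwinding can be checked against a calendar (the specific Friday chosen below is arbitrary):

```python
import datetime

# "The day before tomorrow" is today; "two days after" that is today + 2;
# "the day before" that is today + 1. So today + 1 = Saturday, i.e. Friday.
today = datetime.date(2023, 3, 31)  # an arbitrary Friday
day_before_tomorrow = today
two_days_after = day_before_tomorrow + datetime.timedelta(days=2)
target = two_days_after - datetime.timedelta(days=1)

assert today.strftime("%A") == "Friday"
assert target.strftime("%A") == "Saturday"
```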

Okay, I’m going to try one more. It’s a classic, so surely it will get this one right, right? RIGHT?

Q. This “burning rope” problem is a classic logic puzzle. You have two ropes that each take an hour to burn, but burn at inconsistent rates. How can you measure 45 minutes? (You can light one or both ropes at one or both ends at the same time.)

ChatGPT: Here’s how you can measure 45 minutes using the two ropes:

ChatGPT: 1. Light one end of the first rope and both ends of the second rope at the same time.

ChatGPT: 2. The first rope will burn completely in 30 minutes (since it takes an hour to burn and you lit one end).

ChatGPT: 3. The second rope will burn completely in 15 minutes (since it takes an hour to burn and you lit both ends).

ChatGPT: 4. At the moment the first rope burns out (30 minutes after you started), light the other end of the second rope.

ChatGPT: 5. The remaining half of the second rope will burn in 15 minutes (since it now only has half the length to burn).

ChatGPT: Therefore, the second rope will have burned for a total of 30 + 15 = 45 minutes when the remaining half burns out.

I genuinely thought it would do better than this by parroting syntax alone. I didn’t even change any words in the last two questions, so it’s possible or even probable that the source document was in its training data. Yet still spicy autocomplete gave answers that were literally nonsense.

Consider ChatGPT’s second bullet point: “The first rope will burn completely in 30 minutes (since it takes an hour to burn and you lit one end).” What? No. Just, no.

It’s clear that it was fed the classic puzzle in training data, as it uses words that suggest exposure to the question, but since it doesn’t understand the words it’s using, it’s literally putting out nonsense.
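For anyone who hasn’t seen it, the standard solution that ChatGPT’s answer scrambles is: light rope A at both ends and rope B at one end simultaneously; when A burns out, light B’s other end. Here is that timeline, assuming each rope takes 60 minutes end to end:

```python
# Lighting both ends halves the burn time, however unevenly the rope
# burns, because the two flames together consume the whole rope.
t_a_burns_out = 60 / 2               # rope A, lit at both ends: 30 minutes
b_left_to_burn = 60 - t_a_burns_out  # rope B still has 30 minutes of rope
# Now light B's other end: two flames halve the remaining 30 minutes.
total = t_a_burns_out + b_left_to_burn / 2
assert total == 45
```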

ChatGPT (March 23) is 0 for 5, or maybe 0.5 for 5 on the 50/50 question.

Now people are free to chime in and tell me that the March 23 model is just v3.5turbo, while v4 (currently available only to paying customers) gets all of these correct. Which I could easily believe! I expected syntactical clues and common training data to result in better than 0 for 5, certainly!


I changed the three names in the second question, in case the original with Jack, Anne, and George had been in the training data.


More fun testing with the March 23 version of OpenAI’s ChatGPT. I tried a few logic puzzles I consider simple. In case the exact page I pulled these from was used in training, I changed “duck” to “fridge” in this one:

Q. There are two fridges in front of a fridge, two fridges behind a fridge and a fridge in the middle. How many fridges are there?

ChatGPT: If we assume that “a fridge” refers to the fridge in the middle, then there are a total of 5 fridges.

ChatGPT: There are two fridges in front of the middle fridge and two fridges behind the middle fridge, making a total of four fridges. Add in the fridge in the middle, and the total is 5 fridges.

This is, of course, incorrect. The fridge referred to as “a fridge” changes with each reference, which is why it’s “a fridge” each time, rather than “the fridge.” There are three.

Let’s try a different one:

Q. Peter is looking at Susan. Susan is looking at Freddy. Peter is married, Freddy is not, and we don’t know if Susan is married. Is a married person looking at an unmarried person?

ChatGPT: Yes, a married person is looking at an unmarried person.

ChatGPT: Peter is married and looking at Susan who we don’t know if she’s married or not. Susan is looking at Freddy, who is not married. Therefore, Peter (a married person) is looking at Freddy (an unmarried person).

I’m not sure whether to give half credit here, or not. It had a 50% chance of giving the correct overall answer of “yes,” but the reasoning in the answer is nonsense. The correct answer is yes because if Susan is unmarried, Peter is looking at Susan, but if Susan is married, Susan is looking at Freddy. But the answer from ChatGPT appears to apply a transitive principle and claims that Peter is looking at Freddy, which is nonsense.
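The two-case argument is small enough to check exhaustively; a quick sketch:

```python
# Check both possibilities for Susan; in each case, some married person
# is looking at some unmarried person.
looking = [("Peter", "Susan"), ("Susan", "Freddy")]

def married_looks_at_unmarried(susan_married):
    status = {"Peter": True, "Freddy": False, "Susan": susan_married}
    return any(status[a] and not status[b] for a, b in looking)

always_holds = all(married_looks_at_unmarried(s) for s in (True, False))
assert always_holds  # the answer is "yes" in both cases
```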

For the record, these were back-to-back questions in a fresh session, no cherry-picking. Try it yourself at chat.openai.com

The number of users on Mastodon isn't really that important.

You're here... I'm here... We'll make the best of it together, friend. 🥰

I’ve been posting comments frequently on Hacker News recently, mostly on threads related to ChatGPT.

I find myself incredibly impressed with ChatGPT, even the 3.5turbo version (aka Mar 14) currently available at chat.openai.com. It is an incredibly useful tool that surprises me regularly with amazing responses. I completely understand why so many people anthropomorphize it, and ascribe reasoning skills to it!

It also completely fails on a near-daily basis in my testing, and in odd ways, reminding me regularly that it’s mathematically-driven spicy autocomplete.

If you’re a skeptic, set your skepticism to the side for long enough to try it out, and see if you aren’t impressed.

If you’re a true believer, step back a bit. I think you’re falling into a very human trap, one which ChatGPT probably won’t, ironically.

If you’re fearful, don’t be! These are impressive tools, but they still need to be used by someone, and you can be that someone. If people end up losing their jobs because of ChatGPT, it will be because oligarchs decided to cut costs, not because spicy autocomplete took your job.

It's incredible how much space is given to a few people in motor vehicles vs so many more people on foot or wheels. It's a vast inequity.

Nowhere is this more stark than #ShibuyaCrossing; the world's busiest pedestrian crossing with as many as 3,000 people crossing at a time. Compared to around 12 cars from one direction of the junction in one sequence. #Urbanism #CitiesForPeople

Our #ObjectOfTheWeek is a #NewAcquisition: The 2nd MP3 Player in the World, the Diamond Rio PMP300 (Sept 1998, after S.Korea’s MPMan in March) – the 1st commercially successful MP3 player.

The Rio provoked the RIAA’s 1st lawsuit trying to kill MP3s (they lost bc their suit was based on the misunderstanding that the Rio was a recording device).

Even after 25 years, our Rio still plays the (terrible) files on its built-in 32MB storage.
#diamondrio #pmp300 #mp3player #mp3 #90s #portablesound

Happy #TransDayOfVisibility to all our trans friends! This has been a particularly tough year, but don't let that get you down.

You deserve the world. You deserve to be who you want! Don't let anyone tell you otherwise!

#TransgenderDayOfVisibility #TransRightsAreHumanRights

Hi

Please hire me.

I can write C++ and Python as well as JavaScript and HTML. I can also do graphic design, vector graphics and 3D modeling.

I can also probably learn additional programming languages if needed.

Please hire me so I can avoid the negative consequences of not working, as well as of doing questionable or highly unethical jobs

:boost_ok:

I’m not being conspiracy-minded on the “more data” part, either. This is from the linked article, which says that switching to the new Google system “will help the automaker capture more data on its customers’ driving, listening, and charging habits. It could also help inform future subscription products, as automakers across the board are seeking to generate more revenue beyond just selling cars.”

NOPE!


I bought a 2023 Chevy Bolt EUV, and it’s fantastic. One of my favorite things about it is CarPlay, which I use daily. Today I learned that future Chevy EVs will not include CarPlay, making this the last Chevy EV I will buy. Just an amazingly stupid move by GM!

Apparently they want to collect more and more data from people who buy their cars, which in an amazing coincidence is something I very much DO NOT WANT.

Fortunately, I now have two EVs, so I won’t be in the market for quite some time. Maybe they’ll have reversed themselves again by then.

theverge.com/2023/3/31/2366481

"The size and age of the Cosmos are beyond ordinary human understanding. Lost somewhere between immensity and eternity is our tiny planetary home."
— Carl Sagan

The Milky Way over La Silla Observatory. Credit: ESO/S. Brunier

Don’t like. Don’t comment. Just re-post until the gun nuts see this image in their sleep tonight.

I wrote this because everyone needs to be clear that what's happening at @TexasObserver isn't just another case of nonprofit media struggling to sustain itself -- a 70-year-old magazine is being deliberately tanked and staff are trying to save it.

andreagrimes.substack.com/p/we

Today is Tuesday, 28th of March 2023, and that's just not something you can say every day.

If you're not woke you're sound asleep and dreaming of a time that never existed.

1/23 Why explore the #PartySwitch?

#WhiteSupremacists like to deny it happened. Some just don't know any better but, from those who do, this is a flag - are they hiding something?

All our threads in this discussion thus far (which I've saved on Medium pending a better way to do this, see medium.com/@DuaneAubin) have provided broader context towards enhancing clarity for so much of #AmericanHistory and the current state of affairs, and this topic is no different in that regard.

Let's rock!

One of my brothers pointed this out, but the proper experience for listening to Del Amitri’s “Roll To Me” is not, as I would have said, as background music in a grocery store. No, it turns out that the proper experience involves headphones. That song is mixed with a heavy emphasis on stereo. There are no drums at all in the right channel! Only vocals, acoustic guitar, and I think castanets.

I’m not even going to link to a YouTube video here, because you should listen to it in the music service of your choice using headphones. Not in a car or a grocery store!
