After hearing Sebastian Bubeck talk about the Sparks of AGI paper today, I decided to give GPT-4 another chance.

If it can really reason, it should be able to solve very simple logic puzzles. So I made one up. Sebastian stressed the importance of asking the question right, so I stressed that this is a logic puzzle and didn't add anything confusing about knights and knaves.

Still, it gets the solution wrong.

@ct_bergstrom Another "Sparks of AGI" problem is the claim that GPT-4 can reason about emotions in complex situations. The example they give isn't particularly complex, so I came up with another one. Well, I don't see a future career in therapy for this model.

@twitskeptic @ct_bergstrom I'll be honest here, I don't think I could answer the question as you formulated it, either.

I mean, I could guess that your implication is that the smile is sarcastic or paradoxical and ask a follow-up question about it. But if that's not what you meant, "these two have a rapport and changing the subject is an implicit way to ease the tension and distract from grief" actually tracks with my cultural biases pretty well.

Coming up with an actual Turing test is hard.

@MudMan @twitskeptic @ct_bergstrom
I tried an old one a month ago (ChatGPT, not GPT-4).

The solution is designed to be disturbing, but not as disturbing as what it came up with.

@Holten @twitskeptic @ct_bergstrom

Hah. Points for originality and metatextuality on that response.

People keep trying to test these models for intelligence based on things that feel easy, hard or nuanced to humans. I am very disappointed to see people who should be professional researchers running this sort of thing without systematizing what these concepts actually are.

I mean, no offense to you. Your thing is a fun experiment, but neither the paper nor its debunkers seem to be rigorous enough here.

@MudMan @twitskeptic @ct_bergstrom No, none taken! I'm not in the business of professing "intelligence" for machines, far less defining the meanings of the term in English. This was just for fun.

Anyway, if such tests are valuable at all, fwiw I think the logic puzzles are pretty good, but I favor the more rigorous approach of François Chollet and his ARC test set, which no LLM has yet passed.

@Holten @twitskeptic @ct_bergstrom Even that relies on a problematic, anthropocentric definition that some humans fail. He acknowledges this himself, but I also have issues with it that he doesn't.

I do wonder if the debate on general intelligence is even worth it. Ultimately these models are about inputs and outputs. They don't run continuously, so it's clear that sentience is not a thing, regardless of general intelligence. As you suggest, the question may not be useful in the first place. Not yet.

Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.