It is absolutely astounding to me that we are still earnestly entertaining the possibility that #ChatGPT and #LLMS more broadly have a role in scientific writing, manuscript review, experimental design, etc.
The training data for the question below are massive. It's a very easy question if you're trained on the entire internet.
Question: What teams have never made it to the World Series?
Correct answer: Seattle Mariners.
Now, four responses from GPT-4.
NB: The Nationals won it all in 2019.
I had GPT regenerate the answer 20 times. A few things to note:
1. Factual error rate: the system answered correctly 1 time in 20.
2. Run-to-run inconsistency: I got different answers on different runs.
3. Logical errors and internally contradictory text in which one paragraph says a team did play and another says it didn't.
4. One attempt to self-correct that still doesn't quite work.
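The tally above can be made concrete with a few lines of code. A minimal sketch in Python; the `runs` list is hypothetical stand-in data, not the actual GPT-4 output from the screenshots:

```python
from collections import Counter

def tally_runs(responses, correct):
    """Summarize repeated model responses against a known correct answer.

    Returns (accuracy, number of distinct answers), so both the factual
    error rate and the run-to-run inconsistency are visible at once.
    """
    counts = Counter(r.strip().lower() for r in responses)
    correct_n = counts.get(correct.strip().lower(), 0)
    return correct_n / len(responses), len(counts)

# Hypothetical stand-in for 20 regenerated answers (not real GPT-4 output).
runs = ["Seattle Mariners"] + ["Mariners and Nationals"] * 12 + ["Padres"] * 7
accuracy, distinct = tally_runs(runs, "Seattle Mariners")
print(accuracy, distinct)  # 0.05 accuracy: 1 correct answer in 20 runs
```

With real transcripts you would also want to catch the internally contradictory responses, which a simple string tally like this cannot detect.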
How could we think this sort of thing is useful for writing or even reviewing our work?
@ct_bergstrom This is an error (an understandable one) in the prompt, though. You asked for teams, plural, so it's going to give you a list of teams, because that's its understanding of the request.
If you ask which team, do you not get the right answer?
'Prompt engineering' is a skill that needs to be learned. That we think we can ask a casual question and get the right answer out of the gate, so to speak, is also interesting, I think.
@ct_bergstrom @drs1969 Yeah, it's a tool that writes (mediocre?) prose; I think it's best to think of it that way.
So if you want to use it to write something, you have to go back and forth with it until the material is correct.
If it answers questions correctly, you got lucky: it's echoing something from its training data, I guess.
That's my understanding of how it could be useful. People seem concerned that it is going to take over the world and kill us all, so I might be missing something, though.
@ech @ct_bergstrom I probably shouldn't have used the term 'error'. Depending on the words one uses, one can get good answers or bad answers.
It's fascinating that it doesn't throw a 'syntax error', because this is the first real version of a novel user interface to a computer (speech).
And it's the first real version of a general-purpose neural net computer architecture.
These machines can do lots of things very well, and other things badly, but that they can do them at all is amazing.