Six #ASAPbio fellows asked four #LLMs to describe the strengths and weaknesses of #preprints. Here are the results.
asapbio.org/interim-findings-f

The same fellows asked the same LLMs to ingest six preprints and their #PeerReviewed counterparts, and to compare them for quality and rigor. A good question. But they haven't yet analyzed the data; presumably they'll report soon.

PS: I'm interested in a related question. When LLMs answer research questions, do they treat on-topic preprints and on-topic postprints (peer-reviewed articles) as equivalent in weight or credibility? If not, how exactly do they take any differences into account?

#AI #PeerReview #ScholComm

How do we *want* #AI tools to treat #preprints?

(For a little background, see my post from late March.)
fediscience.org/@petersuber/11

Here's an unusual new case study. Scientists created a fake disease ("bixonimania"), uploaded two fake research papers about it to preprint servers, and monitored AI tools to see whether they fell for it. Several of the major tools did fall for it, at first, even if they later expressed doubts.
nature.com/articles/d41586-026

There were clues in the preprints that the research was fake. For example, the acknowledgments thanked a Starfleet Academy prof for her help and the Sideshow Bob Foundation for funding.

It's hard to avoid thinking that without those clues, humans might have fallen for those preprints too, at least at first. If we tested 100 human readers with different research backgrounds and purposes, the fall-for-it-at-first quotient might be 20, 40, or 60 rather than zero.

OK, AI tools don't get certain human jokes. That's shooting fish in a barrel. We still need to think about how AI tools ought to regard joke-free preprints.

More...
🧵

Human researchers bring a complicated mix of skepticism and trust to new research. They're ready to question everything. But in the absence of special clues, they don't suspect that authors or platforms might deliberately distribute fake info. When we train new researchers, we aim to increase their skepticism or critical attitude. When we fine-tune research platforms (their methods of peer review, data-sharing policies, business models, and so on), we aim to increase their trustworthiness, and hence to increase the trust granted to them by readers.

We want this tension. We wouldn't want to tilt too far toward platform trust at the expense of reader skepticism, or tilt too far toward reader skepticism at the expense of platform trust.

These are ways of saying that it's hard to decide how human researchers should treat joke-free preprints. It's at least as hard to decide how AI tools should do so. Meantime, of course, we should help them recognize more clues to falsehood, both deliberate and inadvertent.

@petersuber
Isn't the big problem with the BigAI systems that they produce text whose form implies thorough vetting, when they just don't have algorithms to produce reliability scores in the first place? The most common symptom of this is the "hallucination," where the system extrapolates from its data without giving any warning that it is replying beyond the known edges of its knowledge.

Meanwhile, humans can indeed be fooled, and might occasionally bluff even in their area of expertise, but an expert will very rarely be both fooled by a paper and willing to bluff about its contents at the same time.
