It astonishes me that people expected LLMs to be good at creating summaries. LLMs are good at transforms that have the same shape as ones that appear in their training data. They're fairly good, for example, at generating comments from code, because code follows common structures and naming conventions that are mirrored in the comments (even though the two have totally different shapes of text).

In contrast, summarisation is tightly coupled to meaning. Summarisation is not just about making text shorter: it's about discarding things that don't contribute to the overall point and combining related things. That requires understanding the material, because it's all about making value judgements.

So it's totally unsurprising that the Australian study found LLM summarisation useless. It's no surprise that both Microsoft's and Apple's email summarisation tools discard the obvious phishing markers and summarise phishing scams as '{important thing happened}, click on this link': they don't actually understand anything in the text that they're discarding, they just mark it as low entropy and throw it away.

@david_chisnall I'm surprised by your claim. Summarization is, of course, one of the training tasks, so the models should be good at it. And growing demand should drive more resources into it, pushing the state of the art even further.

And the linked post is misleading. Testing (and even reporting on) Llama-2-level models in mid-2024 as "most promising" is... meh. The title should be extended with "... in mice".

@dpwiz Why do you think including summarisation in the training data will make a model good at it? Summaries depend on context. A load of examples of summaries will not make you good at summarisation unless you understand the meaning. You need to capture the nuance but discard the ephemera, and that's not a property you can discern solely from the text.

Also, I love the fact that so many people have started defending bullshit generators with ‘yes, last year’s ones were terrible, but this year’s ones do all of the things that we claimed last year’s ones did! Trust us this time!’


@david_chisnall Eh... For the same reason it does *anything at all*?
I think even the silliest models of all, the handwritten-digit recognisers, already learn the "nuance" and the interrelationships between pixel values while discarding noise and unimportant variation.
What exactly is the "context" problem you think is intractable, if not that?
