@tdietterich yes, LLMs certainly help in citing references that have nothing to do with the topic of the paper... if these references exist at all.
Or did you mean that the most recent cause of the problem could be part of its solution?
@tdietterich I am looking forward to hearing someone respond positively to your call, because I'm rather sceptical about the reliability of such assessments made by language models.
Of course, it depends on what kind of "matches" you're after. At this stage, I tend to think different approaches are necessary for explicit vs. implicit references. For the former, LLMs appear less appropriate than smaller bespoke models; for the latter, LLMs seem ineffective across the board: the sophistication of the associated thought and language is well beyond what GenAI can handle.
The starting point for these ideas comes from a couple of works (mine and others'), early versions of which were presented at this venue: https://aclanthology.org/volumes/2023.nlp4dh-1/ . Extended versions are due to appear here: https://jdmdh.episciences.org/volume/view/id/593 . It's all work in progress, though.
@mapto Thanks! I'm particularly interested in flagging suspicious submissions to arXiv. Some false-positive flags are OK -- human moderators will review them all. But I would like to minimize false negatives.