Just tried q.e.d. by @odedrechavi.bsky.social et al. on a few papers, including some by myself & others, where I knew a claim within was flawed due to a misunderstanding of the signal.
1) it was impressive. I see what the hype is about.
2) it hallucinated.
www.qedscience.com
Overly long #SciPub🧵 1/n
But I want to stress most of all: qed is very impressive. Reviewers are overworked (direct.mit.edu/qss/article/...). qed is easily up to the task of providing article assessment, and does as good a job or better than most people would at review, but in a minuscule fraction of the time. Wow. 7/n
@hansonmark.bsky.social Haven't tried qed but that's my general feeling when asking an LLM to comment on any piece of work. Ok average critique, but you need to work to make it good and really useful.
There is, however, a moral issue here. I'm happy to upload my own unpublished manuscript and accept any risk associated with that (e.g. data leakage), but the authors of a paper I'm reviewing might not want that...
Also, while I do appreciate that "Our AI providers are contractually barred from training their own foundation models on your data.", I can't see anywhere who these providers are. In general, AI companies don't have a good track record with regard to privacy. And how would anyone find out whether they use the data for training anyway?