Links:
exiftool: https://www.exiftool.org/
qpdf: https://qpdf.sourceforge.io/
dangerzone (GUI; renders the PDF as images, then re-OCRs everything): https://dangerzone.rocks/
mat2 (renders the PDF as images, no OCR): https://0xacab.org/jvoisin/mat2
Here's a shell script that recursively removes metadata from PDFs in a provided (or current) directory, as described above. It's for Mac/*nix-like computers and needs qpdf and exiftool installed:
https://gist.github.com/sneakers-the-rat/172e8679b824a3871decd262ed3f59c6
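For reference, a minimal sketch of the same idea (not the linked gist, which handles more cases): exiftool clears the PDF metadata, but it does so as an incremental update that leaves the old data recoverable, so the file is rewritten with qpdf afterwards to drop it for good. Paths and output names below are illustrative.

```bash
#!/usr/bin/env bash
# Minimal sketch: recursively strip metadata from PDFs under a directory.
# Requires exiftool and qpdf on PATH; output filenames are illustrative.
set -euo pipefail

dir="${1:-.}"   # directory to clean, defaults to the current one

find "$dir" -type f -iname '*.pdf' -print0 | while IFS= read -r -d '' pdf; do
  # exiftool removes the XMP/Info metadata, but only as an incremental
  # update, so the old data is still recoverable from the file...
  exiftool -all:all= -overwrite_original "$pdf"
  # ...rewriting with qpdf drops the orphaned objects for good.
  qpdf --linearize "$pdf" "${pdf%.pdf}.cleaned.pdf"
done
```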
The metadata appears to be preserved on papers from sci-hub. Since sci-hub works by using harvested academic credentials to download papers, this would allow publishers to identify which accounts need to be closed/secured.
https://twitter.com/json_dirs/status/1486135162505072641?t=Wg5XAzujycz79Cop_ap8vQ&s=19
@jonny I wonder whether uploading every paper to sci-hub twice would be feasible (i.e., would enough people still do that?). If we did, it would allow sci-hub to verify with reasonable certainty that whatever watermark-removal method they use still works.
@robryk
yes, definitely. all of the above. fix what you have now, adapt to changes; making double grabs part of the protocol makes sense :)
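If double grabs did become part of the protocol, the verification step could be as simple as running both copies of a paper through the same cleaning pipeline and comparing the results. A hypothetical sketch, with copy_a.pdf/copy_b.pdf standing in for two independent downloads of the same paper:

```bash
#!/usr/bin/env bash
# Hypothetical double-grab check: clean two independent downloads of the same
# paper and compare them. If the cleaned copies still differ, some per-download
# watermark (in metadata or in the page content) likely survives the cleaning.
set -euo pipefail

clean() {
  exiftool -all:all= -overwrite_original "$1"
  # --deterministic-id makes qpdf derive the document /ID from file contents,
  # so identical inputs produce identical outputs.
  qpdf --linearize --deterministic-id "$1" "${1%.pdf}.cleaned.pdf"
}

clean copy_a.pdf   # first download
clean copy_b.pdf   # second download

if cmp -s copy_a.cleaned.pdf copy_b.cleaned.pdf; then
  echo "cleaned copies are identical: no per-download watermark detected"
else
  echo "cleaned copies differ: residual watermarking is likely"
fi
```

Byte-identical output is a strict test; harmless differences between downloads could also trigger it, so a text- or image-level diff might be a better signal in practice.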