How to download an entire website for fun and profit:

wget --recursive --page-requisites --html-extension --convert-links \
     --restrict-file-names=unix --no-clobber \
     --domains <domain.com> <starting.point.domain.com>

@cafkafk Have you figured out a way to make that resumable?

@robryk --no-clobber means no files are overwritten, so if the connection breaks you can restart the command and it will skip everything already downloaded

also you should be able to use wget -c (i.e. wget --continue) to resume partially downloaded files, though I've never actually used it
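Putting those two points together, the mirror can be made restart-friendly by wrapping it in a retry loop. A minimal sketch (the `mirror_resumable` name and the 10-second delay are my own, not from the thread):

```shell
#!/bin/sh
# Hypothetical wrapper: wget exits non-zero when the connection drops,
# and --no-clobber makes a rerun skip files already on disk, so looping
# until wget exits 0 gives a crude resume.
mirror_resumable() {
  until wget --recursive --page-requisites --html-extension --convert-links \
             --restrict-file-names=unix --no-clobber \
             --domains "$1" "$2"; do
    echo "wget exited non-zero; retrying in 10s" >&2
    sleep 10
  done
}
```

Usage would be e.g. `mirror_resumable domain.com https://starting.point.domain.com/`.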

@cafkafk Will links from already-fetched files be followed if I use -nc? I once got the empirical impression that they weren't, and the manual was very unclear.

@robryk I'm pretty sure it will re-read .html and .htm files that have already been downloaded when resuming with -nc

> Note that when -nc is specified, files with the suffixes .html or .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
— from the wget man page
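In other words, with -nc the re-parse decision hinges purely on the file's suffix. A toy sketch of that rule (my own illustration, not wget's actual code):

```shell
# Sketch of the man-page rule above: under -nc, a file already on disk
# is loaded locally and parsed for further links only if its name ends
# in .html or .htm.
reparse_locally() {
  case "$1" in
    *.html|*.htm) return 0 ;;  # parsed as if freshly retrieved
    *)            return 1 ;;  # skipped; links inside are never followed
  esac
}
```

Which is exactly why --html-extension matters: it forces saved pages to carry a suffix the resume pass will re-parse.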

@cafkafk Aaaah, so `--html-extension` is crucial here (and I was very bad at reading the manpage). Thanks.

Qoto Mastodon