A Python question regarding large file transfers over HTTP
I'm working on a project that involves retrieving large (~2-8 GB) .zip files over HTTP and storing them for later processing. I've written a script that uses an API to look up and generate URLs for a series of needed files, then attempts to stream each file to storage using requests.get().iter_content.
The problem is that my connection isn't perfectly stable (and I'm running this on a laptop that sometimes goes to sleep). When the connection is interrupted, the transfer dies and I have to restart it.
What would be the best way to add resume capability to my file transfer, so that if the script stalls or the connection drops, the download can pick up from where it failed?
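A simplified sketch of the kind of streaming loop described above (url, dest_path, and the chunk size are placeholders, not the actual script):

    import requests

    def download(url, dest_path, chunk_size=1024 * 1024):
        # Stream the response to disk in chunks instead of loading it into memory.
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(dest_path, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    if chunk:  # skip keep-alive chunks
                        fh.write(chunk)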
A Python question regarding large file transfers over HTTP
@spinflip Did you get the resume working? It might be worth looking into Twisted and the ReconnectingClientFactory.
A Python question regarding large file transfers over HTTP
@drewfer Kind of: it worked for one file, but then failed badly on another; it was still downloading and writing even when the file on disk was twice the size of the file being retrieved from the server...
A Python question regarding large file transfers over HTTP
@spinflip can you share the code?
A Python question regarding large file transfers over HTTP
@drewfer Sure! here: https://pastebin.com/JmNG2s7B
A Python question regarding large file transfers over HTTP
@spinflip The tqdm_notebook is probably calling the request multiple times with the same Range header. You're going to have to build an iterator around iter_content() that updates the Range header when the underlying connection closes, and then pass that to the notebook.
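A rough sketch of the kind of wrapper described above (the chunk size, the retry-forever behaviour, and the error handling are assumptions, and the server has to honour Range requests):

    import requests

    def resumable_chunks(url, start=0, chunk_size=1024 * 1024):
        # Yield the body of `url` in chunks, reissuing the request with an
        # updated Range header whenever the connection drops mid-transfer.
        pos = start
        while True:
            with requests.get(url, headers={"Range": f"bytes={pos}-"},
                              stream=True, timeout=60) as resp:
                if resp.status_code == 416:  # range starts past the end: nothing left
                    return
                resp.raise_for_status()
                try:
                    for chunk in resp.iter_content(chunk_size=chunk_size):
                        if chunk:
                            pos += len(chunk)
                            yield chunk
                except requests.exceptions.RequestException:
                    continue  # dropped connection: reissue from the current offset
            return  # iterator finished cleanly: the file is complete

A generator like this could then be handed to tqdm_notebook in place of the bare iter_content() iterator.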
A Python question regarding large file transfers over HTTP
@spinflip Hi, I don't have a direct answer to your question; I've never tried to do this before. However, the problem makes me think of mosh ( https://mosh.org/ ), an ssh alternative specifically developed for intermittent connections, and shoop, an scp alternative. Perhaps these could be of use if the normal HTTP method turns out to be difficult.
A Python question regarding large file transfers over HTTP
@spinflip HTTP/1.1 has a Range header. More info here - https://stackoverflow.com/questions/22894211/how-to-resume-file-download-in-python
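A minimal sketch of resuming via the Range header, assuming the server supports it (resume_download and its parameters are illustrative names, and the partial file already on disk is assumed to be intact):

    import os
    import requests

    def resume_download(url, dest_path, chunk_size=1024 * 1024):
        # Ask only for the bytes we don't already have on disk.
        have = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
        headers = {"Range": f"bytes={have}-"} if have else {}
        with requests.get(url, headers=headers, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            # 206 Partial Content means the Range header was honoured;
            # a 200 means the server is sending the whole file again.
            mode = "ab" if resp.status_code == 206 else "wb"
            with open(dest_path, mode) as fh:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    if chunk:
                        fh.write(chunk)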
A Python question regarding large file transfers over HTTP
@drewfer Huh, I think that might be working... Fingers crossed, and I'll find out in a few GB. Thank you!
A Python question regarding large file transfers over HTTP
This is where I shell out to wget, which has a 'continue' (-c) flag.
There might be a Python library that does the same, but I don't know of one.
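A rough sketch of shelling out to wget from Python (the --tries flag and the directory handling are assumptions on top of the -c flag mentioned above):

    import subprocess

    def wget_download(url, dest_dir):
        # -c resumes a partial file, --tries=0 retries indefinitely,
        # -P sets the download directory.
        subprocess.run(["wget", "-c", "--tries=0", "-P", dest_dir, url], check=True)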
A Python question regarding large file transfers over HTTP
Here's a cleaned version of what I currently have: https://pastebin.com/Tpgqrvdi