A Python question regarding large file transfers over HTTP
I'm working on a project that involves retrieving large (~2-8 GB) .zip files over HTTP and storing them for later processing. I've written a script that uses an API to look up and generate URLs for a series of needed files, and then streams each file to disk with requests.get() and iter_content().
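The download loop is roughly this shape (simplified, with made-up names; the real script is in the pastebin linked below):

```python
import requests

def download(url, dest_path, chunk_size=1024 * 1024):
    # Stream the response so the multi-GB body is never held in memory at once.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip keep-alive chunks
                    fh.write(chunk)
```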
The problem is, my connection isn't perfectly stable (and I'm running this on a laptop which sometimes goes to sleep). When the connection is interrupted, the transfer dies and I need to restart it.
What would be the best way to add resume capability to my file transfer, so that if the script stalls or the connection drops I can pick the download back up from where it failed?
Here's a cleaned version of what I currently have: https://pastebin.com/Tpgqrvdi
@spinflip Did you get the resume working? It might be worth looking into Twisted and the ReconnectingClientFactory.
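The core reconnect pattern looks something like this (a bare-bones sketch at the raw TCP level; the host, port, and filename are placeholders, and a real download would still have to speak HTTP and send Range headers itself):

```python
from twisted.internet import reactor
from twisted.internet.protocol import Protocol, ReconnectingClientFactory

class DownloadProtocol(Protocol):
    def dataReceived(self, data):
        # Append whatever bytes arrive to the output file.
        self.factory.outfile.write(data)

class DownloadFactory(ReconnectingClientFactory):
    def __init__(self, outfile):
        self.outfile = outfile

    def buildProtocol(self, addr):
        self.resetDelay()  # reset the reconnect backoff once a connection succeeds
        proto = DownloadProtocol()
        proto.factory = self
        return proto

    def clientConnectionLost(self, connector, reason):
        # The parent class schedules a reconnect with exponential backoff.
        ReconnectingClientFactory.clientConnectionLost(self, connector, reason)

# Placeholder endpoint and output file.
reactor.connectTCP("example.com", 80, DownloadFactory(open("out.zip", "wb")))
reactor.run()
```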
@drewfer kind of: it worked for one file, but then failed badly on another; it was still downloading and writing when the on-disk file was twice the size of the file being retrieved from the server...
@spinflip can you share the code?
@spinflip tqdm_notebook is probably calling the request multiple times with the same Range header. You're going to have to build an iterator around iter_content() that updates the Range header whenever the underlying connection closes, and then pass that iterator to the notebook.
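Something along these lines, as an untested sketch: it assumes the server reports Content-Length and honours Range requests, and the chunk size and retry limit are arbitrary. (tqdm.notebook.tqdm is the newer import path for tqdm_notebook.)

```python
import os
import requests
from tqdm.notebook import tqdm

def resumable_download(url, dest_path, chunk_size=1024 * 1024, max_retries=10):
    # Get the total size from a HEAD request so the progress bar is accurate.
    total = int(requests.head(url, allow_redirects=True).headers["Content-Length"])
    pos = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0

    with tqdm(total=total, initial=pos, unit="B", unit_scale=True) as bar, \
         open(dest_path, "ab") as fh:
        retries = 0
        while pos < total and retries <= max_retries:
            try:
                # Re-issue the request from the current offset instead of byte 0.
                resp = requests.get(url, headers={"Range": "bytes=%d-" % pos},
                                    stream=True, timeout=60)
                with resp:
                    resp.raise_for_status()
                    if pos and resp.status_code != 206:
                        # A 200 here means the server ignored Range and is resending
                        # from byte 0, which is how a file ends up bigger than the original.
                        raise RuntimeError("server ignored the Range header")
                    for chunk in resp.iter_content(chunk_size=chunk_size):
                        fh.write(chunk)
                        pos += len(chunk)
                        bar.update(len(chunk))
            except requests.exceptions.RequestException:
                retries += 1  # dropped connection; loop around and resume from pos
```

Because the file is opened in append mode and the Range offset is keyed off the on-disk size, re-running the script after a crash should also pick up where it left off.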