Hey,
While testing a download script in Python, I wanted to check whether Julia behaves similarly, but I could not figure out how to write a script that:
- downloads a big file from the internet (>5 GB)
- has a progress meter
- downloads the file in chunks (preferably in parallel over multiple connections)
- writes to disk and not to memory (because of the file size)
I have seen a couple of ways to implement file downloads, with code from Base and from external packages, but couldn't find one that hits all of these points.
So I settled on simple handwritten code that covers some of them:
```julia
using HTTP
using ProgressMeter

url = "http://images.cocodataset.org/zips/train2014.zip"
dst = "path/to/file"

open(dst, "w") do f
    HTTP.open("GET", url) do io
        r = startread(io)
        len = parse(Int, HTTP.header(r, "Content-Length"))
        p = Progress(len, 1)
        done = 0
        while !eof(io)
            chunk = read(io, 8192)   # read(io, n) may return fewer than n bytes
            write(f, chunk)
            done += length(chunk)    # count the bytes actually read
            update!(p, done)
        end
    end
end
```
And this does the basic download well. (Mind you, the URL points to a ~12 GB file I was troubleshooting that fails to download beyond ~4 GB with a single connection, hence the multi-connection route I was after.)
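(Side note: the way I checked that the server allows ranged requests, assuming I'm reading the headers right, was a quick HEAD request:)

```julia
using HTTP

r = HTTP.head("http://images.cocodataset.org/zips/train2014.zip")
HTTP.header(r, "Accept-Ranges")    # "bytes" should mean ranged requests are supported
HTTP.header(r, "Content-Length")   # total size, useful for splitting into parts
```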
I also got something similar going with Downloads.download, but with HTTP.jl I feel I have more control to alter things.
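The Downloads.jl version looked roughly like this (a minimal sketch; the `progress` keyword passes cumulative `(total, now)` byte counts, and the total is only known once the headers arrive):

```julia
using Downloads
using ProgressMeter

url = "http://images.cocodataset.org/zips/train2014.zip"
dst = "path/to/file"

meter = Ref{Union{Nothing,Progress}}(nothing)
Downloads.download(url, dst; progress = (total, now) -> begin
    if meter[] === nothing && total > 0
        meter[] = Progress(total, 1)   # create the bar once Content-Length is known
    end
    meter[] !== nothing && update!(meter[], now)
end)
```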
So, does anyone have a suggestion for how to hit all the points mentioned? Do I have to write a custom threaded loop that requests the data in byte ranges directly (the server allows it)? Or did I miss some obvious package/function? A rough sketch of the direction I have in mind is below.
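For reference, this is roughly what I imagine the ranged approach would look like. It's an untested sketch: it assumes the server honors `Range` headers (responds with 206 Partial Content), the 256 MiB part size is arbitrary, and there is no retry logic or progress meter yet.

```julia
using HTTP

url  = "http://images.cocodataset.org/zips/train2014.zip"
dst  = "path/to/file"
part = 256 * 1024^2   # arbitrary part size (256 MiB)

# total size from the headers of a HEAD request
total = parse(Int, HTTP.header(HTTP.head(url), "Content-Length"))

# preallocate the destination so every task can write at its own offset
open(io -> truncate(io, total), dst, "w")

ranges = [(lo, min(lo + part - 1, total - 1)) for lo in 0:part:total-1]

# run with e.g. `julia -t auto` so the tasks land on multiple threads
@sync for (lo, hi) in ranges
    Threads.@spawn begin
        HTTP.open("GET", url, ["Range" => "bytes=$lo-$hi"]) do http
            startread(http)
            # each task opens its own handle, so seeks don't interfere
            open(dst, "r+") do f
                seek(f, lo)
                while !eof(http)
                    write(f, readavailable(http))
                end
            end
        end
    end
end
```

I guess progress tracking across the tasks would need something like an atomic counter or a Channel feeding a single meter, but I haven't gotten that far.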
Thanks in advance!