Hey,
While testing a download script in Python, I wanted to check whether Julia behaves similarly, but I could not figure out how to write a script that:
- downloads a big file from the internet (>5 GB)
- has a progress meter
- downloads the file in chunks (preferably in parallel over multiple connections)
- writes to disk and not to memory (because of the file size)
I have seen a couple of ways to implement file downloads, with code from Base and from external packages, but couldn't find one that hits all of these points.
So I settled on simple handwritten code that covers some of them:
```julia
using HTTP
using ProgressMeter

url = "http://images.cocodataset.org/zips/train2014.zip"
dst = "path/to/file"

open(dst, "w") do f
    HTTP.open("GET", url) do io
        r = startread(io)
        len = parse(Int, HTTP.header(r, "Content-Length"))
        p = Progress(len, 1)
        done = 0
        while !eof(io)
            chunk = read(io, 8192)   # read(io, n) may return fewer than n bytes
            write(f, chunk)
            done += length(chunk)    # count the bytes actually read
            update!(p, done)
        end
    end
end
```
And this does the basic download well. (Mind you, the URL points to a ~12 GB file I was troubleshooting that fails to download beyond ~4 GB with a single connection, hence the multi-connection route I was after.)
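(Side note: the way I checked that the server allows ranged requests, assuming I'm reading the headers right, was a quick HEAD request:)

```julia
using HTTP

r = HTTP.head("http://images.cocodataset.org/zips/train2014.zip")
HTTP.header(r, "Accept-Ranges")    # "bytes" should mean ranged requests are supported
HTTP.header(r, "Content-Length")   # total size, useful for splitting into parts
```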
I also got something similar going with Downloads.download, but with HTTP.jl I feel I have more control to alter things.
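The Downloads.jl version looked roughly like this (a minimal sketch; the `progress` keyword passes cumulative `(total, now)` byte counts, and the total is only known once the headers arrive):

```julia
using Downloads
using ProgressMeter

url = "http://images.cocodataset.org/zips/train2014.zip"
dst = "path/to/file"

meter = Ref{Union{Nothing,Progress}}(nothing)
Downloads.download(url, dst; progress = (total, now) -> begin
    if meter[] === nothing && total > 0
        meter[] = Progress(total, 1)   # create the bar once Content-Length is known
    end
    meter[] !== nothing && update!(meter[], now)
end)
```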
So, does anyone have a suggestion for how to hit all the points mentioned? Do I have to write a custom threaded loop that requests the data in byte ranges directly (the server allows it)? Or did I miss some obvious package/function? A rough sketch of the direction I have in mind is below.
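For reference, this is roughly what I imagine the ranged approach would look like. It's an untested sketch: it assumes the server honors `Range` headers (responds with 206 Partial Content), the 256 MiB part size is arbitrary, and there is no retry logic or progress meter yet.

```julia
using HTTP

url  = "http://images.cocodataset.org/zips/train2014.zip"
dst  = "path/to/file"
part = 256 * 1024^2   # arbitrary part size (256 MiB)

# total size from the headers of a HEAD request
total = parse(Int, HTTP.header(HTTP.head(url), "Content-Length"))

# preallocate the destination so every task can write at its own offset
open(io -> truncate(io, total), dst, "w")

ranges = [(lo, min(lo + part - 1, total - 1)) for lo in 0:part:total-1]

# run with e.g. `julia -t auto` so the tasks land on multiple threads
@sync for (lo, hi) in ranges
    Threads.@spawn begin
        HTTP.open("GET", url, ["Range" => "bytes=$lo-$hi"]) do http
            startread(http)
            # each task opens its own handle, so seeks don't interfere
            open(dst, "r+") do f
                seek(f, lo)
                while !eof(http)
                    write(f, readavailable(http))
                end
            end
        end
    end
end
```

I guess progress tracking across the tasks would need something like an atomic counter or a Channel feeding a single meter, but I haven't gotten that far.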
Thanks in advance!