Download decompressed .zip-archive from url

Hi,

I would like to extract and download files from a .zip-archive stored online (here) without having to download intermediate files. The .zip-archive consists of .txt and .json files and is nested.

I have found two references that work for .tar-files (on Discourse and reddit) but I cannot figure out how I would have to do it for .zip-folders. In the reddit post, the package UrlDownload.jl was suggested; this does not work in this case, returning multiple warnings: “Data format unknown is not supported.”

So far, I have been trying variations of something like:

using HTTP, ZipFile

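unzip_from_url(link, dir) = HTTP.open("GET", link) do io
    # Note: ZipFile.Reader expects a seekable stream (the ZIP index sits at the
    # end of the archive), which is probably why passing the HTTP stream fails.
    zarchive = ZipFile.Reader(io)
    for f in zarchive.files
        # Entry names inside a ZIP archive are always "/"-separated
        path = joinpath(dir, split(f.name, "/")...)
        if endswith(f.name, "/")
            # Directory entry: create it (mkpath does not error if it exists)
            mkpath(path)
        else
            # File entry: make sure the parent directory exists, then write
            mkpath(dirname(path))
            write(path, read(f))
        end
    end
    close(zarchive)
end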

I don’t want to download each .zip-archive first and then extract it locally because there are multiple such archives at the above URL, one per year and each around 700 MB, and I don’t want to store both the zipped and the extracted folders.

You might benefit from ZipStreams.jl!
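
Here is a rough, untested sketch of how that might look. It assumes that ZipStreams.jl provides `zipsource` for reading an archive sequentially from any `IO`, and `info(f).name` for the entry name; please check both against the package documentation. `entry_path` is just a hypothetical helper for this example, and whether the HTTP stream can be handed to `zipsource` directly is also an assumption.

```julia
using HTTP, ZipStreams

# Hypothetical helper: map a "/"-separated archive entry name to a path under `dir`.
entry_path(dir, name) = joinpath(dir, split(name, "/"; keepempty=false)...)

function unzip_from_url(url, dir)
    HTTP.open("GET", url) do http
        # Assumption: `zipsource` reads entries in stream order without seeking.
        zipsource(http) do source
            for f in source
                name = info(f).name              # assumption: entry name lives here
                endswith(name, "/") && continue  # directory entry, nothing to write
                path = entry_path(dir, name)
                mkpath(dirname(path))            # create nested folders as needed
                write(path, read(f))             # one decompressed entry at a time
            end
        end
    end
end
```

Since entries are processed as they arrive, only one decompressed file is held in memory at a time and the .zip itself is never written to disk. Calling it would look like `unzip_from_url(link, "extracted")`, with `link` being the archive URL from the question.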