Hi,
I would like to download and extract the files in a .zip archive stored online (here) without having to write the archive itself to disk as an intermediate file. The archive contains .txt and .json files in nested folders.
I have found two references that work for .tar files (on Discourse and reddit), but I cannot figure out how to do the same for .zip archives. The reddit post suggests the package UrlDownload.jl; this does not work in this case and returns multiple warnings: “Data format unknown is not supported.”
So far, I have been trying multiple variations along the lines of:
using HTTP, ZipFile

unzip_from_url(link, dir) = HTTP.open("GET", link) do io
    zarchive = ZipFile.Reader(io)
    for f in zarchive.files
        # Entry names use "/" as separator; a trailing "/" marks a directory
        FileName = split(f.name, "/")
        # Parent folder of the entry (plain indexing, so no extra package is needed for Not)
        FilePath = joinpath(dir, FileName[1:end-1]...)
        # mkpath creates missing parents and does not error if the folder already exists
        mkpath(FilePath)
        if FileName[end] != ""
            p = joinpath(dir, FileName...)
            write(p, read(f))
        end
    end
    close(zarchive)
end
The reason I don’t want to download each .zip archive first and then extract it locally is that there are multiple such archives at the above URL, each around 700 MB and covering a single year, and I don’t want to have to store both the zipped archives and the extracted files.
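A minimal sketch of one possible workaround, under the assumption that a single archive (around 700 MB) fits in RAM. The zip format keeps its table of contents at the end of the file, so ZipFile.Reader needs a seekable stream; as far as I can tell, the stream handed over by HTTP.open is not seekable. Here the response body is instead fetched completely into memory and wrapped in an IOBuffer, so no .zip file is ever written to disk. The helper name `unzip_bytes` is hypothetical and only used for illustration.

```julia
using HTTP, ZipFile

# Extract a zip archive held in memory (raw bytes) into the folder `dir`.
# `unzip_bytes` is a hypothetical helper name, not part of any package.
function unzip_bytes(bytes::Vector{UInt8}, dir::AbstractString)
    # IOBuffer is seekable, which ZipFile.Reader needs to locate the
    # central directory at the end of the archive.
    zarchive = ZipFile.Reader(IOBuffer(bytes))
    try
        for f in zarchive.files
            # Zip entry names always use "/" as separator
            target = joinpath(dir, split(f.name, '/')...)
            if endswith(f.name, "/")
                mkpath(target)            # directory entry
            else
                mkpath(dirname(target))   # make sure parent folders exist
                write(target, read(f))    # decompress and write one file
            end
        end
    finally
        close(zarchive)
    end
    return dir
end

# Assumption: the whole archive is downloaded into RAM before extraction.
unzip_from_url(link, dir) = unzip_bytes(HTTP.get(link).body, dir)
```

Calling `unzip_from_url(link, "data")` once per archive would then leave only the extracted files on disk. The trade-off is peak memory use of roughly the archive size, and entry names are not checked for ".." components, so this should only be used with trusted archives.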