How to plumb together download -> uncompress -> untar without writing the full downloaded file

I think you’ll want some `@async` here, because you want the decoding to overlap in time with the download. (In principle you can avoid `@async` by implementing a custom stream to pass to `download`. That stream would have to drive the selection of your desired files out of the archive from within its own `write()` method, which seems fairly messy unless someone happens to have implemented it already.)
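For the non-`@async` route, the hook would be the output stream's `write` method. Below is a minimal sketch of a push-style sink using only Base; the `ChunkSink` type and its callback are hypothetical names for illustration. Every chunk written to the sink is handed to the callback as soon as it arrives, with no background task. It only shows the `write()` side: the part described above as messy (driving the tar parser from inside that callback) is not attempted.

```julia
# Hypothetical push-style sink: each chunk written to it is passed to
# `on_chunk` immediately, with no task or intermediate buffer involved.
struct ChunkSink{F} <: IO
    on_chunk::F
end

# `write(sink, ::Vector{UInt8})` and `write(sink, ::String)` both end up here.
function Base.unsafe_write(s::ChunkSink, p::Ptr{UInt8}, n::UInt)
    # Copy the bytes: the caller may reuse its buffer once write() returns.
    chunk = copy(unsafe_wrap(Array, p, Int(n)))
    s.on_chunk(chunk)
    return n
end

# Single-byte writes, for completeness.
Base.write(s::ChunkSink, b::UInt8) = (s.on_chunk(UInt8[b]); 1)

# Example: collect the chunks so we can inspect them.
chunks = Vector{UInt8}[]
sink = ChunkSink(c -> push!(chunks, c))
write(sink, "abc")
write(sink, UInt8[0x64, 0x65])
```

Passing such a sink as the output argument of `Downloads.download(url, sink)` should deliver the body chunk by chunk, assuming `download` needs nothing from its output beyond `write`; that is an assumption, not something verified here.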

For an async solution, you want a buffered stream whose reads block until the writer has supplied data (or closed the stream). The undocumented `Base.BufferStream` is one such stream and can be used for this purpose:

```julia
import Downloads, Tar
using CodecZlib

# Some necessary piracy: BufferStream doesn't have
# an implementation of `Base.skip()`
function Base.skip(io::Base.BufferStream, n::Integer)
    n < 0 && error("Can't skip backward in BufferStream")
    read(io, n)  # consume and discard the next n bytes
    return io
end

io = Base.BufferStream()
@sync begin
    @async begin
        try
            Downloads.download("file://localhost/home/chris/tmp/testdata.tgz", io)
            @info "Download complete"
        finally
            close(io)  # signal EOF to the reader even if the download fails
        end
    end
    @async begin
        # Extract only "1.dat"; log every header as it streams past
        loc = Tar.extract(GzipDecompressorStream(io)) do x
            if x.path == "1.dat"
                @info("Extracting", x)
                true
            else
                @info("Ignoring", x)
                false
            end
        end
        @info "Untar complete" loc
    end
end
```
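One possible refinement of the `Base.skip` piracy above, as a sketch: `read(io, n)` allocates an `n`-byte array only to throw it away, which could be costly if `skip` is ever called with a large `n`. The variant below discards the bytes through a fixed scratch buffer instead (the 64 KiB size is an arbitrary choice). It is meant as a drop-in replacement for the definition above, not an addition to it.

```julia
# Alternative piracy: skip forward by discarding bytes in bounded chunks,
# so skipping a large entry never allocates an n-byte array.
function Base.skip(io::Base.BufferStream, n::Integer)
    n < 0 && error("Can't skip backward in BufferStream")
    buf = Vector{UInt8}(undef, Int(min(n, 64 * 1024)))  # reusable scratch space
    remaining = n
    while remaining > 0
        # Blocks until the requested bytes arrive or the writer closes the stream
        nb = readbytes!(io, buf, Int(min(remaining, length(buf))))
        nb == 0 && break  # EOF before n bytes were available: stop quietly
        remaining -= nb
    end
    return io
end

# Example: a producer task feeds the stream while the main task skips ahead.
bs = Base.BufferStream()
producer = @async begin
    write(bs, zeros(UInt8, 200_000))
    write(bs, "tail")
    close(bs)
end
skip(bs, 200_000)
rest = String(read(bs))
wait(producer)
```

Like the `read`-based version, reaching EOF before `n` bytes simply ends the skip rather than raising an error.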

Testing with a local archive of ten 100 MB files, we can see that extraction happens in a streaming manner: every header is processed before the download completes.

```
┌ Info: Ignoring
└   x = Tar.Header("10.dat", :file, 0o644, 100000000, "")
┌ Info: Extracting
└   x = Tar.Header("1.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("2.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("3.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("4.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("5.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("6.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("7.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("8.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└   x = Tar.Header("9.dat", :file, 0o644, 100000000, "")
[ Info: Download complete
┌ Info: Untar complete
└   loc = "/tmp/jl_p3kvwB"
```
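To reproduce a test like this without the original `testdata.tgz`, here is a sketch that builds a similar but much smaller archive. The file names mirror the log above; the 1 MB size, the random contents and the temporary paths are assumptions made for illustration.

```julia
import Tar
using CodecZlib

# Ten files named like the ones in the log, but only 1 MB each (assumption)
src = mktempdir()
for i in 1:10
    write(joinpath(src, "$i.dat"), rand(UInt8, 1_000_000))
end

# Tar the directory straight into a gzip stream on disk
tgz = joinpath(mktempdir(), "testdata.tgz")
gz = GzipCompressorStream(open(tgz, "w"))
Tar.create(src, gz)
close(gz)  # writes the gzip trailer and closes the underlying file

# Read the headers back to check what went into the archive
gzin = GzipDecompressorStream(open(tgz))
hdrs = Tar.list(gzin)
close(gzin)
```

On Unix-like systems, where `tgz` is an absolute path starting with `/`, the archive can then be fed to the pipeline above by replacing the URL with `"file://localhost" * tgz`.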