I think you’ll want some `@async` here because you want the decoding to overlap in time with the download. (In principle it’s possible to avoid the `@async` if you implement a custom stream to pass to `download`. That stream would need to drive the selection of your desired files out of the archive from within its own `write()` method. But this seems fairly messy, unless someone happens to have implemented it already.)
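For an async solution you want a buffered stream whose reads block until data arrives, rather than something like an `IOBuffer`, which simply reports end-of-file when it runs empty. There is one such stream in Base, the undocumented `Base.BufferStream`, which can be used for this purpose: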
```julia
import Downloads, Tar
using CodecZlib

# Some necessary piracy - BufferStream doesn't have
# an implementation for `Base.skip()`
function Base.skip(io::Base.BufferStream, n::Integer)
    if n > 0
        read(io, n)  # read and discard the next n bytes
    elseif n < 0
        error("Can't skip backward in BufferStream")
    end
    return io
end

io = Base.BufferStream()

@sync begin
    @async begin
        Downloads.download("file://localhost/home/chris/tmp/testdata.tgz", io)
        close(io)  # so the reading side sees EOF once all data is written
        @info "Download complete"
    end
    @async begin
        loc = Tar.extract(x -> x.path == "1.dat" ? (@info("Extracting", x); true) : (@info("Ignoring", x); false), GzipDecompressorStream(io))
        @info "Untar complete" loc
    end
end
```
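To see in isolation the blocking behavior that makes `BufferStream` suitable here, this is a minimal producer/consumer sketch with no downloading or decompression involved. The reader task blocks on the empty stream until the writer supplies data, and `eof` only returns `true` once the writer has closed the stream. Since `BufferStream` is undocumented, treat this as a description of its current behavior rather than a guaranteed API; the names `buf` and `received` are just for illustration.

```julia
# Minimal demo of Base.BufferStream's blocking reads (undocumented type).
buf = Base.BufferStream()
received = String[]

@sync begin
    # Reader: eof() and readline() wait while the buffer is empty and open.
    @async begin
        while !eof(buf)
            push!(received, readline(buf))
        end
    end
    # Writer: produce lines with pauses, then close to signal EOF.
    @async begin
        for i in 1:3
            println(buf, "chunk $i")
            sleep(0.1)
        end
        close(buf)
    end
end

@info "Reader finished" received
```

An `IOBuffer` in place of `buf` would not work the same way: the reader could hit end-of-file before the writer has produced anything.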
Testing this, we can see that extraction happens in a streaming manner: each header is handled as it arrives, before the download has finished.
```
┌ Info: Ignoring
└ x = Tar.Header("10.dat", :file, 0o644, 100000000, "")
┌ Info: Extracting
└ x = Tar.Header("1.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("2.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("3.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("4.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("5.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("6.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("7.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("8.dat", :file, 0o644, 100000000, "")
┌ Info: Ignoring
└ x = Tar.Header("9.dat", :file, 0o644, 100000000, "")
[ Info: Download complete
┌ Info: Untar complete
└ loc = "/tmp/jl_p3kvwB"
```
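If you want to try the same pattern without a server or CodecZlib, here's a rough sketch that builds a small uncompressed tarball locally with `Tar.create` and feeds it into a `BufferStream` from a second task standing in for `Downloads.download`. The `stream_extract` helper, the file names and sizes, and the 4096-byte chunk size are all made up for illustration.

```julia
import Tar

# Same piracy as above: `skip` for BufferStream by reading and discarding.
function Base.skip(io::Base.BufferStream, n::Integer)
    if n > 0
        read(io, n)
    elseif n < 0
        error("Can't skip backward in BufferStream")
    end
    return io
end

# Hypothetical helper: one task writes `tarball` into a BufferStream in
# chunks while another task extracts only the entry named `wanted`.
function stream_extract(tarball::AbstractString, wanted::AbstractString)
    io = Base.BufferStream()
    loc = Ref{String}()
    @sync begin
        # Producer: stands in for Downloads.download.
        @async begin
            open(tarball) do f
                while !eof(f)
                    write(io, read(f, 4096))
                    yield()  # let the consumer run between chunks
                end
            end
            close(io)  # lets the consumer see EOF
        end
        # Consumer: keep the wanted entry, reject everything else.
        @async loc[] = Tar.extract(h -> h.path == wanted, io)
    end
    return loc[]
end

# Build a small test archive (made-up names and sizes).
src = mktempdir()
for i in 1:5
    write(joinpath(src, "$i.dat"), rand(UInt8, 10_000))
end
tarball = Tar.create(src, tempname())

loc = stream_extract(tarball, "1.dat")
@info "Untar complete" loc readdir(loc)
```

Because there's no decompressor in between, `Tar` reads straight from the `BufferStream` in this version, so the `skip` method is defined on the stream Tar actually sees.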