which sounds like some problem with Tar waiting for the buffer to fill
Yes, I thought there might be a short read happening here. If all doesn’t work, I was hoping that just calling read again with the remaining size would be OK, but this doesn’t seem to work either.
The main trouble we’re having here is that Base.BufferStream isn’t a public API, so it’s not that well tested, some things like skip are missing, and the precise blocking behavior isn’t really documented. (I noticed something a bit weird about blocking: BufferStream seems to block only on read, not on write, so the internal buffer might be able to grow indefinitely. That is fine in this case because the Tar reader is likely faster than the writer doing the download, but really not great if you were tarring and uploading!)
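A quick way to see the non-blocking write behavior described above is to write to a BufferStream with no reader attached. This is only a sketch: Base.BufferStream is internal, so the behavior may differ between Julia versions, and the chunk size and count are arbitrary values chosen for illustration.

```julia
# Sketch: writes to a BufferStream return immediately even though nothing is reading.
# Base.BufferStream is internal and undocumented, so treat this as an observation,
# not a guarantee.
bs = Base.BufferStream()
chunk = zeros(UInt8, 2^20)        # 1 MiB of dummy data (arbitrary size)

for _ in 1:8
    write(bs, chunk)              # no reader exists, yet this does not block
end
buffered = bytesavailable(bs)     # everything written so far is held in memory
@info "Buffered without a reader" buffered

close(bs)                         # signal EOF; already-buffered data stays readable
drained = read(bs)                # read until EOF
```

If the writer is much faster than the reader, the buffered amount keeps growing in the same way, which is the memory concern for the tar-and-upload direction.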
Another alternative is to use Pipe, which is a publicly defined API and widely used for several things. IIUC the cost of blocking on a read or write to a Pipe will be a lot higher than for BufferStream, because the Pipe needs to go through the operating system kernel. But the pipe should have a fixed-size buffer, so it should block on both the read and the write side, which is a lot more sensible in general.
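As a network-free sketch of that pattern, the snippet below pushes a payload through a Pipe between two tasks. It assumes that the internal Base.link_pipe! accepts the reader_supports_async and writer_supports_async keywords (true in the Julia versions I know of, but it is not public API), and that the in field of Pipe is the writable end. The payload size is arbitrary, chosen to be larger than a typical kernel pipe buffer so the writer has to wait for the reader.

```julia
# Sketch: producer and consumer tasks connected by an OS pipe.
io = Pipe()
# Internal API; the keyword arguments are an assumption about current Julia versions.
Base.link_pipe!(io; reader_supports_async = true, writer_supports_async = true)

payload = rand(UInt8, 2^20)       # 1 MiB, more than a typical kernel pipe buffer

received = UInt8[]
@sync begin
    # Producer: the write can only finish once the consumer drains the pipe.
    @async begin
        write(io, payload)
        close(io.in)              # close only the write end so the reader sees EOF
    end
    # Consumer: read until EOF.
    @async begin
        append!(received, read(io))
    end
end
```

Closing only the write end lets the consumer finish reading whatever is still buffered before it sees EOF.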
The following seemed to work for me:
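using Tar, Downloads

# Neither stream type provides skip, so emulate it by reading and discarding.
function Base.skip(io::Union{Base.BufferStream,Pipe}, n)
    n < 0 && error("Can't skip backward in Pipe or BufferStream")
    # read may return fewer than n bytes (a short read), so loop until
    # everything is skipped or the stream reaches EOF.
    while n > 0 && !eof(io)
        buf = read(io, n)
        n -= length(buf)
        # n > 0 && @info "Short read" length(buf)
    end
    return io
end
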
io = Pipe()
# Initialize the pipe. I'm not sure there's a public API for this ??
Base.link_pipe!(io)

# Alternatively, use BufferStream... which should work
# but seems to get stuck for some reason
# io = Base.BufferStream()

@sync begin
    # Writer task: stream the download into the pipe.
    @async try
        Downloads.download("https://data.proteindiffraction.org/ssgcid/3lls.tar", io)
        @info "Download complete"
    catch exc
        @error "Caught exception" exc
    finally
        close(io)
    end
    # Reader task: extract only the file of interest as the data arrives.
    @async try
        loc = Tar.extract(io) do x
            if x.path == "3lls/series/200873f12_x0181.img.bz2"
                @info "Extracting" x
                true
            else
                @info "Ignoring" x
                false
            end
        end
        @info "Untar complete" loc
    catch exc
        @error "Caught exception" exc
    finally
        close(io)
    end
end

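The read-and-discard approach to skipping can be checked offline with a small stream. The sketch below uses a hypothetical helper named skip_forward (not part of Base or Tar) so that it does not have to extend Base.skip, and it relies on the internal Base.BufferStream behaving as it does in the Julia versions I know of.

```julia
# Hypothetical helper: skip n bytes of a non-seekable stream by reading and discarding.
function skip_forward(io::IO, n::Integer)
    n < 0 && error("cannot skip backward")
    while n > 0 && !eof(io)
        n -= length(read(io, n))  # read may return fewer than n bytes
    end
    return io
end

bs = Base.BufferStream()          # internal type, used here only for illustration
write(bs, UInt8.(1:10))
close(bs)                         # signal EOF; buffered bytes remain readable

skip_forward(bs, 4)               # discard bytes 1 through 4
rest = read(bs)                   # the remaining bytes, 5 through 10
```

Looping on eof rather than isopen matters here: the stream is already closed when skip_forward runs, but it still holds unread data.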
If you wanted to abort the download once the particular file of interest has been read and extracted, you may be able to do close(io).