Cross-platform unpacking of tar archives

I need to unpack uncompressed .tar archives from within Julia and have it work on all platforms.

Maybe a clue: there seems to be some code for this within Pkg.PlatformEngines, but I couldn't figure out how to access it.

Or is there some other way? Here’s a small demo archive with 3 pages from the Julia docs (117 kB).

There's a function in DataDeps that I've used which does this: https://github.com/oxinabox/DataDeps.jl/blob/master/src/post_fetch_helpers.jl

I use it in my package here for a zip file: https://github.com/aaowens/PSID.jl/blob/e4115cd0cd9cc0afc7fad1e64d82c808dd64e756/src/unzip_data.jl#L66


There is a pull request to make JuliaLang/Tar.jl a standard library. That package is new and not registered, but it sounds like it does what you want.
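For reference, a minimal round trip with Tar.jl might look like this. It is a sketch that assumes the API of the Tar standard library shipped with Julia 1.6 and later (`Tar.create`, `Tar.extract`); on older Julia versions you would first need to add Tar.jl as a package. The directory and file names are made up for the example.

```julia
using Tar  # standard library since Julia 1.6; older versions need Tar.jl added as a package

# Build a small directory to archive (contents are illustrative)
src = mktempdir()
write(joinpath(src, "hello.txt"), "Hello, tar!")

# With no target path, Tar.create writes an uncompressed .tar to a temp file and returns its path
tarball = Tar.create(src)

# With no target directory, Tar.extract unpacks into a new temp directory and returns its path
dest = Tar.extract(tarball)

println(read(joinpath(dest, "hello.txt"), String))
```

Both calls also accept an explicit second argument (the tarball path for `create`, the destination directory for `extract`) if you don't want temporary locations.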

Works perfectly on my Windows 10 system, thanks! I’ll try it on a Linux box and a Mac tomorrow.

Thanks for reminding me. I found that too, but it didn’t work for me because it currently requires Julia 1.4.

Ah, I didn’t notice that, good point.

The function in DataDeps was originally from BinDeps.jl (or at least based on it).

It should work cross-platform.

The tar format isn't all that complicated, and if you don't care about file metadata and just want the file names and their contents, this quick and dirty hack should be good enough:

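# Convenience method: open the file and pass the stream to the method below
unpack_tar(filename::AbstractString) = open(unpack_tar, filename, "r")

function unpack_tar(file)
    unpacked_files = Dict{String, String}()

    while true
        record_start = position(file)
        # Bytes 0-99 of each 512-byte header hold the NUL-padded file name
        filename = first(split(String(read(file, 100)), "\0"))
        # An empty name means the end-of-archive marker (or EOF) was reached
        isempty(filename) && break
        # Bytes 124-134 hold the file size as 11 octal digits
        seek(file, record_start + 124)
        nbytes = parse(Int, String(read(file, 11)), base = 8)
        # The file data starts right after the header
        seek(file, record_start + 512)
        unpacked_files[filename] = String(read(file, nbytes))
        # Skip the padding up to the next 512-byte boundary
        pos = position(file)
        seek(file, pos + mod(-pos, 512))
    end

    return unpacked_files
end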
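Since the parser only relies on the name field (bytes 0 to 99), the octal size field (bytes 124 to 134) and 512-byte block alignment, a tiny writer for the same layout is a handy way to produce test input. `write_tar_entry` and `finish_tar` are hypothetical helper names, not part of any package. The sketch also fills in the ustar magic and the header checksum so that regular tar tools should accept the output, but it only handles regular files with names shorter than 100 bytes.

```julia
# Hypothetical helper: append one regular-file record (ustar layout) to a tar stream
function write_tar_entry(io::IO, name::AbstractString, data::Vector{UInt8})
    ncodeunits(name) < 100 || error("this sketch only supports names shorter than 100 bytes")
    header = zeros(UInt8, 512)
    # Copy the bytes of `s` into the header starting at 0-based `offset`
    field!(offset, s) = copyto!(header, offset + 1, codeunits(s), 1, ncodeunits(s))

    field!(0, name)                                                # name, NUL-padded
    field!(100, "0000644\0")                                       # mode (octal)
    field!(108, "0000000\0")                                       # uid
    field!(116, "0000000\0")                                       # gid
    field!(124, string(length(data), base = 8, pad = 11) * "\0")  # size: 11 octal digits + NUL
    field!(136, "00000000000\0")                                   # mtime
    field!(156, "0")                                               # typeflag '0' = regular file
    field!(257, "ustar\0")                                         # magic
    field!(263, "00")                                              # version

    # Checksum: sum of all header bytes, counting the checksum field itself as spaces
    header[149:156] .= UInt8(' ')
    field!(148, string(sum(Int, header), base = 8, pad = 6) * "\0 ")

    write(io, header)
    write(io, data)
    write(io, zeros(UInt8, mod(-length(data), 512)))  # pad data to a 512-byte boundary
    return nothing
end

# End of archive: two all-zero 512-byte blocks
finish_tar(io::IO) = write(io, zeros(UInt8, 1024))

io = IOBuffer()
write_tar_entry(io, "hello.txt", Vector{UInt8}("hi"))
finish_tar(io)
bytes = take!(io)  # 2048 bytes: header, padded data, end-of-archive marker
```

Wrapping `bytes` in an `IOBuffer` gives a stream that can be passed to the `unpack_tar(file)` method.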

As a bonus, here's a pure Julia solution for gzip-compressed tar files:

using Inflate

unpack_tar_gz(filename) = unpack_tar(IOBuffer(inflate_gzip(filename)))
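If you don't know in advance whether a file is compressed, one option is to sniff its first two bytes: every gzip stream starts with the magic bytes `0x1f 0x8b` (RFC 1952). `is_gzip` is a hypothetical helper name, and the commented dispatch line assumes `unpack_tar` and `unpack_tar_gz` are defined as in the post.

```julia
# Hypothetical helper: true if the file starts with the gzip magic bytes 0x1f 0x8b
is_gzip(filename::AbstractString) = open(io -> read(io, 2) == [0x1f, 0x8b], filename)

# Possible dispatch, assuming unpack_tar and unpack_tar_gz from the post are defined:
# unpack_any(filename) = is_gzip(filename) ? unpack_tar_gz(filename) : unpack_tar(filename)
```

Files shorter than two bytes simply return `false`, since `read` then yields fewer bytes than the magic number.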

Tar.jl has been updated to support Julia 1.3 and now has CI set up for all the usual platforms (including Windows). It is still not registered, however, because we're trying to figure out whether it should be:

  1. A standard library
  2. A registered public package
  3. A private submodule of Pkg
  4. Both 2 and 3, with 3 being a vendored snapshot of 2.

But I consider the API to be stable and it’s pretty well tested (93% coverage!). Perhaps I should just register it and then we can vendor a snapshot whenever Pkg needs it.
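To inspect an archive without extracting it, `Tar.list` returns one `Tar.Header` per entry. This sketch assumes the header field names of the current Tar standard library (`path`, `type`, `mode`, `size`, `link`); the directory layout is invented for the example.

```julia
using Tar  # standard library since Julia 1.6

# Build an illustrative directory tree and archive it
src = mktempdir()
mkdir(joinpath(src, "docs"))
write(joinpath(src, "docs", "index.md"), "# Julia\n")
tarball = Tar.create(src)

headers = Tar.list(tarball)
for hdr in headers
    # hdr.type is a Symbol such as :file, :directory or :symlink
    println(rpad(string(hdr.type), 10), hdr.path, " (", hdr.size, " bytes)")
end
```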


I’ve opened a pull request to register Tar.jl v1.0.0:

https://github.com/JuliaRegistries/General/pull/6286

This will go through the usual 3-day waiting period, giving people a chance to comment on the name and anything else they care about.
