Using ArtifactUtils add_artifact throws IOError

Hi All,

I’m trying to build a package that includes data as an Artifact. Following the ArtifactUtils documentation (ArtifactUtils · Julia Packages), this example worked as indicated:

using ArtifactUtils, Artifacts # Artifacts provides the artifact string macro

add_artifact!(
    "Artifacts.toml",
    "JuliaMono",
    "https://github.com/cormullion/juliamono/releases/download/v0.030/JuliaMono.tar.gz",
    force=true,
    )

But, the following throws an error:

add_artifact!(
    "Artifacts.toml",
    "intcal20",
    "https://github.com/wccarleton/intcal20/raw/main/intcal20.tar.gz",
    force=true,
    )

The error:

ERROR: IOError: open("/home/ice.mpg.de/wcarleton/.julia/artifacts/jl_uVK3Mj/intcal20.csv", 0, 0): no such file or directory (ENOENT)
Stacktrace:
  [1] uv_error
    @ ./libuv.jl:97 [inlined]
  [2] open(path::String, flags::UInt8, mode::Int64)
    @ Base.Filesystem ./filesystem.jl:106
  [3] open
    @ ./filesystem.jl:98 [inlined]
  [4] sendfile(src::String, dst::String)
    @ Base.Filesystem ./file.jl:978
  [5] cp(src::String, dst::String; force::Bool, follow_symlinks::Bool)
    @ Base.Filesystem ./file.jl:370
  [6] cp
    @ ./file.jl:364 [inlined]
  [7] (::Tar.var"#26#28"{Vector{UInt8}, Bool, Bool, Base.Process, String})(hdr::Tar.Header, parts::Vector{SubString{String}})
    @ Tar ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/extract.jl:79
  [8] read_tarball(callback::Tar.var"#26#28"{Vector{UInt8}, Bool, Bool, Base.Process, String}, predicate::Tar.var"#1#2", tar::Base.Process; buf::Vector{UInt8}, skeleton::Base.DevNull)
    @ Tar ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/extract.jl:399
  [9] extract_tarball(predicate::Function, tar::Base.Process, root::String; buf::Vector{UInt8}, skeleton::Base.DevNull, copy_symlinks::Bool, set_permissions::Bool)
    @ Tar ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/extract.jl:58
 [10] (::Tar.var"#85#88"{String, Base.Process, Bool, Tar.var"#1#2"})(skeleton::Base.DevNull)
    @ Tar ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/Tar.jl:237
 [11] arg_write(f::Tar.var"#85#88"{String, Base.Process, Bool, Tar.var"#1#2"}, arg::Base.DevNull)
    @ ArgTools ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:112
 [12] #84
    @ ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/Tar.jl:236 [inlined]
 [13] arg_mkdir(f::Tar.var"#84#87"{Base.Process, Base.DevNull, Bool, Tar.var"#1#2"}, arg::String)
    @ ArgTools ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:163
 [14] #83
    @ ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/Tar.jl:232 [inlined]
 [15] open(::Tar.var"#83#86"{Base.DevNull, Bool, Tar.var"#1#2", String}, ::Cmd; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./process.jl:395
 [16] open(::Function, ::Cmd)
    @ Base ./process.jl:393
 [17] arg_read
    @ ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:60 [inlined]
 [18] extract(predicate::Function, tarball::Cmd, dir::String; skeleton::Nothing, copy_symlinks::Nothing, set_permissions::Bool)
    @ Tar ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/Tar.jl:231
 [19] #extract#89
    @ ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Tar/src/Tar.jl:255 [inlined]
 [20] unpack(tarball_path::String, dest::String; verbose::Bool)
    @ Pkg.PlatformEngines ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Pkg/src/PlatformEngines.jl:386
 [21] unpack
    @ ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Pkg/src/PlatformEngines.jl:386 [inlined]
 [22] #25
    @ ~/.julia/packages/ArtifactUtils/vpjlQ/src/ArtifactUtils.jl:66 [inlined]
 [23] create_artifact(f::ArtifactUtils.var"#25#26"{String})
    @ Pkg.Artifacts ~/Julia/julia-1.7.1/share/julia/stdlib/v1.7/Pkg/src/Artifacts.jl:44
 [24] add_artifact!(artifacts_toml::String, name::String, tarball_url::String; clear::Bool, options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:force,), Tuple{Bool}}})
    @ ArtifactUtils ~/.julia/packages/ArtifactUtils/vpjlQ/src/ArtifactUtils.jl:65
 [25] top-level scope
    @ REPL[7]:1

Any ideas would be greatly appreciated!

This seems to be a problem with the tarball itself. If you look at its content, it seems to also contain a symlink with the name intcal20.csv pointing to a file with exactly the same name:

shell> tar -ztvf /var/folders/r3/nxssh39n4p14zwshq19156p40000gp/T/jl_tTywbQ
-rwxrwxrwx  0 root   root   220882 Feb  1 03:54 ./intcal20.csv
hrwxrwxrwx  0 root   root        0 Feb  1 03:54 intcal20.csv link to ./intcal20.csv

@StefanKarpinski Would it be possible for Tar.jl to detect such cases and throw a better error?
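
For anyone who wants to check a tarball for this before handing it to add_artifact!, here is a minimal sketch of such a detection. The helper names is_self_hardlink and find_self_hardlinks are made up for illustration and are not part of Tar.jl. The demo tarball is built with Tar.write_header, which is internal plumbing rather than public API, and the file name data.csv is a placeholder.

```julia
using Tar

# Hypothetical helper: true when a hardlink entry points at its own path
# once a leading "./" and similar noise are normalized away.
is_self_hardlink(hdr::Tar.Header) =
    hdr.type == :hardlink && normpath(hdr.path) == normpath(hdr.link)

# Hypothetical helper: collect every such entry from an uncompressed tarball.
find_self_hardlinks(tarball) = filter(is_self_hardlink, Tar.list(tarball))

# Build a tiny two-entry tarball with the same shape as the problematic one.
# Tar.write_header is internal to Tar.jl and may change between versions.
tarball, io = mktemp()
Tar.write_header(io, Tar.Header("./data.csv", :file, 0o755, 0, ""))
Tar.write_header(io, Tar.Header("data.csv", :hardlink, 0o755, 0, "./data.csv"))
close(io)

suspects = find_self_hardlinks(tarball)
foreach(h -> println("self-hardlink: ", h.path, " -> ", h.link), suspects)
```

For a gzipped tarball, the same helper should work on a decompressing command instead of a path, since Tar.list also accepts a command whose output is the tar stream.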

I’ll look into it. It shouldn’t error at all. That’s a valid tarball structure that it should just unpack, resulting in a broken self-link. Issue: https://github.com/JuliaIO/Tar.jl/issues/127. Of course the resulting file tree is useless but Tar should handle it.

I’m still not sure how you can end up creating that tarball, but what about skipping the self-link when extracting? The other file is usable. For example, when I open the tarball with Ark it shows only the actual CSV file.
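
As a rough sketch of what skipping the self-link could look like on the user side: Tar.extract accepts a predicate that decides, per header, whether an entry is extracted. The predicate name keep_entry is made up for illustration, the demo tarball is again written with the internal Tar.write_header, and data.csv is a placeholder file name.

```julia
using Tar

# Hypothetical predicate: keep everything except hardlinks that point at
# their own (normalized) path.
keep_entry(hdr::Tar.Header) =
    !(hdr.type == :hardlink && normpath(hdr.path) == normpath(hdr.link))

# Demo tarball: a file entry followed by a hardlink entry to the same file.
# Tar.write_header is internal to Tar.jl, used here only to build the demo.
tarball, io = mktemp()
Tar.write_header(io, Tar.Header("./data.csv", :file, 0o755, 0, ""))
Tar.write_header(io, Tar.Header("data.csv", :hardlink, 0o755, 0, "./data.csv"))
close(io)

# Entries for which the predicate returns false are skipped during extraction.
dir = Tar.extract(keep_entry, tarball)
extracted = readdir(dir)
println(extracted)
```

This only works around the problem when calling Tar.extract directly. add_artifact! unpacks the tarball itself, so it would not pick up such a predicate.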


“I’m still not sure how you can end up creating that tarball”

This is how far I’ve got:

# Create first file
echo 'hello world' > foo
# Put it in the tarball, including the `./` part
tar cf test.tar ./foo
# Remove the file and create the link
rm foo
ln -s ./foo foo
# Append to the tarball
tar rf test.tar foo
# See the result
tar tvf test.tar

But this creates a symbolic link, whereas that tarball has a hard self-link, and I still haven’t figured out how to create one. Impressive feat.

Hi All, many thanks for the thoughts and effort.

I created the tarball using the tar command-line utility on Ubuntu 18.04:

tar -czvf output.tar.gz source.csv

It’s entirely possible I’ve done something out of the ordinary, since I never use tar archives or the utility myself (other than for extracting when needed).

Does the snippet above point to the problem?

No, that in itself is completely reasonable. Could you show the output of running ls -al inside that folder?

Sure thing. Inside the folder containing the original csv and the tarball:

.../IntCal20$ ls -al
total 292
drwxrwxrwx 1 root root      0 Feb  1 09:56 .
drwxrwxrwx 1 root root      0 Feb  1 09:54 ..
-rwxrwxrwx 1 root root 220882 Feb  1 09:54 intcal20.csv
-rwxrwxrwx 1 root root  73779 Feb  1 09:56 intcal20.tar.gz

Can you provide the wonky tarball for debugging purposes? In attempting to reproduce this, I find that extracting a tarball with a self-referencing symlink already works fine, so something else must be going on here.

Ah, so you think the tarball has a hardlink entry of a file to itself? Yeah, that is interesting. Easy to create with Tar.jl, actually:

julia> using Tar

julia> tarball, io = mktemp()
("/var/folders/4g/b8p546px3nd550b3k288mhp80000gp/T/jl_O3ejuB", IOStream(<fd 21>))

julia> Tar.write_header(io, Tar.Header("path", :file, 0o755, 0, ""))
512

julia> Tar.write_header(io, Tar.Header("path", :hardlink, 0o755, 0, "path"))
512

julia> close(io)

julia> Tar.list(tarball)
2-element Vector{Tar.Header}:
 Tar.Header("path", :file, 0o755, 0, "")
 Tar.Header("path", :hardlink, 0o755, 0, "path")

And indeed, trying to extract this monster does reproduce the error reported above:

julia> Tar.extract(tarball)
ERROR: IOError: open("/var/folders/4g/b8p546px3nd550b3k288mhp80000gp/T/jl_wtk4jf/path", 0, 0): no such file or directory (ENOENT)
Stacktrace:
  [1] uv_error
    @ ./libuv.jl:97 [inlined]
  [2] open(path::String, flags::UInt16, mode::Int64)
    @ Base.Filesystem ./filesystem.jl:106
  [3] open
    @ ./filesystem.jl:98 [inlined]
  [4] sendfile(src::String, dst::String)
    @ Base.Filesystem ./file.jl:978
  [5] cp(src::String, dst::String; force::Bool, follow_symlinks::Bool)
    @ Base.Filesystem ./file.jl:370
  [6] cp
    @ ./file.jl:364 [inlined]
  [7] (::Tar.var"#26#28"{Vector{UInt8}, Bool, Bool, IOStream, String})(hdr::Tar.Header, parts::Vector{SubString{String}})
    @ Tar ~/dev/Tar/src/extract.jl:79
  [8] read_tarball(callback::Tar.var"#26#28"{Vector{UInt8}, Bool, Bool, IOStream, String}, predicate::Tar.var"#1#2", tar::IOStream; buf::Vector{UInt8}, skeleton::Base.DevNull)
    @ Tar ~/dev/Tar/src/extract.jl:399
  [9] extract_tarball(predicate::Function, tar::IOStream, root::String; buf::Vector{UInt8}, skeleton::Base.DevNull, copy_symlinks::Bool, set_permissions::Bool)
    @ Tar ~/dev/Tar/src/extract.jl:58
 [10] (::Tar.var"#85#88"{String, IOStream, Bool, Tar.var"#1#2"})(skeleton::Base.DevNull)
    @ Tar ~/dev/Tar/src/Tar.jl:237
 [11] arg_write(f::Tar.var"#85#88"{String, IOStream, Bool, Tar.var"#1#2"}, arg::Base.DevNull)
    @ ArgTools ~/.julia/juliaup/julia-1.7.1+0~x64/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:112
 [12] #84
    @ ~/dev/Tar/src/Tar.jl:236 [inlined]
 [13] arg_mkdir(f::Tar.var"#84#87"{IOStream, Base.DevNull, Bool, Tar.var"#1#2"}, arg::Nothing)
    @ ArgTools ~/.julia/juliaup/julia-1.7.1+0~x64/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:163
 [14] #83
    @ ~/dev/Tar/src/Tar.jl:232 [inlined]
 [15] open(f::Tar.var"#83#86"{Base.DevNull, Bool, Tar.var"#1#2", Nothing}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Base ./io.jl:330
 [16] open
    @ ./io.jl:328 [inlined]
 [17] arg_read
    @ ~/.julia/juliaup/julia-1.7.1+0~x64/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:60 [inlined]
 [18] #extract#82
    @ ~/dev/Tar/src/Tar.jl:231 [inlined]
 [19] #extract#89
    @ ~/dev/Tar/src/Tar.jl:255 [inlined]
 [20] extract (repeats 2 times)
    @ ~/dev/Tar/src/Tar.jl:255 [inlined]
 [21] top-level scope
    @ REPL[21]:1

While I do think this could be handled better, it’s a pretty strange tarball.

This is fixed now on Tar.jl master (and making its way to Julia master). I haven’t made a release of Tar with this yet because this is a pretty niche “feature” and I prefer to let things simmer before cutting a release (in case there are issues I didn’t catch). As is somehow always the case, the change was more involved than expected, but I think that’s because I always end up expanding the test coverage when I fix things or add features. I considered just making this an error since it’s such an odd thing for a tarball to do, but I explained my reasoning for supporting it in the commit message:

This is a weird construction: an entry for a file followed by a hardlink with that file as its target and with the same path. The hardlink copies the content of its target and creates a file with the “new” path and mode, which happens in this case to be the same path as the original file. The effect of this construction is to change the permissions of a previous file entry. We could disallow this since it’s odd, but based on the bug report it is something tar can generate, and the other tarball reading methods like tree_hash and rewrite already just work: their logic is simply to copy the contents of an existing node with a new path and mode, and in a sane tree data structure you can just overwrite an arbitrary node. The file system is wonkier, and the change in logic here is merely to ensure that the old file isn’t deleted too early.
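
To make the “sane tree data structure” point concrete, here is a small in-memory model. It is only an illustration of the reasoning in the commit message, not Tar.jl’s actual code: Entry, apply_entry! and build_tree are invented names, and the file content is a placeholder.

```julia
# Minimal in-memory model of a file tree: path => (mode, content).
# Entry, apply_entry! and build_tree are illustrative, not part of Tar.jl.
struct Entry
    path::String
    type::Symbol      # :file or :hardlink
    mode::UInt16
    content::String   # payload for :file entries
    link::String      # target path for :hardlink entries
end

function apply_entry!(tree::Dict{String,Tuple{UInt16,String}}, e::Entry)
    path = normpath(e.path)
    if e.type == :file
        tree[path] = (e.mode, e.content)
    elseif e.type == :hardlink
        # Copy the target's content, but take the mode from the link entry.
        # The target is read before the node is overwritten, so a self-link
        # is harmless here.
        _, content = tree[normpath(e.link)]
        tree[path] = (e.mode, content)
    else
        error("unsupported entry type: $(e.type)")
    end
    return tree
end

function build_tree(entries)
    tree = Dict{String,Tuple{UInt16,String}}()
    foreach(e -> apply_entry!(tree, e), entries)
    return tree
end

entries = [
    Entry("./data.csv", :file, 0o644, "a,b\n1,2\n", ""),
    Entry("data.csv", :hardlink, 0o755, "", "./data.csv"),
]
tree = build_tree(entries)
```

In this model the second entry leaves a single node whose content is unchanged and whose mode comes from the hardlink entry. On disk the same overwrite is harder: if the destination is removed before the copy, the link target disappears with it, which would be consistent with the ENOENT in the stack traces above.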

This is the one:

https://github.com/wccarleton/intcal20/raw/main/intcal20.tar.gz

I hope that helps.

I (and I’m sure others) appreciate all the work you’ve put in. I still don’t understand what’s gone on really and I’d rather not make odd tarballs anyway. So, if anyone has any tips for figuring out what’s odd about my system—if anything—or even where to start, I’d appreciate hearing them (though, it’s well beyond Julia concerns, so please don’t feel obligated). I’ll start learning more about tar as a first step, I think.

The trouble is that the command-line that you gave doesn’t, for me at least, reproduce this weird tarball, so I have no idea what you did to produce it or what to tell you to avoid. The thing you said you did is fine and produces a normal tarball with just a single file entry:

julia> using Downloads, Tar

julia> tgz = Downloads.download("https://github.com/wccarleton/intcal20/raw/main/intcal20.tar.gz")
"/var/folders/4g/b8p546px3nd550b3k288mhp80000gp/T/jl_a9ZAb24bzm"

julia> Tar.list(`gzcat $tgz`)
2-element Vector{Tar.Header}:
 Tar.Header("./intcal20.csv", :file, 0o777, 220882, "")
 Tar.Header("intcal20.csv", :hardlink, 0o777, 0, "./intcal20.csv")

julia> dir = Tar.extract(`gzcat $tgz`)
"/var/folders/4g/b8p546px3nd550b3k288mhp80000gp/T/jl_7AGjMQ"

julia> cd(dir)

shell> ls -l
total 216
-rwxr-xr-x 1 stefan staff 220882 Feb 11 09:51 intcal20.csv

shell> gtar -czvf output.tar.gz intcal20.csv
intcal20.csv

shell> ls -l
total 288
-rwxr-xr-x 1 stefan staff 220882 Feb 11 09:51 intcal20.csv
-rw-r--r-- 1 stefan staff  73132 Feb 11 09:52 output.tar.gz

shell>

julia> Tar.list(`gzcat output.tar.gz`)
1-element Vector{Tar.Header}:
 Tar.Header("intcal20.csv", :file, 0o755, 220882, "")

What I did there was download your tarball, list its contents so you can see the self hardlink entry, then run the GNU tar command you posted to create a new tarball in that directory, and then list its contents so you can see that there’s no self hardlink entry. I was able to create a tarball with a self-hardlink using the internal plumbing of Tar, but I have not figured out how to get GNU tar to make a tarball like this, so I have no idea what to tell you not to do.

Aha! I figured it out. This is how you do it:

shell> gtar -czvf output.tar.gz ./intcal20.csv intcal20.csv
./intcal20.csv
intcal20.csv

julia> Tar.list(`gzcat output.tar.gz`)
2-element Vector{Tar.Header}:
 Tar.Header("./intcal20.csv", :file, 0o755, 220882, "")
 Tar.Header("intcal20.csv", :hardlink, 0o755, 0, "./intcal20.csv")

Note that the same file is added to the archive as both ./intcal20.csv and intcal20.csv, i.e. with and without the leading ./. Why does this have the effect we’re seeing? Because of a couple of facts about tar:

  1. It does not deduplicate entries that you put in the archive at all: if you list something twice, there will be two entries in the archive for it.

  2. It automatically “preserves” hard links: it only includes the first copy of each inode as a file entry; if the same inode is included later, it’s recorded as a hardlink to the first entry for that inode.
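
The two facts can be combined into a toy model of how an archiver might choose entry types. This is a simplified sketch of the rules as stated here, not GNU tar’s actual implementation (which may apply further conditions); plan_entries is an invented name, and the file content is a placeholder.

```julia
# Hypothetical helper: decide the entry type for each path argument, in order.
# Rule 1: no deduplication, so every argument yields an entry.
# Rule 2: only the first occurrence of an inode becomes a file entry;
#         later occurrences become hardlinks to that first path.
function plan_entries(paths::Vector{String})
    first_seen = Dict{Tuple{UInt,UInt},String}()   # (device, inode) => first path
    plan = Tuple{String,Symbol,String}[]           # (path, type, link target)
    for path in paths
        st = stat(path)
        key = (st.device, st.inode)
        if haskey(first_seen, key)
            push!(plan, (path, :hardlink, first_seen[key]))
        else
            first_seen[key] = path
            push!(plan, (path, :file, ""))
        end
    end
    return plan
end

# Try it on a scratch directory, naming the same file with and without "./".
plan = cd(mktempdir()) do
    write("intcal20.csv", "placeholder\n")
    plan_entries(["./intcal20.csv", "intcal20.csv"])
end
foreach(println, plan)
```

The sketch relies on stat reporting meaningful device and inode numbers, which is an assumption that may not hold on every platform or file system.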

So what you probably did here was to accidentally add the same file to the archive twice. If you only include each file once, it won’t happen. What tipped me off was the fact that the two entries had different paths, but it turns out that isn’t even necessary:

shell> gtar -czvf output.tar.gz intcal20.csv intcal20.csv
intcal20.csv
intcal20.csv

julia> Tar.list(`gzcat output.tar.gz`)
2-element Vector{Tar.Header}:
 Tar.Header("intcal20.csv", :file, 0o755, 220882, "")
 Tar.Header("intcal20.csv", :hardlink, 0o755, 0, "intcal20.csv")

This created two entries with the same exact path in the archive. This is so easy to do that I’m a little surprised we’ve not seen it before and it’s all the more reason to handle it correctly rather than throwing an error. Thank you for the bug report!
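
One way to sidestep the duplicate-argument mistake altogether is to archive a directory rather than a list of file names, for instance with Tar.create. The following is a sketch under the assumption that the data file sits alone in a staging directory; the content is a placeholder, and compression is left out.

```julia
using Tar

# Stage the data file in its own directory (the content is a placeholder).
src = mktempdir()
write(joinpath(src, "intcal20.csv"), "placeholder\n")

# Tar.create walks the directory, so there is no argument list in which a
# file could accidentally be named twice.
tarball = Tar.create(src)

headers = Tar.list(tarball)
file_paths = [h.path for h in headers if h.type == :file]
n_hardlinks = count(h -> h.type == :hardlink, headers)
println(file_paths, " hardlinks: ", n_hardlinks)
```

The resulting tarball is uncompressed; gzipping it for use as an artifact would be a separate step.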

That is quite likely what I did: overly speedy editing of the bash history in the terminal (up key, then changing commands, copy-and-paste, etc.) without noticing the duplicate. Nice detective work!
