Artifacts for dummies?

Yesterday I tried to use Artifacts to store some test data for a package, and failed miserably.

First, I couldn’t find clear instructions on how to obtain the git-tree-sha1 or the sha256 values that the example requires.

Then I found this thread, which points to the ArtifactUtils.jl package. However, it seems that this package only accepts tarballs (and even then, I was unable to use it after trying to tar my files in different ways). In any case, my files are not tarballs, so that would just be an additional complication.

So, does anyone know of a step-by-step tutorial on how to add a simple file (let’s say, a plain text file hosted in a GitHub repository) as an artifact to be used in a package?
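A minimal sketch of where the two hashes come from, using `create_artifact`, `archive_artifact` and `bind_artifact!` from `Pkg.Artifacts` plus the SHA standard library. The file name `mydatafile.txt`, the artifact name `mydata` and the upload URL are hypothetical placeholders. As I understand it, the `git-tree-sha1` identifies the unpacked directory contents, while the `sha256` is the checksum of the compressed tarball that gets downloaded.

```julia
using Pkg.Artifacts   # create_artifact, archive_artifact, bind_artifact!, artifact_hash
using SHA             # sha256 (SHA is a standard library)

# Hypothetical local data file that we want to ship as an artifact.
datafile = joinpath(mktempdir(), "mydatafile.txt")
write(datafile, "some test data\n")

# 1. Copy the file into a fresh artifact directory. The returned value is the
#    git-tree-sha1 of that directory's contents.
tree_hash = create_artifact() do dir
    cp(datafile, joinpath(dir, basename(datafile)))
end
tree_hash_hex = bytes2hex(tree_hash.bytes)

# 2. Pack the artifact into a gzip-compressed tarball, ready to be uploaded.
tarball = joinpath(mktempdir(), "mydata.tar.gz")
archive_artifact(tree_hash, tarball)

# 3. The sha256 stored in Artifacts.toml is the checksum of the tarball file itself.
tarball_sha256 = bytes2hex(open(sha256, tarball))

# 4. Record both hashes in an Artifacts.toml, pointing at the URL where the
#    tarball will be uploaded (hypothetical URL).
artifacts_toml = joinpath(mktempdir(), "Artifacts.toml")
url = "https://example.com/mydata.tar.gz"
bind_artifact!(artifacts_toml, "mydata", tree_hash;
               download_info = [(url, tarball_sha256)], lazy = true, force = true)

println("git-tree-sha1: ", tree_hash_hex)
println("sha256:        ", tarball_sha256)
println(read(artifacts_toml, String))
```

The resulting `mydata.tar.gz` is what would then be uploaded to the URL written into Artifacts.toml.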

Only tarballs are supported as artifacts, no way around it.

OK, but then what’s the problem here? Any ideas?

julia> using Artifacts, ArtifactUtils

julia> add_artifact!(
           "Artifacts.toml", 
           "nucleic_trajectory", 
           "https://github.com/m3g/TestingDataRepository/raw/refs/heads/main/ComplexMixtures/nucleic.tar", 
           force=true
       )
ERROR: This does not appear to be a TAR file/stream — invalid version string for tar file: "\f\xc1". Note: Tar.jl does not handle decompression; if the tarball is compressed you must use an external command like `gzcat` or package like CodecZlib.jl to decompress it. See the README file for examples.
Stacktrace:
...

Isn’t a tarball just the file produced by running tar -cvf file.tar mydatafile.txt?

PS: In this specific case the actual data file is binary, but I get the same error if the file in the tarball is a plain ASCII file:

julia> add_artifact!(
           "Artifacts.toml", 
           "nucleic_trajectory", 
           "https://github.com/m3g/TestingDataRepository/raw/refs/heads/main/ComplexMixtures/nucleic/solvated.pdb.tar", 
           force=true
       )
ERROR: This does not appear to be a TAR file/stream — invalid version string for tar file: " O". Note: Tar.jl does not handle decompression; if the tarball is compressed you must use an external command like `gzcat` or package like CodecZlib.jl to decompress it. See the README file for examples.
Stacktrace:

So it seems that the tarball has to be compressed?
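Since the compressed tarball is what eventually worked further down in this thread, here is a small sketch that builds a gzip-compressed tarball from Julia by calling the system `tar` (assumed to be installed) and then checks the gzip magic bytes. The file names are hypothetical.

```julia
workdir = mktempdir()
datafile = joinpath(workdir, "mydatafile.txt")   # hypothetical data file
write(datafile, "some test data\n")

tarball = joinpath(workdir, "mydatafile.tar.gz")
# -z gzip-compresses the archive (unlike plain `tar -cvf`);
# -C stores the file under its bare name instead of the full temp path.
run(`tar -czvf $tarball -C $workdir mydatafile.txt`)

# gzip streams always start with the two bytes 0x1f 0x8b.
is_gzip = read(tarball, 2) == UInt8[0x1f, 0x8b]
println(is_gzip ? "gzip-compressed tarball" : "not gzip-compressed")
```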

I was confused because the error message seemed to say that, if the file were compressed, I would need external tools to handle it.

Care to share the stacktrace? Pkg.jl does support uncompressed tarballs:

but I don’t know where the error you’re seeing comes from, since you left out the stacktrace.

The confusing part is that the error message seems to imply that one should not use a compressed file:

Full stack trace:
julia> add_artifact!(
           "Artifacts.toml", 
           "pdb", 
           "https://github.com/m3g/TestingDataRepository/raw/refs/heads/main/ComplexMixtures/nucleic/solvated.pdb.tar", 
           force=true
       )
ERROR: This does not appear to be a TAR file/stream — invalid version string for tar file: " O". Note: Tar.jl does not handle decompression; if the tarball is compressed you must use an external command like `gzcat` or package like CodecZlib.jl to decompress it. See the README file for examples.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] |>(x::String, f::typeof(error))
    @ Base ./operators.jl:926
  [3] header_error(buf::Vector{UInt8}, msg::String)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:563
  [4] check_version_field
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:623 [inlined]
  [5] read_standard_header(io::Base.Process; buf::Vector{UInt8}, tee::Base.DevNull)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:594
  [6] read_standard_header
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:578 [inlined]
  [7] #read_header#48
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:423 [inlined]
  [8] read_header
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:417 [inlined]
  [9] read_tarball(callback::Tar.var"#26#28"{…}, predicate::Tar.var"#1#2", tar::Base.Process; buf::Vector{…}, skeleton::Base.DevNull)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:353
 [10] extract_tarball(predicate::Function, tar::Base.Process, root::String; buf::Vector{…}, skeleton::Base.DevNull, copy_symlinks::Bool, set_permissions::Bool)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:72
 [11] extract_tarball
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/extract.jl:62 [inlined]
 [12] #83
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:249 [inlined]
 [13] arg_write(f::Tar.var"#83#86"{String, Bool, Base.Process, Bool, Tar.var"#1#2"}, arg::Base.DevNull)
    @ ArgTools ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/ArgTools/src/ArgTools.jl:134
 [14] (::Tar.var"#82#85"{Base.Process, Base.DevNull, Bool, Tar.var"#1#2"})(dir::String)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:248
 [15] arg_mkdir(f::Tar.var"#82#85"{Base.Process, Base.DevNull, Bool, Tar.var"#1#2"}, arg::String)
    @ ArgTools ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/ArgTools/src/ArgTools.jl:185
 [16] #81
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:243 [inlined]
 [17] open(::Tar.var"#81#84"{Base.DevNull, Bool, Tar.var"#1#2", String}, ::Cmd; kwargs::@Kwargs{})
    @ Base ./process.jl:447
 [18] open(::Function, ::Cmd)
    @ Base ./process.jl:428
 [19] arg_read
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/ArgTools/src/ArgTools.jl:75 [inlined]
 [20] extract(predicate::Function, tarball::Cmd, dir::String; skeleton::Nothing, copy_symlinks::Nothing, set_permissions::Bool)
    @ Tar ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:242
 [21] extract
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:229 [inlined]
 [22] #extract#87
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:268 [inlined]
 [23] extract
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Tar/src/Tar.jl:261 [inlined]
 [24] unpack(tarball_path::String, dest::String; verbose::Bool)
    @ Pkg.PlatformEngines ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/PlatformEngines.jl:407
 [25] unpack
    @ ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/PlatformEngines.jl:402 [inlined]
 [26] #25
    @ ~/.julia/packages/ArtifactUtils/OB0c7/src/ArtifactUtils.jl:66 [inlined]
 [27] create_artifact(f::ArtifactUtils.var"#25#26"{String})
    @ Pkg.Artifacts ~/.julia/juliaup/julia-1.11.1+0.x64.linux.gnu/share/julia/stdlib/v1.11/Pkg/src/Artifacts.jl:39
 [28] add_artifact!(artifacts_toml::String, name::String, tarball_url::String; clear::Bool, options::@Kwargs{force::Bool})
    @ ArtifactUtils ~/.julia/packages/ArtifactUtils/OB0c7/src/ArtifactUtils.jl:65
 [29] top-level scope
    @ REPL[27]:1
Some type information was truncated. Use `show(err)` to see complete types.

OK, but moving forward:

With the compressed file I can get the artifact to work (although, for now, I have to run

julia> Pkg.ensure_artifact_installed("pdb", "./Artifacts.toml")

first; I’m not sure whether this will be needed in practice later).

The artifact is then accessed with artifact"pdb", but (as far as I understand at this point) it is the .tar.gz file. Do I then have to use some external tool to uncompress it locally, or am I missing something?

Edit: Oh, no, it is a folder, and the file is in there!
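To make the directory layout concrete, here is a self-contained sketch that builds a local stand-in for the `pdb` artifact (the file content is made up) and looks inside it. The path returned for an artifact name is a directory holding the unpacked files; as far as I know, `ensure_artifact_installed` returns this same path, and inside package code `artifact"pdb"` resolves to it as well.

```julia
using Pkg.Artifacts   # create_artifact, bind_artifact!, artifact_hash, artifact_path

# Local stand-in for the downloaded "pdb" artifact (hypothetical content).
toml = joinpath(mktempdir(), "Artifacts.toml")
pdb_hash = create_artifact() do dir
    write(joinpath(dir, "solvated.pdb"), "REMARK demo\n")
end
bind_artifact!(toml, "pdb", pdb_hash; force = true)

# The artifact resolves to a directory, not to the tarball.
dir = artifact_path(artifact_hash("pdb", toml))
@show isdir(dir) readdir(dir)

# Path to the extracted file inside the artifact directory.
pdbfile = joinpath(dir, "solvated.pdb")
```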

If a package uses a series of artifacts only for testing (and doctests), where is the proper place to put the Artifacts.toml file?

(My understanding is that if it is in the package’s main directory, next to the package’s Project.toml, the artifacts will be downloaded when the package is installed. Or won’t they?)

Lazy artifacts are downloaded on demand the first time they’re requested. If you don’t use them anywhere in the source code of your package, but only inside tests or docs, then they won’t be downloaded upon installation of your package.
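A sketch of the `lazy` flag mentioned above, assuming the Artifacts.toml sits in the package root (here a temporary stand-in directory). The artifact is built locally only to obtain a real tree hash and is then deleted to mimic a fresh install; the download URL and the all-zeros sha256 are hypothetical placeholders, not real values.

```julia
using Pkg.Artifacts   # create_artifact, remove_artifact, bind_artifact!, artifact_meta, artifact_exists

pkgroot = mktempdir()                          # stand-in for the package root
toml = joinpath(pkgroot, "Artifacts.toml")

# Build the artifact only to get a genuine git-tree-sha1, then delete it
# so the machine looks like one where the package was just installed.
pdb_hash = create_artifact() do dir
    write(joinpath(dir, "solvated.pdb"), "REMARK lazy demo\n")
end
remove_artifact(pdb_hash)

# lazy = true records the entry without fetching anything.
bind_artifact!(toml, "pdb", pdb_hash;
               download_info = [("https://example.com/solvated.pdb.tar.gz", "0"^64)],  # placeholders
               lazy = true, force = true)

meta = artifact_meta("pdb", toml)
is_lazy = get(meta, "lazy", false)             # flag written to Artifacts.toml
installed = artifact_exists(pdb_hash)          # false: nothing downloaded yet
@show is_lazy installed
```

In the package’s tests, such an entry would be used after `using LazyArtifacts` (which then has to be among the test dependencies), and the first call to `artifact"pdb"` is what triggers the download.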