Creating Artifacts.toml for existing tarball

I’m trying to figure out how to create an Artifacts.toml to download some data that my package depends on. I’m creating a setup_artifacts.jl script to create it.

Here’s what I have so far:

using Pkg.Artifacts
using Pkg.GitTools
using Tar

toml = joinpath(@__DIR__, "Artifacts.toml")
ir_url = "https://tidmarsh.media.mit.edu/~sfr/tidmarsh_irs.tar.gz"

ir_hash = create_artifact() do artifact_dir
    tarball = download(ir_url)
    try
        global tarball_hash = bytes2hex(GitTools.blob_hash(tarball))
        Tar.extract(tarball, artifact_dir)
    finally
        rm(tarball)
    end
end

bind_artifact!(toml, "tidmarsh_irs", ir_hash;
               download_info=[(ir_url, tarball_hash)],
               lazy=true)

My goal here is to have an Artifacts.toml file that I can include with my package that will download the referenced file and unpack it, making the contents (a bunch of .flac files in my case) available at runtime.

Currently Tar.extract is choking on the tarball (“invalid octal digit: G”), perhaps because it’s compressed?

Also - is this the right way to be doing this? I’m trying to adapt the blog post here but it doesn’t seem like the examples set the URL that the artifacts can be downloaded from.

Ah, it looks like instead of using Tar.jl I can use unpack from Pkg.PlatformEngines.

So my script looks like:

using Pkg.Artifacts
using Pkg.GitTools
using Pkg.PlatformEngines

toml = joinpath(@__DIR__, "Artifacts.toml")
ir_url = "https://tidmarsh.media.mit.edu/~sfr/tidmarsh_irs.tar.gz"

ir_hash = create_artifact() do artifact_dir
    tarball = download(ir_url)
    @show artifact_dir
    try
        global tarball_hash = bytes2hex(GitTools.blob_hash(tarball))
        unpack(tarball, artifact_dir)
    finally
        rm(tarball)
    end
end

bind_artifact!(toml, "tidmarsh_irs", ir_hash;
               download_info=[(ir_url, tarball_hash)],
               lazy=true,
               force=true)

The idea would be that I would run this script if my data ever changed, to update the Artifacts.toml.
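
For the runtime side, here is a rough offline sketch of how a name bound in an Artifacts.toml resolves to a directory on disk. It uses a throwaway TOML file and a tiny stand-in artifact instead of the real tarball; the names demo_data and example.txt are made up for illustration, and no download_info is recorded because nothing is hosted anywhere.

```julia
using Pkg.Artifacts

# Work in a throwaway directory so no real Artifacts.toml is touched.
demo_dir = mktempdir()
demo_toml = joinpath(demo_dir, "Artifacts.toml")

# Stand-in for the unpacked tarball contents (hypothetical file name).
demo_hash = create_artifact() do artifact_dir
    write(joinpath(artifact_dir, "example.txt"), "placeholder data\n")
end

# Record the name -> content-hash binding in the TOML file.
bind_artifact!(demo_toml, "demo_data", demo_hash; force=true)

# What package code would do at runtime: resolve the name to a hash,
# then the hash to the directory holding the files.
found_hash = artifact_hash("demo_data", demo_toml)
data_dir = artifact_path(found_hash)
println(readdir(data_dir))
```

Inside a package, the artifact"..." string macro wraps this lookup (and, for entries with download information, the download itself), so the calls above are mainly useful for seeing what the binding contains.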

@dmbates maybe this helps with the question you asked in the other thread?

Thanks for the suggestion. I am still struggling to understand the section with create_artifact, even after reading the documentation and seeing examples. I tried to run this code and was unsuccessful.

julia> using Pkg.Artifacts

julia> using Pkg.GitTools

julia> using Pkg.PlatformEngines

julia> toml = joinpath(@__DIR__, "Artifacts.toml")
"/tmp/Artifacts.toml"

julia> ir_url = "https://tidmarsh.media.mit.edu/~sfr/tidmarsh_irs.tar.gz"
"https://tidmarsh.media.mit.edu/~sfr/tidmarsh_irs.tar.gz"

julia> ir_hash = create_artifact() do artifact_dir
           tarball = download(ir_url)
           @show artifact_dir
           try
               global tarball_hash = bytes2hex(GitTools.blob_hash(tarball))
               unpack(tarball, artifact_dir)
           finally
               rm(tarball)
           end
       end
artifact_dir = "/home/bates/.julia/artifacts/jl_Oj8ETe"
ERROR: MethodError: no method matching (::Pkg.PlatformEngines.var"#3#5")(::String, ::String, ::Nothing)
Closest candidates are:
  #3(::AbstractString, ::AbstractString; excludelist) at /home/bates/src/julia-1.3.1/share/julia/stdlib/v1.3/Pkg/src/PlatformEngines.jl:41
Stacktrace:
 [1] #unpack#91(::Bool, ::typeof(unpack), ::String, ::String) at /home/bates/src/julia-1.3.1/lib/julia/sys.so:?
 [2] unpack at /home/bates/src/julia-1.3.1/share/julia/stdlib/v1.3/Pkg/src/PlatformEngines.jl:723 [inlined]
 [3] (::var"#3#4")(::String) at ./REPL[6]:6
 [4] create_artifact(::var"#3#4") at /home/bates/src/julia-1.3.1/share/julia/stdlib/v1.3/Pkg/src/Artifacts.jl:213
 [5] top-level scope at REPL[6]:1

I think you’re seeing a bug. I filed an issue.

The main point of create_artifact() is to compute the hash of the artifact contents. It creates a directory and passes it to the function (artifact_dir here). You then fill that directory with whatever you want the contents to be, and after your function returns it will compute the hash. In my script here I get the files by downloading a tarball and unpacking it into the given directory. I also need to compute the hash of the tarball itself, which I do manually and save into tarball_hash. Then, when I call bind_artifact!, I’m actually writing to the Artifacts.toml file. The download_info list needs the tarball hash, which I assume it validates after downloading.
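
To make the content-hash idea concrete, here is a small offline sketch (no download involved). The helper make_demo_artifact and the file name hello.txt are hypothetical; the point is only that the returned hash depends on what ends up in the directory, so identical contents give the same hash and different contents give a different one.

```julia
using Pkg.Artifacts

# Hypothetical helper: fill a fresh artifact directory with one small file
# and return the content hash that create_artifact computes afterwards.
function make_demo_artifact(text::AbstractString)
    create_artifact() do artifact_dir
        write(joinpath(artifact_dir, "hello.txt"), text)
    end
end

h1 = make_demo_artifact("hello artifacts\n")
h2 = make_demo_artifact("hello artifacts\n")   # same contents as h1
h3 = make_demo_artifact("something else\n")    # different contents

println(h1 == h2)            # identical contents, identical hash
println(h1 == h3)            # different contents, different hash
println(artifact_exists(h1)) # the artifact now lives in the artifact store
```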

Arguably this script uses some things that might be considered internal to Pkg, but I’m not sure if there’s a better way. @staticfloat would you mind weighing in on whether this is the right way to add a tarball as an artifact?

Tar will eventually replace Pkg.PlatformEngines, but not yet. Probably in Julia 1.5.
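
On the earlier “invalid octal digit” error: Tar.extract works on an uncompressed tar stream and does not decompress gzip itself, which is most likely what went wrong with the .tar.gz. A minimal sketch of one workaround, assuming a gzip executable is available on the PATH; the helper name extract_tar_gz is made up for illustration:

```julia
using Tar

# Hypothetical helper: decompress the .tar.gz into a temporary plain .tar
# with an external `gzip` process, then let Tar.extract unpack that file.
# `dir` must be empty or not exist yet, as Tar.extract requires.
function extract_tar_gz(tarball::AbstractString, dir::AbstractString)
    plain_tar = tempname()
    try
        run(pipeline(`gzip -dc $tarball`, stdout=plain_tar))
        Tar.extract(plain_tar, dir)
    finally
        rm(plain_tar; force=true)   # clean up the temporary .tar
    end
    return dir
end
```

Inside create_artifact this would be called as extract_tar_gz(tarball, artifact_dir), since the directory handed to the function starts out empty.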

While Spencer did find a small bug, that’s not the root cause here; you need to call Pkg.PlatformEngines.probe_platform_engines!() first in order to be able to use things like download_verify() or unpack(). (Pkg does this automatically, which may be why Spencer didn’t hit this error).

You look like you’re on the right track, Douglas; just call probe_platform_engines!() first so that Pkg can look around a bit and figure out which executables to use for things like extracting tar files, and you should be fine.
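
Putting that together, the script from earlier in the thread might be reorganised roughly as below. This is a sketch, not a confirmed recipe. The function name bind_tarball_artifact! is hypothetical, and it makes one change beyond the probe call: it hashes the tarball with SHA-256 from the SHA standard library instead of GitTools.blob_hash, on the assumption that the sha256 field written from download_info expects a plain SHA-256 digest of the downloaded file.

```julia
using Pkg.Artifacts
using Pkg.PlatformEngines
using SHA

# Hypothetical helper: download a tarball, unpack it into a new artifact,
# and bind the result under `name` in the given Artifacts.toml.
function bind_tarball_artifact!(toml::String, name::String, url::String)
    # Let Pkg find the external tools it needs for unpack().
    probe_platform_engines!()

    tarball = download(url)
    try
        # Hash of the compressed file itself (assumed: plain SHA-256).
        tarball_hash = bytes2hex(open(sha256, tarball))

        # Content hash of the unpacked directory tree.
        tree_hash = create_artifact() do artifact_dir
            unpack(tarball, artifact_dir)
        end

        bind_artifact!(toml, name, tree_hash;
                       download_info=[(url, tarball_hash)],
                       lazy=true,
                       force=true)
        return tree_hash
    finally
        rm(tarball; force=true)
    end
end

# Usage, with the URL from this thread:
# bind_tarball_artifact!(joinpath(@__DIR__, "Artifacts.toml"), "tidmarsh_irs",
#                        "https://tidmarsh.media.mit.edu/~sfr/tidmarsh_irs.tar.gz")
```

Computing the tarball hash outside the create_artifact block also removes the need for the global variable.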

Great, thanks! I’ll add that to my script.

Thanks for checking this @staticfloat and @ssfrr. This code snippet works for me if I call probe_platform_engines! or if I use julia-nightly.

And thanks for your explanation @ssfrr. I am beginning to see the light.

I have created an Artifacts.toml for the MixedModels package but have been unable to get it to pass the Travis tests. Apparently it is failing to download a file from https://figshare.com/. See the discussion in pull request #238 of JuliaStats/MixedModels.jl on GitHub, “Switch to Feather/Artifacts for test data”.

I put the archive on figshare as suggested in the DataDeps package documentation by @oxinabox.

Any suggestions on a cleaner way of accomplishing this?

@dmbates, the Artifact framework doesn’t know how to process the download because the figshare URL doesn’t include a file extension. You’ll either need a more explicit URL or have to process the artifact entry manually on init. You may find something helpful in ArtifactHelpers.jl (CiaranOMara/ArtifactHelpers.jl on GitHub), a package to bind and initialise reproducible Artifacts.
