Workflow for using package artifacts

I’m working on a library for creating counterfactual explanations. For the docs, I want to make some pretrained Flux.jl models and data available as artifacts, but I’m puzzled as to where I’m actually supposed to host the data.

The approach I’ve gone for (clearly not ideal) is to host the data in a separate GitHub repo. In my actual package repo, I then generate the Artifacts.toml with the following script:

```julia
using Pkg.Artifacts
import Downloads   # Downloads.download replaces the deprecated Base.download (Julia 1.6+)

# This is the path to the Artifacts.toml we will manipulate
artifact_toml = joinpath(@__DIR__, "..", "..", "Artifacts.toml")

data_repo = "https://github.com/pat-alt/AlgorithmicRecourse_data/raw/main"

function generate_artifact(name; data_repo=data_repo, artifact_toml=artifact_toml)

    hash = artifact_hash(name, artifact_toml)

    # If the name was not bound, or the hash it was bound to does not exist, create it!
    if isnothing(hash) || !artifact_exists(hash)

        # Build URLs by string interpolation, not joinpath: joinpath would insert `\` on Windows.
        url_base = "$(data_repo)/$(name)"

        # create_artifact() returns the content-hash of the artifact directory once we're finished creating it
        hash = create_artifact() do artifact_dir
            Downloads.download("$(url_base)/data.bson", joinpath(artifact_dir, "data.bson"))
            Downloads.download("$(url_base)/model.bson", joinpath(artifact_dir, "model.bson"))
        end

        # Now bind that hash within our `Artifacts.toml`.  `force = true` means that if it already exists,
        # just overwrite with the new content-hash.  Unless the source files change, we do not expect
        # the content hash to change, so this should not cause unnecessary version control churn.
        # Note: without `download_info`, the entry records only the hash, not where to fetch the data from.
        bind_artifact!(artifact_toml, name, hash; lazy=true, force=true)
    end
    return hash
end

generate_artifact("UCR")
```

So that users can actually access the data, I added a helper function to the package:

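```julia
module Data

# On Julia 1.6+, `artifact"..."` can only fetch artifacts bound with `lazy=true` when
# LazyArtifacts is loaded; it must also be listed under [deps] in Project.toml.
using LazyArtifacts
using Flux          # lets BSON resolve the Flux types stored in model.bson
using BSON: @load

function ucr_data()

    # Resolves (and, for a lazy artifact, downloads if needed) the artifact directory
    data_dir = artifact"UCR"

    @load joinpath(data_dir, "data.bson") data
    @load joinpath(data_dir, "model.bson") model

    return data, model

end

end
```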

That seems to work, at least on my machine, but it feels somewhat convoluted, and I’m not really making use of the archive_artifact functionality. I’m also not sure whether the artifacts will download without issues on every OS.

Having looked at this tutorial and the example it points to, it seems there is a way to generate artifacts from within the package repo and upload them as tarballs, so that they become available under remote/releases/download. I haven’t worked with releases yet, so for now this adds another layer of complexity.
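A minimal sketch of that release-based workflow, assuming the two .bson files already sit in a local directory and that the data repo has (or will get) a release tagged v0.1.0. The names `publish_artifact`, `RELEASE_URL` and the tag are hypothetical. It builds the artifact with create_artifact, packs it with archive_artifact, and binds it with a download_info entry (tarball URL plus its SHA-256). That last step is what the script above lacks: without download_info, Artifacts.toml only records the git-tree-sha1, so on a machine where the artifact was never created locally, Pkg has nothing to download.

```julia
using Pkg.Artifacts   # create_artifact, archive_artifact, bind_artifact!
using SHA             # sha256 of the tarball, required by download_info

# Hypothetical defaults: adjust the path and the (assumed) release tag to your setup.
const ARTIFACTS_TOML = joinpath(@__DIR__, "Artifacts.toml")
const RELEASE_URL = "https://github.com/pat-alt/AlgorithmicRecourse_data/releases/download/v0.1.0"

# Hypothetical helper: copy the files in `src_dir` into a new artifact, archive it as
# `<out_dir>/<name>.tar.gz`, and bind it with a download URL under `release_url`.
# The tarball itself still has to be uploaded to that release (by hand or by CI).
function publish_artifact(name::String, src_dir::String;
                          artifacts_toml::String = ARTIFACTS_TOML,
                          release_url::String = RELEASE_URL,
                          out_dir::String = mktempdir())
    # 1. Build the artifact locally from files already on disk.
    hash = create_artifact() do artifact_dir
        for f in readdir(src_dir)
            cp(joinpath(src_dir, f), joinpath(artifact_dir, f))
        end
    end

    # 2. Pack it into a tarball that can be attached to a GitHub release.
    tarball = joinpath(out_dir, "$(name).tar.gz")
    archive_artifact(hash, tarball)

    # 3. Record where users can download it, plus the tarball's SHA-256.
    tarball_sha = bytes2hex(open(sha256, tarball))
    url = "$(release_url)/$(name).tar.gz"   # string interpolation, not joinpath
    bind_artifact!(artifacts_toml, name, hash;
                   download_info = [(url, tarball_sha)],
                   lazy = true, force = true)
    return hash, tarball
end
```

After running it, attach the generated UCR.tar.gz to the release with the matching tag and commit Artifacts.toml; users then get the data from the release instead of from a local copy.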

I’d be grateful if anyone could point me in the right direction.

You may want to look at ArtifactUtils.jl; it’s much easier than dealing with the low-level API provided by Pkg.
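For comparison, a sketch of the binding step with ArtifactUtils.jl, assuming the tarball is already attached to a GitHub release; the helper name `bind_release_tarball` and the release URL/tag are hypothetical. add_artifact! downloads the tarball, computes the hashes itself, and writes the entry including its download section, so the manual create/archive/bind sequence collapses into one call.

```julia
using ArtifactUtils   # third-party package, installed with `] add ArtifactUtils`

# Hypothetical helper; `tarball_url` must already point to a downloadable .tar.gz.
function bind_release_tarball(name::String, tarball_url::String;
                              artifacts_toml::String = "Artifacts.toml")
    # Downloads the tarball, hashes it, and writes the [name] entry
    # (git-tree-sha1 plus a download section) into `artifacts_toml`.
    add_artifact!(artifacts_toml, name, tarball_url; force = true)
end

# Example call (needs network access and an existing release asset; tag assumed):
# bind_release_tarball("UCR",
#     "https://github.com/pat-alt/AlgorithmicRecourse_data/releases/download/v0.1.0/UCR.tar.gz")
```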
