Wrapping make-generated artifacts in a Julia package

I’m looking for some advice on approaches to this: I have an existing repo containing a Python package. The fact that it’s Python is maybe incidental - the package basically creates some artifacts (which happen to be JSON files) which contain data that I want to have a (read-only) Julia interface to.

The repo includes a Makefile with a target (generate-artifacts) that does everything needed from a dead start to generate those artifacts: create a venv, pip install the requirements, and run a script that does some processing and creates the JSON artifacts.

What I’d like is to create a Julia package that allows access to the data in the generated JSON files with all the rest being transparent to the user of said package.

That is, I’d like the user’s workflow to go something like:

] add FooArtifacts # ideally the costly make step happens here and only once
using FooArtifacts
data = FooArtifacts.fetch()

This should run the make step automatically, refreshing the JSON files, before parsing them and doing whatever is needed to provide the data in the desired format. Ideally that would happen on the add; second-best would be on the using; at worst it would have to be done in fetch(), if not already done.

I’m thinking it should be possible to add a Project.toml file in the root of the repo, and a src/FooArtifacts.jl (there’s already a src/ folder but I don’t think that will be a problem), and the ] add would end up cloning the whole repo into /path/to/packages/FooArtifacts, from where the code in FooArtifacts.jl could discover its own location and do the make generate-artifacts steps.

One problem I foresee is that normally one shouldn’t be changing the contents of the cloned package repo (and perhaps normally one can’t - those should be read-only other than by Pkg, I would think). So copying the package repo somewhere temporary would seem necessary.
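To make the copying idea concrete, here is a minimal sketch using only Base functions. The helper name `writable_copy` is hypothetical, and it assumes the package checkout can be read but not written, so the copy needs write permission added before make can run in it.

```julia
# Hypothetical helper, not part of Pkg or any library.
# Copies `src` to `dst` and makes every entry in the copy owner-writable.
function writable_copy(src::AbstractString, dst::AbstractString)
    cp(src, dst; force=true)  # force=true replaces an existing dst
    for (root, dirs, files) in walkdir(dst)
        for name in vcat(dirs, files)
            path = joinpath(root, name)
            islink(path) && continue  # leave symlinks alone
            # Keep the existing permission bits and add owner-write (0o200).
            chmod(path, (filemode(path) & 0o7777) | 0o200)
        end
    end
    chmod(dst, (filemode(dst) & 0o7777) | 0o200)
    return dst
end
```

Inside the package this could be called as `writable_copy(pkgdir(FooArtifacts), some_writable_dir)`, where `some_writable_dir` is a placeholder for a temporary or scratch location.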

Also, the make generate-artifacts step takes a while on first invocation to do all the venv and pip install work, and I’d like that to happen only once (ideally at precompile time) for users of the package. So there’s a caching step to perform as well. I’m thinking that Scratch.jl might serve that purpose well.
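The run-once behaviour could be sketched with a marker file, as below. The function name `ensure_generated` and the `.complete` marker are hypothetical. The directory is passed in as an argument, so the sketch does not depend on Scratch.jl itself; the commented usage assumes Scratch.jl's `@get_scratch!` macro supplies that directory.

```julia
# Sketch only: `ensure_generated` and the ".complete" marker are hypothetical names.
# Runs `generate(dir)` once and records success with a marker file,
# so later calls return immediately.
function ensure_generated(generate::Function, dir::AbstractString)
    marker = joinpath(dir, ".complete")
    if !isfile(marker)
        mkpath(dir)
        generate(dir)   # the slow step; if it throws, no marker is written
        touch(marker)
    end
    return dir
end

# Possible use inside the package, assuming Scratch.jl is a dependency and
# this runs at call time (for example from fetch()), not at the top level:
#
#     using Scratch
#     dir = ensure_generated(@get_scratch!("generated")) do d
#         # copy the repo into d, then run the Makefile target there
#         run(Cmd(`make generate-artifacts`; dir=d))
#     end
```

Writing the marker only after the generator returns means an interrupted or failed make run is retried on the next call instead of being treated as complete.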

Does the above approach seem reasonable? Any suggestions for different approaches?

I believe you can try the Artifacts stdlib or BinaryBuilder.jl to create artifacts that are easily loaded on different platforms. I have experience with BinaryBuilder.jl, and it can run build scripts, such as Makefiles, to create products.

Another alternative is DataDeps.jl, which I also used in the past for similar purposes.