I’m looking for some advice on approaches to this: I have an existing repo containing a Python package. The fact that it’s Python is maybe incidental - the package basically creates some artifacts (which happen to be JSON files) which contain data that I want to have a (read-only) Julia interface to.
The repo includes a Makefile
with a target (generate-artifacts
) that does everything needed from a dead start to generate those artifacts: create a venv, pip install the requirements, and run a script that does some processing and creates the JSON artifacts.
What I’d like is to create a Julia package that allows access to the data in the generated JSON files with all the rest being transparent to the user of said package.
That is, I’d like the user’s workflow to go something like:
] add FooArtifacts # ideally the costly make step happens here and only once
using FooArtifacts
data = FooArtifacts.fetch()
This should do the make
step automatically, refreshing the JSON files (ideally on the add
but second-best would be the using
and at worst of course it would have to be done in fetch()
if not already done) before parsing them and doing whatever steps to provide the data in the desired format.
I’m thinking it should be possible to add a Project.toml
file in the root of the repo, and a src/FooArtifacts.jl
(there’s already a src/
folder but I don’t think that will be a problem), and the ] add
would end up cloning the whole repo into /path/to/packages/FooArtifacts
, from where the code in FooArtifacts.jl
could discover its own location and do the make generate-artifacts
steps.
One problem I foresee is that normally one shouldn’t be changing the contents of the cloned package repo (and perhaps normally one can’t - those should be read-only other than by Pkg, I would think). So copying the package repo somewhere temporary would seem necessary.
Also the make generate-artifacts
step, on first invocation, takes awhile to do all the venv
and pip install
stuff, whereas I’d like that to happen only once (ideally at precompile time) for users of the package. So there’s a cacheing step to perform as well. I’m thinking that Scratch.jl might serve that purpose well.
Does the above approach seem reasonable? Any suggestions for different approaches?