I am looking for a best-practice solution to the following situation.
I am developing a data processing pipeline for my team. I have wrapped all of the core functionality in a standard Julia module (file I/O with our custom binaries, type definitions, functions for each step of processing, methods for downloading external resources, etc.). The "pipeline" itself is composed of a number of high-level scripts that collect command-line arguments and internally call the library functions. Some of these scripts also conduct diagnostic tasks (GUI planning tools, data-quality plotting, etc.). All of these scripts live in a `bin/` directory at my module's top level (same level as `src/`, `deps/`, `test/`, and `docs/`).
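For reference, the layout is the standard package structure (the package name here is a placeholder):

```
Pipeline/
├── Project.toml
├── src/
├── deps/
├── test/
├── docs/
└── bin/    # pipeline entry points and diagnostic scripts
```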
Most of my colleagues don't have any experience with Julia. I would like to minimize the number of steps my team members need to get the pipeline up and running on their own machines (and the amount of support needed on my end). However, since this pipeline is still in active development, I do not want to have to rebuild and redistribute binaries each time there is a change.
These are the instructions I currently give my team:

- Download and install Julia (1.8 or greater); add it to the user path.
- `git clone` our (private) repository.
- Launch the Julia REPL, enter package-manager mode, and `dev path/to/cloned/repo`. (I am using `dev` instead of `add` since I could not figure out an easy way to get Pkg to remember our GitLab SSH key, and having to enter it each time to update the module is too much of a pain.)
- Manually install (`add`) a list of all of the scripts' Julia dependencies.
- Add the `bin/` folder to the user path so that all of the scripts are immediately available for use.
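In shell terms, the one-time setup above amounts to roughly the following (the GitLab URL and clone location are placeholders, not our real ones):

```shell
# 1. (Julia already installed and on PATH.) Clone the repository.
git clone git@gitlab.example.com:team/Pipeline.jl.git "$HOME/Pipeline.jl"

# 2. Register the checkout with Pkg in development mode.
julia -e 'using Pkg; Pkg.develop(path=joinpath(homedir(), "Pipeline.jl"))'

# 3. Make the scripts in bin/ callable from anywhere (e.g. add to ~/.bashrc).
export PATH="$HOME/Pipeline.jl/bin:$PATH"
```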
I then let my team know when they should `git pull` once new features of interest to them are available. The biggest issue with this approach is that occasionally I add a new dependency to a script, and then invariably have to deal with individual emails fixing people's environments. I would really like to maintain a `Project.toml` file (or something equivalent) inside the `bin/` folder which the scripts could activate automatically on their own. However, I cannot seem to figure out a clean way to do this. I have been trying to use a shebang in the scripts along the lines of

```
#!/usr/bin/env -S julia --project=@. -e "using Pkg; Pkg.instantiate()"
```

but this has the two-fold problem that (1) the user's active working directory has to be the module's `bin/` directory (which defeats the purpose of adding the pipeline to their path so they can run it wherever the actual data files for processing live, and I can't hard-code a path instead because I have no way of knowing a priori where a user will clone the library), and (2) since my module is not registered in the Julia General registry, this invocation leads to an error anyway.
Is there another recommended approach I should be taking instead? I see a seemingly related thread here, but I do not want my teammates to have to `activate` each time they run the scripts. Ideally, the scripts should "just work".
Thanks!