I’m working on an ocean model with multiple large dependencies for optional features:
CUDAnative.jl and CuArrays.jl for GPU support.
HDF5.jl and NetCDF.jl if users want output in NetCDF format.
MPI.jl for distributed support (soon).
Plots.jl and PyPlot.jl are used by some of the examples.
Probably more in the future, e.g. JLD2.jl, …
The problem is that I don’t want new users adding the package to have to wait forever for all these heavy dependencies to download and build, only to be greeted by CUDA errors and MPI errors because they don’t have a GPU or an MPI library. Ideally, only the core dependencies are installed when the package is added, and other packages (e.g. CUDA) are only installed and built when they are needed. Users may not need any of the optional dependencies.
So far I’ve been made aware of three solutions:
Specify multiple Project.toml and Manifest.toml files and select the right one for the right application, e.g. use env/gpu/Project.toml for GPU support. This works well for CI pipelines but is a little wonky, and you can’t expect new users to juggle multiple Project.toml files (things are supposed to just work™).
Use Requires.jl and sprinkle @require statements throughout the code. This seems like the best solution but it might clutter functions that are shared between CPU and GPU. It also doesn’t download/install/import the package needed, so users still need to figure out which packages are needed.
Home-brewed solution: have a macro like @import_at_all_costs CUDAnative that imports the package if it is available; otherwise the package is added, built, and then imported. I think this is what we want, although maybe there are reasons why this is a bad idea.
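To make the third option concrete, here is a minimal sketch of what such a macro might look like. The names `ensure_installed` and `@import_at_all_costs` are hypothetical, and `Base.find_package` is not an exported function, so treat this as an illustration under those assumptions rather than a recommended implementation:

```julia
using Pkg

# Hypothetical helper: add `name` to the active environment if it cannot be found.
# `Base.find_package` is unexported; relying on it is an assumption of this sketch.
function ensure_installed(name::AbstractString)
    if Base.find_package(name) === nothing
        @info "Package $name not found, installing it now."
        Pkg.add(name)  # downloads and builds the package
    end
    return nothing
end

# Hypothetical macro: install the package if needed, then import it
# into the module where the macro is called.
macro import_at_all_costs(pkg::Symbol)
    name = string(pkg)
    import_expr = Expr(:import, Expr(:., pkg))  # same as `import <pkg>`
    return quote
        ensure_installed($name)
        Core.eval($__module__, $(QuoteNode(import_expr)))
    end
end

# Usage with a standard library, so nothing actually gets installed here.
@import_at_all_costs Dates
```

One thing to keep in mind is that this changes the user's active environment as a side effect of loading code, which may be one of the reasons it is a bad idea.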
The other unmentioned option is to create additional packages
depending on your base package that provide the additional
functionality. e.g. OceananigansGPU.jl which depends on Oceananigans
and GPU packages, OceananigansNetCDF.jl that depends on Oceananigans,
NetCDF.jl and HDF5.jl, etc. Then users simply add the meta-packages
they’re interested in. The problem with the Requires-based solution
is that it can significantly increase package load times (see,
e.g., the “time to first plot” discussions). This solution, while being
slightly heavier-weight for the package author, shouldn’t have the same
package-loading performance problems.
My knowledge of this corner of the Julia ecosystem comes from poking
around in profile data from loading Gadfly, which makes use of Requires.
For the specific example of plotting, what you’ll probably want to do
is just depend on RecipesBase, which is
fairly lightweight. Doing so will allow your types to be plotted by
the Plots.jl ecosystem without depending on Plots.jl.
Thanks for the suggestion! We kind of thought of meta-packages as a possible solution, and maybe it’s the best approach, but I ended up deciding against it.
I think having OceananigansGPU.jl and OceananigansMPI.jl would be less than ideal as we’d end up repeating so much code, whereas one of the benefits we’re enjoying right now is that the CPU and GPU share the same code (and we’re hoping we can do the same with MPI).
It might make more sense with OceananigansNetCDF.jl and OceananigansPlotting.jl, but then I see them as these weird packages that can’t do useful stuff on their own. And it could complicate development at this early stage with changes that affect multiple meta-packages, e.g. we could end up with pull requests that depend on each other. Might be a good approach for v1.0+.
Thanks for the link to RecipesBase, looks like it’ll help with plotting for examples!
If you’re currently able to put the GPU code behind Requires.jl while
still sharing code with the CPU, you should be able to do so just as
easily while splitting the GPU-specific code into a separate package.
Why would you have to duplicate any code at all to separate out a GPU package? Can’t you use the type system and method dispatch to just override the methods you need to?
Requires has its own issues for both development and use if you rely on it too much. I’m currently considering swapping to separate packages because it’s cleaner.
I know ocean models are usually enormous single repos but you don’t need to do that in Julia.
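As a rough sketch of the dispatch idea, here are two modules standing in for two separate packages. All names (`CoreModel`, `CoreModelGPU`, `allocate`, `initialize_field`) are made up for illustration, and the "GPU" method just returns a Float32 Array so the example runs without a GPU; a real GPU package would presumably return a device array instead:

```julia
# Stand-in for the core package: owns the shared, architecture-agnostic code.
module CoreModel

abstract type AbstractArchitecture end
struct CPU <: AbstractArchitecture end

# Architecture-specific hook: how to allocate storage.
allocate(::CPU, n::Integer) = zeros(Float64, n)

# Shared code, written once for every architecture that implements `allocate`.
function initialize_field(arch::AbstractArchitecture, n::Integer)
    field = allocate(arch, n)
    field .+= 1  # broadcasting does not care what array type this is
    return field
end

end # module CoreModel

# Stand-in for a separate GPU package that depends on the core package.
module CoreModelGPU

import ..CoreModel: AbstractArchitecture, allocate

struct GPU <: AbstractArchitecture end

# Only the hook is overridden; `initialize_field` is reused as-is.
# Assumption: a Float32 Array plays the role of a device array here.
allocate(::GPU, n::Integer) = zeros(Float32, n)

end # module CoreModelGPU

using .CoreModel: CPU, initialize_field
using .CoreModelGPU: GPU

cpu_field = initialize_field(CPU(), 3)
gpu_field = initialize_field(GPU(), 3)
```

The core package never mentions the GPU type, so none of the shared code has to be duplicated in the add-on package.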