Package naming policies for Julia packages within conda-forge

We have a few cases of Python packages depending on Julia packages and vice versa.

I’ve posed the question about what to call Julia packages within conda-forge so that they may be referred to as explicit dependencies.

My first preference would be to retain the original names. However, other factors may make capitalization or the inclusion of a period, ., challenging.

My second proposal is to use an explicit julia- prefix so that Interpolations.jl may become julia-interpolations.


I’m confused, since when could you use Conda to install Julia packages?

There are two conda-forge packages which are heading in that direction. They currently require secondary steps to invoke the Julia package manager.

  1. GitHub - conda-forge/pysr-feedstock: A conda-smithy repository for pysr.
  2. GitHub - conda-forge/xbitinfo-feedstock: A conda-smithy repository for xbitinfo.

Those two packages are mainly Python interfaces to the following Julia packages, respectively:

  1. SymbolicRegression.jl
  2. BitInformation.jl

Technically, neither conda nor mamba is supposed to be language specific.

There are several mechanisms by which conda could “install” a Julia package.

One scenario is where conda-forge is managing a Julia depot within the conda environment. This conda-forge Julia depot would be similar to a /usr/local/share/julia depot. In this scenario, conda would use the Julia package manager at build time to figure out where the Julia package should go within the depot rather than requiring the user to manually invoke the Julia package manager after installation. Insertion of this depot is made possible by depot stacking. An example is this conda-forge build script, which currently bundles ~100 Julia packages in a depot:
build.sh. The depot is then included in the depot stack via an activate.sh script.

Another scenario might involve conda invoking the julia package manager via one of the user link scripts:
https://docs.conda.io/projects/conda-build/en/latest/resources/link-scripts.html
However, this kind of activity in these scripts is discouraged.


sounds extremely cursed, idk what these pkgs do but maybe you can do the same?

PySR is mentioned above but perhaps there is a misunderstanding about the question.

The question is not what we should call a Python wrapper of a Julia package. The question is what the Julia packages themselves should be called if we could not use the “CamelCase.jl” scheme due to capitalization being an issue or the . being disallowed.

Can you explain a bit about how these packages would be used? Would the Python packages that depend on them do a juliacall type thing to import them?

I feel like CamelCase is part of the identity of Julia packages, so I hope the capitalization turns out to be okay in the end. But if it isn’t, the julia- name seems fine, with the CamelCase becoming snake_case, e.g. ApproxFun.jl becoming julia-approx_fun (assuming underscores are allowed).

The julia- prefix does seem a bit long, especially with names like SplitApplyCombine.jl - it becomes the even longer julia-split_apply_combine. But (slightly) shortening it to split_apply_combine-jl or jl-split_apply_combine doesn’t sound as nice. However, the shortened names do have the benefit that they emphasize the name of the package itself more.

I like that split_apply_combine-jl sounds pretty close to the package’s original name, but I also like the explicitness of julia-split_apply_combine. Knowing some of the context of where and how it will be used will help decide which one sounds more appropriate for that context.
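
To make the candidate schemes easier to compare side by side, here is a rough Python sketch of the CamelCase to snake_case mapping described above. The function name conda_name and the style labels are made up for illustration; this is not an existing conda-forge tool or an agreed policy.

```python
import re

def conda_name(julia_name: str, style: str = "julia-prefix") -> str:
    """Map a Julia package name such as 'ApproxFun.jl' to a lowercase candidate name.

    Hypothetical helper for comparing naming schemes, not an official convention.
    """
    # Drop the optional '.jl' suffix.
    base = julia_name[:-3] if julia_name.endswith(".jl") else julia_name
    # Insert '_' at each lowercase-or-digit -> uppercase boundary, then lowercase.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", base).lower()
    if style == "julia-prefix":
        return f"julia-{snake}"
    if style == "jl-prefix":
        return f"jl-{snake}"
    if style == "jl-suffix":
        return f"{snake}-jl"
    raise ValueError(f"unknown style: {style!r}")

for name in ("ApproxFun.jl", "SplitApplyCombine.jl", "Interpolations.jl"):
    print(name, "->", conda_name(name))
```

One caveat with this particular rule: all-caps names (JSON.jl, say) have no lowercase-to-uppercase boundary, so they are simply lowercased without any underscores.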


PySR uses pyjulia to invoke Julia from Python. PySR thus has a dependency on SymbolicRegression.jl.

The current procedure involves first installing PySR via pip or conda and then installing SymbolicRegression.jl by invoking this command python -c 'import pysr; pysr.install()'. That command invokes the julia package manager.

What I believe may soon be possible is using the julia package manager during conda-forge packaging so that the user does not need a second installation step. They could just import pysr and start using it.

The mechanism that conda or an operating system package manager could use to supply Julia packages is by populating a depot on Julia’s depot stack. From the docs for Base.DEPOT_PATH, we see there are three depots by default.

  • ~/.julia where ~ is the user home as appropriate on the system;
  • an architecture-specific shared system directory, e.g. /usr/local/share/julia;
  • an architecture-independent shared system directory, e.g. /usr/share/julia.

Within a conda environment this becomes

  • “~/.julia”
  • “$CONDA_PREFIX/local/share/julia”
  • “$CONDA_PREFIX/share/julia”

By manipulating the environment variable JULIA_DEPOT_PATH, we can add a depot. Because conda-forge does not currently package individual Julia packages, one way to “install” Julia packages via conda would be to ship a Julia depot located in $CONDA_PREFIX/share/pysr/depot.

That depot just needs four directories.

  • artifacts - binary artifacts that the packages may need
  • conda - this just contains a deps.jl configuring Conda.jl
  • environments - preconfigured Julia environments that can be loaded via @project_name
  • packages - Julia source code
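
As a quick sanity check on a shipped depot, something like the following Python sketch could report which of the four directories listed above are missing. The helper name missing_depot_dirs is hypothetical, and the check only looks at directory names, not at their contents.

```python
from pathlib import Path

# Directory names taken from the list above.
EXPECTED_DEPOT_DIRS = ("artifacts", "conda", "environments", "packages")

def missing_depot_dirs(depot):
    """Return the expected subdirectories that are absent from `depot`."""
    root = Path(depot)
    return [name for name in EXPECTED_DEPOT_DIRS if not (root / name).is_dir()]
```

For example, calling missing_depot_dirs on $CONDA_PREFIX/share/pysr/depot should return an empty list if the depot was populated as described.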

The easiest way to create this depot is to first set JULIA_DEPOT_PATH to a temporary directory. Alternatively, within Julia one could do:

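DEPOT_PATH_BACKUP = copy(DEPOT_PATH)  # keep the original stack so it can be restored
empty!(DEPOT_PATH)
push!(DEPOT_PATH, mktempdir())  # fresh, empty directory (tempdir() would return the shared temp dir itself)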

From there add the needed packages and create the environments desired. Before exiting Julia, copy the desired folders such as “packages” to the depot location (e.g. $CONDA_PREFIX/share/pysr/depot).

Now we just need to set JULIA_DEPOT_PATH. For example, one could do the following.

export JULIA_DEPOT_PATH="$HOME/.julia:$CONDA_PREFIX/share/pysr/depot"

Within julia one could also do

push!(DEPOT_PATH, joinpath(ENV["CONDA_PREFIX"], "share/pysr/depot"))
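
On the Python side, a wrapper could in principle compose the same value before launching Julia. This is only a sketch under the assumptions above: the share/pysr/depot location is the example path from this post, and depot_stack is a hypothetical helper, not something PySR provides.

```python
import os

def depot_stack(home, conda_prefix, sep=os.pathsep):
    """Compose a JULIA_DEPOT_PATH value: user depot first, then the bundled depot.

    `sep` defaults to os.pathsep, which matches the separator Julia expects
    for JULIA_DEPOT_PATH (':' on Unix, ';' on Windows).
    """
    user_depot = os.path.join(home, ".julia")
    # Assumed location of the conda-shipped depot (example path from this thread).
    bundled_depot = os.path.join(conda_prefix, "share", "pysr", "depot")
    return sep.join([user_depot, bundled_depot])
```

The result would need to be assigned to os.environ["JULIA_DEPOT_PATH"] before the Julia process starts, since Julia reads the variable at startup.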

python being python can you not just do this at the python module loading-time if the julia package is not detected?

Thanks for the detailed explanation!


I was originally going to say that this seems like it doesn’t need a whole new DEPOT, just a shared environment would be better - but I saw that you considered that option in this Github thread and changed your mind later, so I’ll assume there’s good reason for going with Depots ultimately.


As for the naming itself, was there ever a conclusion from the #18 regarding a general policy? It seems like this gist is an attempt at a conclusion and a summary, but it’s not clear how official it is.
According to that gist, the convention seems to be language:package, with the language prefix being optional if the package name is unambiguous. Was this ever implemented? What is the current status of R, Node, and Ruby packages (which are the other languages mentioned in these threads) in Conda? Do they have a common standard, ruby-nokogiri or ruby:nokogiri or anything else?

The current design is that each conda environment has its own Julia depot in $CONDA_PREFIX/share/julia.
This depot can be modified by the user via the Julia package manager.

The reason I want to install packages into a second depot is that it will be “read-only” and thus we will not have conflicts. There will be a separation between user installed packages and conda installed packages.

It’s definitely possible (I think PythonCall.jl/juliacall does this?), but I think the safest option is to have a manual install step when the user is already connected to the internet.

For example, some clusters have compute nodes which do not have internet access (my local Tiger cluster at Princeton has this problem) - if you submit a PySR job, it would fail - so you need to install the packages before running it.


You can simply call juliapkg.resolve() after installing your dependencies. After that point you shouldn’t need an internet connection.

after installing your dependencies

Right, this is what I was trying to emphasize - that the user (implicitly or explicitly) needs to run an extra dependency installation step after pip install <package>, such as import juliacall or import pysr; pysr.install(), before running on a machine disconnected from the internet. To answer @jling’s question, the reason I have the explicit install() function rather than just import pysr is so that it’s more explicit (so users are aware that further installation will occur). But this is obviously a subjective view on the API!

But with what @mkitti and @ngam are working on, there won’t need to be this extra install step after conda install -c conda-forge <package>; it should come directly from conda-forge, which will be cool.


Basically the question we’re asking is if we can run all the Pkg.jl steps to populate the depot, tar up part of the depot, and then untar it for the user so then all the packages are installed without further intervention by the user.
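
To illustrate just the tar/untar half of that idea (not the Pkg.jl steps, and not the actual conda-forge build script), here is a minimal Python sketch. The function names and the default choice of subdirectories are assumptions for illustration.

```python
import tarfile
from pathlib import Path

def pack_depot(depot, archive, subdirs=("artifacts", "environments", "packages")):
    """Archive the selected depot subdirectories that exist into a gzip tarball."""
    depot = Path(depot)
    with tarfile.open(str(archive), "w:gz") as tar:
        for name in subdirs:
            if (depot / name).is_dir():
                # Store paths relative to the depot root so the tarball is relocatable.
                tar.add(str(depot / name), arcname=name)

def unpack_depot(archive, target):
    """Extract the tarball into `target`, creating the directory if needed."""
    Path(target).mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target))
```

Here pack_depot would run at build time on the temporary depot, and unpack_depot stands in for whatever places the files under the conda prefix at install time.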

Here’s an example build script:

What happens when you have two conda packages like this and their build scripts choose different/conflicting versions of some Julia packages?

Julia depots can contain multiple versions of a package, so it is not a problem to have different depots with different versions of the same package. In this case the depots are just acting as a package cache, so there is no direct conflict created by the depots themselves.

The real question is what happens when you try to add the packages to a common Julia environment. In this case, Julia will resolve the versions and draw from the depots if the packages exist at the needed versions. If not, then it will just download the needed versions.

I did an experiment by creating independent depots for SymbolicRegression.jl and BitInformation.jl. I then stacked those depots and then created a common environment with a third depot at the top of the depot stack. In that case, no new package versions needed to be downloaded. All the needed package versions were contained in the existing depots.
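
For anyone who wants to poke at a stacked setup like this, a small Python sketch along these lines can show which depots on the stack carry a given package. It relies on depots storing sources under packages/<PackageName>/, and the helper name depots_providing is made up. It only checks that the directory is present; the actual version resolution is still done by the Julia package manager.

```python
from pathlib import Path

def depots_providing(package, depot_stack):
    """List the depots in `depot_stack` that have a `packages/<package>` directory."""
    return [d for d in depot_stack if (Path(d) / "packages" / package).is_dir()]
```

Passing the entries of JULIA_DEPOT_PATH (split on the path separator) as depot_stack gives a quick view of where SymbolicRegression or BitInformation would be found.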

See the links below.
