Julia / Conda / Linux

Hello all,

I’m curious what Julia/Linux users have found the best approach to dividing package responsibilities between their distro and Conda.

Conda is pretty great in some ways, but it can be painfully slow, and can be a little weird about downgrading packages with no explanation.

If I were just using Julia, I would probably have my distro (“Arch-like” Manjaro) do what it can, then Pkg, and (mini)conda for anything missing. In this case I’d see no reason to let Conda near Jupyter, for example. But I do need to use Python a lot for work, and Conda doesn’t seem too great about finding things that are already installed. And there are a few Julia packages that seem to need (or at least strongly encourage) using Conda.

I’d love to find a strategy for what-to-install-from-where that’s relatively simple, without excessive Conda bloat. What have you found to work well?

My preference is not to use Conda at all. I have a local installation of Python on Arch Linux, so I don’t need Conda.

ENV["PYTHON"] = "python"

When that Python is found, Conda is bypassed.

Also, part of my reason for creating my package Reduce.jl was to stop relying on Conda for symbolic computations in CI tests.

Is that documented somewhere?

Yep, I feel your pain (I use macOS and Arch Linux). I am always scared when I install a Julia package and see Conda in the dependency list. At that point I feel like I have totally lost control of the dependencies and have no idea what’s installed and used behind the scenes, especially the Python stuff like Jupyter/PyCall/PyPlot.

I can confirm that setting ENV["PYTHON"] = "python" usually works, although I have run into situations where I’m fairly sure it did not. No idea about the other stuff, though…

Me too. When I see a Conda dependency, I either don’t install the package or follow it with extra attention. Adding an extra 2 or 3 GB is no joy.
(I also set ENV["PYTHON"] = ...)

I feel the pain, too. There is a PR, “update conda executable location for Linux/macOS” (JuliaPy/Conda.jl #146, by Quar), that adds more control over which conda environment and conda executable Conda.jl uses. I also created an RFC PR, “RFC: Add an API to safely add conda packages” (JuliaPy/PyCall.jl #613), to make installing Python packages via Conda.jl more “static” and predictable. I hope those changes will reduce some of the pain.

I create a virtual environment dedicated to PyCall.jl using virtualenv (venv should work too). This way, installing Python packages with pip is quite easy and safe. You can also use different Python virtual environments for different projects (see the JuliaPy/PyCall.jl README on GitHub). This is handy if you want to control Python packages and their versions per project.
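A minimal Julia sketch of that setup, assuming the usual venv layout (bin/python, or Scripts\python.exe on Windows) and a base interpreter called python3 on PATH. The helper names and the ~/.venvs/pycall location are hypothetical; the only PyCall-specific part is setting ENV["PYTHON"] and rebuilding.

```julia
using Pkg

# Hypothetical helper: interpreter path inside a venv/virtualenv directory.
venv_python(venv::AbstractString) =
    Sys.iswindows() ? joinpath(venv, "Scripts", "python.exe") : joinpath(venv, "bin", "python")

# Hypothetical helper: create (if needed) a venv dedicated to PyCall, install
# packages into it with its own pip, and rebuild PyCall against it.
function setup_pycall_venv(venv::AbstractString; base_python::AbstractString = "python3",
                           packages::Vector{String} = String[])
    isdir(venv) || run(`$base_python -m venv $venv`)   # stdlib venv module
    py = venv_python(venv)
    # A vector interpolated into a Cmd expands to separate arguments.
    isempty(packages) || run(`$py -m pip install $packages`)
    ENV["PYTHON"] = py      # build-time setting read by PyCall's build step
    Pkg.build("PyCall")     # restart Julia afterwards
    return py
end

# Example (not run here):
# setup_pycall_venv(joinpath(homedir(), ".venvs", "pycall"); packages = ["numpy", "matplotlib"])
```

Installing later packages with that venv’s own pip needs no rebuild, since PyCall keeps pointing at the same interpreter.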

This is the default on Linux, so I don’t see why you would have needed this.

See also the PyCall installation documentation. PyCall can work with any Python installation on your system. On Mac and Windows, it defaults to using Conda.jl to install its own Python distro. If you set ENV["PYTHON"] when building PyCall it will use the Python of your choice — you only have to do this once, and it will remember.
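A minimal sketch of that build step. The wrapper name use_python! is hypothetical; the mechanism (set ENV["PYTHON"], run Pkg.build("PyCall"), restart Julia) is the one described above, and, as I understand the PyCall README, an empty string asks for PyCall’s own Conda.jl Python instead.

```julia
using Pkg

# Hypothetical convenience wrapper; PyCall itself only needs ENV["PYTHON"] + a rebuild.
function use_python!(python::AbstractString)
    # A name found on PATH at build time ("python3") or an absolute path;
    # "" asks PyCall to use its own Conda.jl-managed Python instead.
    ENV["PYTHON"] = python
    # PyCall reads ENV["PYTHON"] while building and remembers the result,
    # so later Julia sessions keep using this interpreter until the next rebuild.
    Pkg.build("PyCall")
    return python
end

# Example: use_python!("/usr/bin/python3"), then restart Julia.
```

After restarting Julia, `using PyCall` followed by `PyCall.python` (and `PyCall.conda`) should show which interpreter the build picked up.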

Many (not all) of the packages that use Conda are doing so in order to install Python packages for use with PyCall. So if you configure PyCall to use a different python, then dependent packages will also not use Conda.

Conda.jl doesn’t use anything that is already installed — the whole point is for it to install a self-contained Python distribution. (Trying to half use its own Python packages and half use some installed independently somewhere is a recipe for continual breakage.) If you are comfortable managing your own Python installation and handling installation of packages, you can tell PyCall to use that instead of Conda.

Could it be that you did not have all of the required Python dependencies locally?

It definitely wasn’t always the default; they must have improved the defaults since then.

It’s interesting that people here are using the system Python. For really mature Python packages that are distributed through the package manager, I suspect that works well. However, I’ve often needed packages that the system doesn’t provide. This is especially true on HPC systems, where the system Python can be ancient and you don’t have root access. Installing packages on the system on an ad-hoc basis with pip works there in the short term but explodes when you upgrade your system. (Admittedly, that’s probably user error on my part.)

So at the moment I mostly manage my conda environments manually and set ENV["PYTHON"] to point to them as necessary. As for the extra disk space? It’s an annoyance but seems worth it for having an at least somewhat reproducible and portable experience between different systems.

I actually never use the system Python, but pyenv. On every single HPC system I work on, I also have my own Python installation (via pyenv). Project-specific environments are easily done with the venv module, which has been part of the standard library since Python 3.3.

I would assume that when you set ENV["PYTHON"] = "python", it always finds the one your PATH points to, but, as I said, I have seen this ignored and another Python version used instead. So I don’t know what’s happening behind the scenes.

When I run, e.g., pyenv shell 3.7.2 (or activate a virtualenv, use conda, etc.) and then launch the Julia REPL, I would like every Python-related package to use that very same Python version, without needing to rebuild or reinstall anything. I think this would be the most satisfying solution.

Another problem, with Jupyter for example, is that pip install jupyter also installs command-line tools on your system. These can shadow the jupyter installation inside a Python environment.

Let’s say you installed Jupyter in your system Python (this is the root of many Jupyter problems) and then activate a virtualenv without Jupyter installed in it. A simple check like which jupyter may still yield a valid path (from the system installation), and if that instance is spawned, Jupyter will be launched with its own (partly statically linked) libs. Suddenly you end up with a Python session where imports get mixed up. I have had this happen several times, and the same is true for the commonly used IPython REPL. Hardcoded paths and static links are the problem there.
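To spot that kind of shadowing from Julia, a small sketch (the helper name is hypothetical) that lists every executable with a given name along PATH, in lookup order. The first entry is what which jupyter (or `Sys.which("jupyter")`) would pick; later entries are shadowed copies. It assumes a Unix-like PATH and skips empty PATH entries.

```julia
# Hypothetical helper: every executable named `name` on PATH, in lookup order.
# The first hit is the one a shell (or Sys.which) would run; later ones are shadowed.
function executables_on_path(name::AbstractString; path::AbstractString = get(ENV, "PATH", ""))
    sep = Sys.iswindows() ? ';' : ':'
    hits = String[]
    for dir in split(path, sep; keepempty = false)   # empty entries are skipped here
        candidate = joinpath(dir, name)
        if isfile(candidate) && Sys.isexecutable(candidate) && !(candidate in hits)
            push!(hits, candidate)
        end
    end
    return hits
end

# Example: executables_on_path("jupyter") after activating a virtualenv; if the
# first entry lives outside the virtualenv, the system Jupyter is shadowing it.
```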

Therefore my advice: never install anything in your system Python :wink:

By “ad-hoc basis”, do you mean sudo pip install ...? I’d never use that unless it’s inside some kind of container. If you mean using pip with --user or in virtual environments (even better), I’d say it’s pretty “safe” in the sense that you can wipe out an environment once it becomes unusable.

But I understand why you describe it as “explodes when you upgrade your system”: virtual environments become unusable once you remove or upgrade the Python they were created with. This is why using pyenv is the safer option if you can’t control when the system Python is updated.

Yeah, I use pyenv for remote machines, too. Sometimes that’s the only choice.

Note that ENV["PYTHON"] is a build-time option. The run-time value of ENV["PATH"] is completely ignored (a design decision in PyCall.jl needed to improve startup time). This is why you can’t switch to a different Python executable just by relaunching the Julia REPL with a different ENV["PATH"]; you have to rebuild PyCall.jl.
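Because the choice is fixed at build time, one way to notice a mismatch is to compare what python resolves to on the current PATH with the interpreter PyCall was built against (for example the value of `PyCall.python`). A minimal sketch; the helper name is hypothetical. It deliberately compares paths without following symlinks, because a venv’s bin/python is typically a symlink to the base interpreter and resolving it would hide the switch.

```julia
# Hypothetical helper: does `name` on the current PATH resolve to something other
# than `built_python` (e.g. the value of PyCall.python)? If so, PyCall will keep
# using `built_python` until ENV["PYTHON"] is updated and PyCall is rebuilt.
function python_mismatch(built_python::AbstractString; name::AbstractString = "python")
    current = Sys.which(name)          # `nothing` if `name` is not on PATH
    built = Sys.which(built_python)    # accepts bare names and absolute paths
    (current === nothing || built === nothing) && return true
    # Compare normalized paths without following symlinks.
    return normpath(abspath(current)) != normpath(abspath(built))
end
```

A `true` result means relaunching Julia is not enough; PyCall needs a rebuild with the new ENV["PYTHON"].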

At the moment, the only way to switch Python environments at run time is PYCALL_JL_RUNTIME_PYTHON (see the JuliaPy/PyCall.jl README on GitHub, as I linked above). However, note the caveat:

Python virtual environments created by venv and virtualenv can be used from PyCall, provided that the Python executable used in the virtual environment is linked against the same libpython used by PyCall. Note that virtual environments created by conda are not supported.
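A minimal sketch of setting that variable from inside Julia rather than from the shell, assuming (as I understand PyCall’s initialization) that it is read when `using PyCall` first loads the package in a fresh session. The helper name is hypothetical and the bin/ vs Scripts\ layout is the usual venv convention. It does not lift the caveat: the venv’s interpreter must still be linked against the libpython PyCall was built with.

```julia
# Hypothetical helper: point PyCall at a venv's interpreter for this session only.
# Must run before the first `using PyCall`; it does not change the build-time choice.
function use_runtime_venv!(venv::AbstractString)
    py = Sys.iswindows() ? joinpath(venv, "Scripts", "python.exe") : joinpath(venv, "bin", "python")
    isfile(py) || error("no Python interpreter found at $py")
    ENV["PYCALL_JL_RUNTIME_PYTHON"] = py
    return py
end

# Example (fresh Julia session):
# use_runtime_venv!(joinpath(homedir(), ".venvs", "project-a"))
# using PyCall
```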

Yes, I agree that this would be pretty handy. Unfortunately, it is very hard to implement, and I don’t think it is a viable option. The closest solution would be “Package options” (JuliaLang/Juleps issue #38), which requires some improvements in Pkg.jl.
