I’m curious what Julia/Linux users have found the best approach to dividing package responsibilities between their distro and Conda.
Conda is pretty great in some ways, but it can be painfully slow, and can be a little weird about downgrading packages with no explanation.
If I were just using Julia, I would probably have my distro (“Arch-like” Manjaro) do what it can, then Pkg, and (mini)conda for anything missing. In this case I’d see no reason to let Conda near Jupyter, for example. But I do need to use Python a lot for work, and Conda doesn’t seem too great about finding things that are already installed. And there are a few Julia packages that seem to need (or at least strongly encourage) using Conda.
I’d love to find a strategy for what-to-install-from-where that’s relatively simple, without excessive Conda bloat. What have you found to work well?
Yep, I feel your pain (I use macOS and Arch Linux). I always get nervous when I install a Julia package and see Conda in its dependency list. At that point I feel like I’ve totally lost control of the dependencies and have no idea what’s installed and used behind the scenes, especially the Python stuff like jupyter/pycall/pyplot.
That said, I can confirm that setting ENV["PYTHON"] = "python" usually works. I have, however, found myself in situations where it was quite clearly not honored. No idea about other stuff though…
I create a virtual environment using virtualenv dedicated to PyCall.jl (venv should work too). This way, installing Python packages using pip is quite easy and safe. You can also use different Python virtual environments for different projects: https://github.com/JuliaPy/PyCall.jl#python-virtual-environments. This is handy if you want to control Python packages and their versions per project.
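Sketched out, that setup looks something like this (the venv path is an arbitrary example; `--without-pip` is only there so the sketch runs even where `ensurepip` is missing — normally you would drop it so pip gets bootstrapped):

```shell
# Create a virtualenv dedicated to PyCall (path is an arbitrary example).
python3 -m venv --without-pip /tmp/pycall-venv

# The venv ships its own interpreter; this is the path you later hand to
# PyCall via ENV["PYTHON"] before running Pkg.build("PyCall"):
/tmp/pycall-venv/bin/python --version
```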
This is the default on Linux, so I don’t see why you would have needed this.
See also the PyCall installation documentation. PyCall can work with any Python installation on your system. On Mac and Windows, it defaults to using Conda.jl to install its own Python distro. If you set ENV["PYTHON"] when building PyCall it will use the Python of your choice — you only have to do this once, and it will remember.
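In shell terms, that one-time switch is a single rebuild (the interpreter path below is just an example; PyCall reads the PYTHON variable at build time and records the choice):

```shell
# Point PyCall at a specific interpreter and rebuild it once.
# Later Julia sessions reuse the recorded choice; no variable needed.
PYTHON=$(which python3) julia -e 'using Pkg; Pkg.build("PyCall")'
```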
Many (not all) of the packages that use Conda are doing so in order to install Python packages for use with PyCall. So if you configure PyCall to use a different python, then dependent packages will also not use Conda.
Conda.jl doesn’t use anything that is already installed — the whole point is for it to install a self-contained Python distribution. (Trying to half use its own Python packages and half use some installed independently somewhere is a recipe for continual breakage.) If you are comfortable managing your own Python installation and handling installation of packages, you can tell PyCall to use that instead of Conda.
It’s interesting that people here are using the system python. For really mature python packages which are distributed with the package manager I suspect that would work well. Conversely, I’ve often found the need to install packages which aren’t provided by the system. This is especially true when working on HPC systems where the system python can be ancient and you don’t have root access. Installing packages on the system on an ad-hoc basis with pip works there for the short term but explodes when you upgrade your system. (Admittedly, probably user error on my part.)
So at the moment I mostly manage my conda environments manually and set ENV["PYTHON"] to point to them as necessary. As for the extra disk space? It’s an annoyance but seems worth it for having an at least somewhat reproducible and portable experience between different systems.
I actually never use the system Python, but pyenv. On every single HPC system I work on, I also have my own Python installation (via pyenv). Project-specific environments are easily done with the venv module, which is part of later Python 3 versions.
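For reference, that workflow is roughly the following (assuming a stock pyenv install; 3.11.9 is just an example version):

```shell
# Build a user-owned interpreter that no system upgrade can touch,
# then derive per-project environments from it.
pyenv install 3.11.9      # example version
pyenv local 3.11.9        # pin it for the current project directory
python -m venv .venv      # project-specific env via the stdlib venv module
```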
I would assume that when you set ENV["PYTHON"] = "python" it always finds the one your PATH points to, but as I said, I have seen this being ignored and another Python version being picked up. So I don’t know what’s happening behind the scenes.
When I do e.g. pyenv shell 3.7.2 (or activate a virtualenv, use conda, etc.) and then launch the Julia REPL, I would like to get that very same Python version in every Py-related package, without the need to rebuild or reinstall anything. I think this would be the most satisfying solution.
Another problem, regarding Jupyter for example, is that if you do pip install jupyter, you also get command-line tools installed on your system. These can shadow the jupyter installation of an active Python environment.
Let’s say you installed Jupyter in your system Python (this is the root of many Jupyter problems) and then activate a virtualenv without Jupyter installed in it. A simple check like which jupyter may still yield a valid path (from the system installation), and if that instance is spawned, Jupyter will be launched with its own (partly statically linked) libs. Suddenly you end up with a Python session where imports get mixed up. I’ve had this happen several times in the past, and it is also true for the commonly used IPython REPL. Hardcoded paths and static links are the problem there.
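A quick way to spot that kind of shadowing, assuming a bash-like shell (shown here with python3 since it is always around; substitute jupyter):

```shell
# List every matching executable on PATH in lookup order; the first entry wins.
# If the first hit lives outside your activated environment, it shadows it.
type -a python3
```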
Therefore my advice: never install anything in your system Python.
By “ad-hoc basis”, do you mean sudo pip install ...? I’d never use that unless it’s inside some kind of container. If you mean using pip with --user or in virtual environments (even better), I’d say it’s pretty “safe” in the sense that you can simply wipe an environment that becomes unusable.
But I understand why you describe it as “explodes when you upgrade your system”: virtual environments become unusable once you remove or upgrade the Python they were created from. This is why pyenv is a safer option if you can’t control when the system Python is updated.
Yeah, I use pyenv for remote machines, too. Sometimes that’s the only choice.
Note that ENV["PYTHON"] is a build-time option. The run-time value of ENV["PATH"] is completely ignored (that’s a design decision of PyCall.jl, required to improve startup time). This is why you can’t use a different Python executable just by re-launching the Julia REPL with a different ENV["PATH"]; you have to re-build PyCall.jl.
Python virtual environments created by venv and virtualenv can be used from PyCall, provided that the Python executable used in the virtual environment is linked against the same libpython used by PyCall. Note that virtual environments created by conda are not supported.
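One way to check that constraint by hand (INSTSONAME is standard CPython build metadata; compare the result with what PyCall.libpython reports on the Julia side):

```shell
# Ask an interpreter which libpython it was built against; for a venv to work
# with PyCall, this must match the libpython PyCall itself links to.
python3 -c 'import sysconfig; print(sysconfig.get_config_var("INSTSONAME"))'
```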
Yes, I agree that this would be pretty handy. Unfortunately, this is very hard to implement and I don’t think it is a viable option. The closest solution would be https://github.com/JuliaLang/Juleps/issues/38 which requires some improvements in Pkg.jl.