I used to install python dependencies with run(`$(PyCall.python) $(proxy_arg) -m pip install --user $(PACKAGES)`) in a build script. Does it make sense to install them with artifacts instead? If yes, does anyone have an example?
Hey! I actually did some experiments on this a little while back. My repository PyEnv.jl attempts to bring the advantages of Julia Manifest files and Artifacts to python packages. Quick TL;DR for those who don't read the whole README: this package IS NOT good for production use yet; it's an experiment, and it showcases weaknesses in its own implementation as well as the difficulties of working within the Python ecosystem.
The package essentially uses a Python_jll artifact to get a python interpreter, uses pip to download/install python packages into a local environment, then lets you "artifactify" that local environment, which freezes the python source tree into artifacts that get baked into an Artifacts.toml file. Those artifacts could then be uploaded somewhere to provide a nice, portable and completely reproducible python environment. The with_artifacts_pyenv() wrapper function will set up all the environment variables and whatnot needed for python to be able to find its packages, and everything should be happy. In theory.
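To make the "artifactify" step concrete, here is a minimal sketch of the general idea using the stock Pkg.Artifacts API rather than PyEnv.jl's actual code; the `pyenv` directory, the artifact name, and the paths are assumptions for illustration only:

```julia
using Pkg.Artifacts
using Base.BinaryPlatforms: HostPlatform   # Julia 1.6+

# A local python environment that has already been populated, e.g. with
# `pip install --target=pyenv <packages>` (hypothetical path).
pyenv_dir = joinpath(@__DIR__, "pyenv")

# Copy the tree into Julia's content-addressed artifact store and get its hash.
tree_hash = create_artifact() do artifact_dir
    cp(pyenv_dir, artifact_dir; force = true)
end

# Record it in Artifacts.toml, keyed to the platform it was built on, so it can
# later be looked up with artifact_hash() / artifact_path().
artifacts_toml = joinpath(@__DIR__, "Artifacts.toml")
bind_artifact!(artifacts_toml, "pyenv", tree_hash;
               platform = HostPlatform(), force = true)

@info "frozen python environment" artifact_path(tree_hash)
```

From there the frozen tree can be tarred up with archive_artifact, uploaded somewhere, and a download entry added to the same Artifacts.toml so other machines can fetch it.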
In practice, there are still significant barriers to this. For instance, python packages in general may need to run arbitrary python code at install time, up to and including resolving what other packages to install. This means that, without being able to run that python code during install, we may not know exactly what needs to be installed. I spent a good chunk of time trying to install Python packages straight from the metadata in PyPI (and not using pip at all; see the direct.jl file in the repo), but in the end I decided it was too much work, and there were not enough python packages shipping solely .whl files (which give a much nicer interface for this than the competing formats).
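For a flavour of what "installing straight from the metadata in PyPI" means, here is a rough sketch (not the actual direct.jl code) that queries PyPI's JSON API and lists the wheel files published for a release, which is the kind of information a declarative resolver would need. It assumes the third-party JSON3.jl package for parsing, and the package/version are just examples:

```julia
using Downloads            # stdlib
using JSON3                # third-party; any JSON parser would do

# List the wheel archives PyPI publishes for a given release, via the JSON API.
function wheel_filenames(pkg::AbstractString, version::AbstractString)
    buf = IOBuffer()
    Downloads.download("https://pypi.org/pypi/$(pkg)/$(version)/json", buf)
    meta = JSON3.read(String(take!(buf)))
    # Keep only wheels; sdists would require running setup.py to learn anything.
    [String(f.filename) for f in meta.urls if f.packagetype == "bdist_wheel"]
end

# Example: the filenames encode the platform tags each wheel supports.
foreach(println, wheel_filenames("numpy", "1.24.4"))
```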
A consequence of this is that the python environments end up completely platform-specific, and it's difficult to generate python environments for foreign architectures, because we're not running on those architectures. This is solvable by generating a separate pyenv on each platform you want, but that's a substandard experience for us Julia users, who are used to creating a manifest on one machine, then instantiating it on another, and as long as that other machine is running a supported platform, everything should "just work".
My opinion is that there is promise here, but we need to wait for the Python packaging ecosystem to settle more fully on wheels and a fully declarative interface for installing packages. If the entire subtree of dependencies you require has wheels for all major platforms, the strategy in the direct.jl code will let us generate cross-platform lists of archives to download and unpack, without needing to run python code at install time (which would lock us into installing only for the current platform). Once we can do that, we will be able to provide a Julia-like experience for downloading and unpacking Python packages, and, more than that, record them in an Artifacts.toml (such as this one, which gets generated only for the current platform; note that there are platform-specific entries only for the single platform it was built for) so that things are nicely reproducible.
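For reference, a single-platform entry in such an Artifacts.toml has roughly this shape; the hashes and URL below are placeholders, not the actual contents of the linked file:

```toml
[[pyenv]]
arch = "x86_64"
os = "linux"
libc = "glibc"
git-tree-sha1 = "<tree hash of the unpacked environment>"

    [[pyenv.download]]
    sha256 = "<sha256 of the uploaded tarball>"
    url = "https://wherever-you-host-it.example.com/pyenv.x86_64-linux-gnu.tar.gz"
```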
Thanks a lot for the detailed answer!