I understand that this doesn’t work the way you’re accustomed to, but please do be open to how it works in Julia because it works very well once you get used to it. First, a few observations about Python package installation:
- Python package installation is highly stateful: installation requires running arbitrary Python code, which can do anything it wants and isn’t necessarily idempotent or repeatable unless the package authors are quite careful to ensure that it is. Installing the same package version at different times or in different environments can produce very different results.
- In Python even what a specific version of a package depends on can change based on the results of running code. That’s one of the reasons that it’s hard to resolve Python dependencies: you don’t know what a package version even depends on until you’ve tried installing it.
This means that if you want to keep a project working, it really needs its own local, unshared set of dependencies that are siloed from all other projects. By comparison, here are some features of how Julia package management works:
- Julia package installation is stateless and reproducible. Installing a package doesn’t involve running any package code: a tarball of the source tree of a package version is unpacked in the right path in $depot/packages and that’s it. The file tree is installed read-only and never modified by the package manager, the package or anything else.
- Moreover, Julia package versions are identified by their git SHA1 tree hash, so they’re inherently verifiable: if you compute the tree hash and it doesn’t match, you have the wrong package source.
- (There is a legacy exception to the rule that no package code is run during installation: if a package contains a file called deps/build.jl it will be run after installation to allow the package to build things, for example, downloading some data or compiling some source code. This is, however, only allowed to modify the contents of the deps directory. Also, using deps/build.jl is no longer recommended: there are better (stateless and reproducible) mechanisms for doing this kind of thing these days, as described in the next three points.)
- If a package needs to download and use immutable data, it should use artifacts, which are automatically downloaded and installed when a package declares that they are needed by including an Artifacts.toml file. Like packages, they are immutable and content addressed, identified and loaded by git tree hash, making them inherently verifiable and perfectly cacheable and reproducible. Since artifacts are immutable, it is safe to share them between multiple package versions and even between different packages. This can save considerable space since artifacts can be large. Artifacts are automatically cleaned up by pkg> gc when there are no more package versions that depend on them.
- The artifact mechanism is also used to install binary dependencies such as pre-built C and Fortran libraries, a large body of which can be found in Yggdrasil. This means that not only is Julia package installation stateless and reproducible, but binary dependency installation is as well. This makes it very reliable to install and set up a Julia project, from the source code to the binary dependencies that it uses.
- If a package needs to work with mutable data, rather than using the package directory for this, it should use scratch spaces, which let packages set up transient scratch directories in which they can download/generate whatever they want and have it persist across package usages. Moreover, a scratch space can be shared by different versions of the same package, instead of each version generating its own copy, and scratch spaces are automatically cleaned up by pkg> gc when there are no more versions of the package that need them.
- Julia packages and artifacts are served to Julia clients by a global network of package servers which can be reached at https://pkg.julialang.org. They are served by URLs that look like
https://pkg.julialang.org/package/$uuid/$hash
https://pkg.julialang.org/artifact/$hash
When you download one of these you get a gzip-compressed tarball of the package version or artifact tree. For example, you can download and list the contents of version 1.2.0 of the BSDiff package like this:
julia> using Tar
julia> Tar.list(pipeline(`curl -fLsS https://pkg.julialang.org/package/7b188ff4-8bb6-4dee-bbe1-9b6fdde2c7c5/70d0d8a17dcd4dbf44c29e849550bc6bf53c6ec8`, `gzcat`))
11-element Vector{Tar.Header}:
Tar.Header(".github/workflows/TagBot.yml", :file, 0o644, 204, "")
Tar.Header(".gitignore", :file, 0o644, 31, "")
Tar.Header(".travis.yml", :file, 0o644, 121, "")
Tar.Header("Artifacts.toml", :file, 0o644, 281, "")
Tar.Header("LICENSE", :file, 0o644, 2531, "")
Tar.Header("Project.toml", :file, 0o644, 909, "")
Tar.Header("README.md", :file, 0o644, 5577, "")
Tar.Header("src/BSDiff.jl", :file, 0o644, 12396, "")
Tar.Header("src/classic.jl", :file, 0o644, 3394, "")
Tar.Header("src/endsley.jl", :file, 0o644, 2405, "")
Tar.Header("test/runtests.jl", :file, 0o644, 7930, "")
Here 7b188ff4-8bb6-4dee-bbe1-9b6fdde2c7c5 is the UUID of the package and 70d0d8a17dcd4dbf44c29e849550bc6bf53c6ec8 is the tree hash of the 1.2.0 version recorded in the General registry. Similarly, you can download the test_data artifact that BSDiff uses for testing like this:
julia> Tar.list(pipeline(`curl -fLsS https://pkg.julialang.org/artifact/d2ca0cfa36769774a442b467b353dcf908186384`, `gzcat`))
6-element Vector{Tar.Header}:
Tar.Header(".gitignore", :file, 0o644, 39, "")
Tar.Header("LICENSE", :file, 0o644, 1100, "")
Tar.Header("registry/after.tar", :file, 0o644, 13685760, "")
Tar.Header("registry/before.tar", :file, 0o644, 13349376, "")
Tar.Header("registry/classic.diff", :file, 0o644, 13792104, "")
Tar.Header("registry/reference.diff", :file, 0o644, 13792152, "")
- Because of immutability and content addressing, these tarball URLs are perfectly cacheable: cache invalidation is never required since the content of a package version or artifact cannot ever change. Moreover, the package server system stores them permanently in multiple storage locations, including S3 buckets belonging to the JuliaLang AWS account, which means that if you’ve installed something via package servers in the past, you’ll be able to do so in the future as well (forever).
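To make the content-addressing point concrete, here is a minimal offline sketch using the Tar standard library (the same one used in the listings above). It builds a tarball from a throwaway directory and computes its git tree hash with Tar.tree_hash. The helper name tree_hash_of and the file hello.txt are made up for illustration; this is not how Pkg itself is implemented, just a demonstration that the hash is a function of the content alone.

```julia
using Tar

# Hypothetical helper: pack a directory into a temporary tarball and
# return the git SHA1 tree hash of its contents as a hex string.
function tree_hash_of(dir::AbstractString)
    tarball = Tar.create(dir)          # no path given: writes to a temp file
    try
        return Tar.tree_hash(tarball)  # default algorithm is "git-sha1"
    finally
        rm(tarball; force = true)
    end
end

dir = mktempdir()
write(joinpath(dir, "hello.txt"), "hello\n")

h1 = tree_hash_of(dir)   # hash of the original content
h2 = tree_hash_of(dir)   # same content again, so the same hash

write(joinpath(dir, "hello.txt"), "hello, world\n")
h3 = tree_hash_of(dir)   # content changed, so the hash changes

println(h1 == h2)  # identical content gives an identical hash
println(h1 == h3)  # modified content gives a different hash
```

Since Tar.tree_hash accepts the same kinds of input as Tar.list, the same call should also work on the curl pipelines shown above, in which case the result can be compared against the hash in the URL.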
All of this means that Julia package management is very different from Python package management. In Python, if you want any hope of reproducibility or even keeping a project working in the future, you must have a local set of packages for it that are siloed from the dependencies of any other projects. Otherwise there’s a very real risk that a package operation in one project will trash an unrelated project. If you delete or modify the packages that a project depends on, you may not be able to get them back into the same (working) state they were in when it was working.
In Julia, on the other hand, your installed packages and artifacts are essentially just a cache. By design, all the information you need to reconstitute everything a project depends on is recorded in its Project.toml and Manifest.toml files. You can delete all of the packages and artifacts that a project depends on and just do pkg> instantiate and it will download and install everything that’s needed, and since the package servers remember anything you’ve installed forever, you can do this at any point. This works so reliably that people regularly just delete their ~/.julia depots and reconstitute them from scratch.
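As a rough illustration of why the manifest is enough, the sketch below parses a manifest with the TOML standard library and rebuilds the package server URL for each entry from its UUID and tree hash. The manifest text is a hand-written, trimmed stand-in (assuming the 2.0 manifest format), and package_urls and PKG_SERVER are hypothetical names; Pkg's real logic handles more cases (packages tracked by path or repository URL, registries, artifacts).

```julia
using TOML

# Hand-written, trimmed stand-in for a Manifest.toml (assumed format 2.0).
# The BSDiff UUID and tree hash are the ones quoted earlier in this post.
manifest_text = """
manifest_format = "2.0"

[[deps.BSDiff]]
git-tree-sha1 = "70d0d8a17dcd4dbf44c29e849550bc6bf53c6ec8"
uuid = "7b188ff4-8bb6-4dee-bbe1-9b6fdde2c7c5"
version = "1.2.0"

[[deps.Dates]]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"
"""

const PKG_SERVER = "https://pkg.julialang.org"

# Hypothetical helper: map package name => package server URL for every
# manifest entry that records a tree hash. Standard libraries such as Dates
# ship with Julia and have no tree hash, so they are skipped.
function package_urls(manifest::AbstractDict; server = PKG_SERVER)
    urls = Dict{String,String}()
    for (name, entries) in manifest["deps"], entry in entries
        tree = get(entry, "git-tree-sha1", nothing)
        tree === nothing && continue
        urls[name] = "$server/package/$(entry["uuid"])/$tree"
    end
    return urls
end

urls = package_urls(TOML.parse(manifest_text))
println(urls["BSDiff"])
```

The URL this prints for BSDiff is the same one passed to curl in the earlier listing, which is the sense in which installed packages are just a cache of what the manifest already pins down.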
So having local, siloed sets of packages and artifacts for each project in Julia just isn’t necessary or useful the way it is in Python. You can, nevertheless, have a local depot path if you want to by setting the JULIA_DEPOT_PATH variable to the project directory like this:
$ mkdir MyProject
$ cd MyProject
$ export JULIA_DEPOT_PATH=$(pwd)
$ julia -q --project=.
(MyProject) pkg> add BSDiff
Installing known registries into `~/tmp/MyProject`
Updating registry at `~/tmp/MyProject/registries/General.toml`
Resolving package versions...
Installed Preferences ──────── v1.2.3
Installed CodecBzip2 ───────── v0.7.2
Installed Bzip2_jll ────────── v1.0.8+0
Installed SuffixArrays ─────── v0.3.0
Installed BSDiff ───────────── v1.2.0
Installed BufferedStreams ──── v1.0.0
Installed Compat ───────────── v3.41.0
Installed TranscodingStreams ─ v0.9.6
Installed JLLWrappers ──────── v1.3.0
Downloaded artifact: Bzip2
Updating `~/tmp/MyProject/Project.toml`
[7b188ff4] + BSDiff v1.2.0
Updating `~/tmp/MyProject/Manifest.toml`
[7b188ff4] + BSDiff v1.2.0
[e1450e63] + BufferedStreams v1.0.0
[523fee87] + CodecBzip2 v0.7.2
[34da2185] + Compat v3.41.0
[692b3bcd] + JLLWrappers v1.3.0
[21216c6a] + Preferences v1.2.3
[24f65c1e] + SuffixArrays v0.3.0
[3bb67fe8] + TranscodingStreams v0.9.6
[6e34b625] + Bzip2_jll v1.0.8+0
[0dad84c5] + ArgTools v1.1.1
[56f22d72] + Artifacts
[2a0f44e3] + Base64
[ade2ca70] + Dates
[8bb1440f] + DelimitedFiles
[8ba89e20] + Distributed
[f43a241f] + Downloads v1.5.1
[7b1f6079] + FileWatching
[b77e0a4c] + InteractiveUtils
[b27032c2] + LibCURL v0.6.3
[76f85450] + LibGit2
[8f399da3] + Libdl
[37e2e46d] + LinearAlgebra
[56ddb016] + Logging
[d6f4376e] + Markdown
[a63ad114] + Mmap
[ca575930] + NetworkOptions v1.2.0
[44cfe95a] + Pkg v1.8.0
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA v0.7.0
[9e88b42a] + Serialization
[1a1011a3] + SharedArrays
[6462fe0b] + Sockets
[2f01184e] + SparseArrays
[10745b16] + Statistics
[fa267f1f] + TOML v1.0.0
[a4e569a6] + Tar v1.10.0
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
[e66e0078] + CompilerSupportLibraries_jll v0.5.0+0
[deac9b47] + LibCURL_jll v7.73.0+4
[29816b5a] + LibSSH2_jll v1.9.1+2
[c8ffd9c3] + MbedTLS_jll v2.24.0+2
[14a3606d] + MozillaCACerts_jll v2020.7.22
[4536629a] + OpenBLAS_jll v0.3.17+2
[83775a58] + Zlib_jll v1.2.12+1
[8e850b90] + libblastrampoline_jll v3.1.0+0
[8e850ede] + nghttp2_jll v1.41.0+1
[3f19e933] + p7zip_jll v16.2.1+1
Precompiling project...
15 dependencies successfully precompiled in 3 seconds
$ ls -l
total 12
-rw-r--r-- 1 stefan staff 6458 Jan 1 16:15 Manifest.toml
-rw-r--r-- 1 stefan staff 55 Jan 1 16:15 Project.toml
drwxr-xr-x 3 stefan staff 96 Jan 1 16:15 artifacts
drwxr-xr-x 3 stefan staff 96 Jan 1 16:15 compiled
drwxr-xr-x 5 stefan staff 160 Jan 1 16:15 logs
drwxr-xr-x 11 stefan staff 352 Jan 1 16:15 packages
drwxr-xr-x 4 stefan staff 128 Jan 1 16:14 registries
drwxr-xr-x 3 stefan staff 96 Jan 1 16:15 scratchspaces
As you can see, this installs project-local copies of packages, artifacts, etc. You might want to symlink the registries directory to ~/.julia/registries so that it’s shared. Same with logs. But then again, why not just share all of the directories since there’s no danger in sharing packages or artifacts since they’re immutable?
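If you do want the symlinked setup, a minimal sketch might look like the following. The function share_depot_dirs is a hypothetical helper, not part of Pkg, and the demo uses throwaway directories standing in for ~/.julia and the project. It assumes it runs before the first package operation in the local depot, since it deliberately skips anything that already exists there; on Windows, creating symlinks may also require extra privileges.

```julia
# Hypothetical helper: link selected subdirectories of a shared depot
# into a project-local depot. Returns the names that were linked.
function share_depot_dirs(local_depot::AbstractString, shared_depot::AbstractString;
                          names = ("registries", "logs"))
    linked = String[]
    for name in names
        target = joinpath(shared_depot, name)
        link = joinpath(local_depot, name)
        isdir(target) || continue                    # nothing to share yet
        (ispath(link) || islink(link)) && continue   # never clobber existing content
        symlink(target, link)
        push!(linked, name)
    end
    return linked
end

# Demo with temporary directories in place of ~/.julia and MyProject.
shared = mktempdir()
project = mktempdir()
mkpath(joinpath(shared, "registries"))
mkpath(joinpath(shared, "logs"))

linked = share_depot_dirs(project, shared)
println(linked)
println(islink(joinpath(project, "registries")))
```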
Regarding the concern that keeping packages and artifacts in a shared location will cause bloat, I’ve already explained how pkg> gc works: it prevents exactly the bloat you’re worried about. In fact, having a single place where packages and artifacts live reduces bloat since a single version can be shared by as many projects as need them. To clean up unused packages or artifacts, just do pkg> gc --all and they’ll be deleted. Or do nothing and let Julia do it automatically when you do package operations. As @jzr has mentioned, it would be good to have a Pkg command to forget about a manifest so that you don’t have to delete it before doing gc in order to clean up its dependencies, but you can actually already do this manually quite easily: manifests that you’ve used are recorded in ~/.julia/logs/manifest_usage.toml which looks like this:
[["/Users/stefan/.julia/environments/v1.6/Manifest.toml"]]
time = 2021-10-11T17:31:25.178Z
[["/Users/stefan/.julia/environments/v1.7/Manifest.toml"]]
time = 2021-09-09T16:45:36.021Z
[["/Users/stefan/.julia/environments/v1.8/Manifest.toml"]]
time = 2021-12-13T11:45:42.971Z
[["/Users/stefan/dev/ArgTools/Manifest.toml"]]
time = 2021-09-09T16:45:36.122Z
[["/Users/stefan/dev/BSDiff/Manifest.toml"]]
time = 2021-09-09T16:45:36.042Z
[["/Users/stefan/dev/Downloads/Manifest.toml"]]
time = 2022-01-01T16:42:18.669Z
[["/Users/stefan/dev/NetworkOptions/Manifest.toml"]]
time = 2021-09-09T16:45:36.049Z
[["/Users/stefan/dev/Pkg/Manifest.toml"]]
time = 2021-10-12T20:48:13.562Z
[["/Users/stefan/dev/Tar/Manifest.toml"]]
time = 2021-11-02T17:21:43.901Z
[["/Users/stefan/dev/julialang.org/Manifest.toml"]]
time = 2021-11-29T11:03:09.906Z
You can open this in an editor, delete any entries you don’t care about keeping the dependencies installed for anymore, and then do gc again to clean up. It might be good to have a command to remove a manifest from the usage log, but editing the text file is already pretty easy. I do it periodically to purge dependencies of projects I don’t need to keep around anymore.
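For anyone who would rather script that edit, here is a sketch of the same idea with the TOML standard library. It works on an in-memory sample in the format shown above rather than on the real file; forget_manifests is a hypothetical helper, and the usage log is an internal Pkg file whose format could change, so back it up before writing anything to it.

```julia
using TOML

# Hypothetical helper: drop every usage-log entry whose manifest path
# satisfies `should_forget`, returning a new dictionary.
forget_manifests(usage::AbstractDict, should_forget) =
    filter(entry -> !should_forget(first(entry)), usage)

# Sample in the same format as manifest_usage.toml (two entries from above).
sample = """
[["/Users/stefan/dev/BSDiff/Manifest.toml"]]
time = 2021-09-09T16:45:36.042Z

[["/Users/stefan/dev/Tar/Manifest.toml"]]
time = 2021-11-02T17:21:43.901Z
"""

usage = TOML.parse(sample)

# Forget everything under the Tar project directory.
kept = forget_manifests(usage, path -> startswith(path, "/Users/stefan/dev/Tar/"))

# Print the trimmed log; writing it back to the real file is left to the reader.
TOML.print(stdout, kept; sorted = true)
```

After the real file has been trimmed this way (or by hand), running gc again cleans up whatever is no longer referenced.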
It would definitely be a nice feature to be able to install all of a project’s dependencies locally to the project, not for the sake of reproducibility or disk space (since it doesn’t help either of those in Julia), but for the sake of being able to ship someone a self-contained application bundle that can be used without needing to download anything else. However, if that’s what you want, PackageCompiler does this for you and also compiles the application into a custom binary, which is why this functionality hasn’t been pressing to develop in Julia itself.