Should we distribute multiple official system images for Julia 1.11?

The added complexity from different downloadables seems much worse to me than a 400 ms slowdown.

Sharing a small comment here about the interaction between Pkg and Distributed.

I am used to distributing parallel jobs with a “preamble” that uses Pkg to activate the script's environment on all worker nodes. The preamble assumes that both Pkg and Distributed are readily available in any Julia installation on a cluster of computers.
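
A typical preamble of this kind might look like the following minimal sketch. It is an assumption about the general pattern, not the original code: two local workers stand in for cluster nodes, and a shared filesystem is assumed so that every process sees the same project directory.

```julia
using Distributed

# Hypothetical worker count; on a real cluster the workers would usually come
# from a cluster manager rather than from local processes.
addprocs(2)

# Directory of the script (or the current directory when run in the REPL),
# evaluated once on the master process.
const PROJECT_DIR = @__DIR__

# Load Pkg on every process, including the master. If Pkg is no longer in the
# system image, this is where each worker pays the extra load time.
@everywhere using Pkg

# Activate the script's project everywhere; `$PROJECT_DIR` interpolates the
# master's value so all workers receive the same path.
@everywhere Pkg.activate($PROJECT_DIR)
```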

It would be nice to keep this use case in mind moving forward. Distributed computing in Julia is already a bit difficult, and I can imagine that moving Pkg out of the system image could affect this usage somehow. Am I wrong?

This is slightly off-topic, but it would be nice to know how these changes are aligned with the future of Distributed, Dagger, etc.

The only change compared to now would be a slight increase in latency.

Please don’t. Do not forget the large majority of Julia users who want to use Julia for their problems, not to develop Julia itself (and are only rarely active here).
For users, a small latency when adding packages is certainly not a problem, whereas added complexity in setting up Julia would be.

I don’t see why there would be multiple downloads or even any need for a user to explicitly choose or be aware of different sysimgs. The julia binary already has the ability to load a user-specified sysimg and simply defaults to the standard one. We could ship two different sysimgs—a non-interactive one and an interactive one—and load the former when running a script and the latter when starting an interactive julia session. That said, it would, as Valentin pointed out, prevent upgrading Pkg and REPL without rebuilding the sysimgs.
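As a small illustration of the existing mechanism, the sketch below reads the image the current session was started with and launches a child process that loads an explicitly chosen image via `-J`. Here it simply passes the current image back in; in the proposal, a separate interactive image would be chosen instead.

```julia
# Path of the system image this session was started with.
img = unsafe_string(Base.JLOptions().image_file)

# Path of the julia executable for this session.
exe = joinpath(Sys.BINDIR, Base.julia_exename())

# Start a child with an explicitly chosen image via -J and have it report
# which image it actually loaded.
cmd = `$exe --startup-file=no -J $img -e 'print(unsafe_string(Base.JLOptions().image_file))'`
child_img = read(cmd, String)
println(child_img)
```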

Yes, I completely understand that, but this is also a source of my concern. Splitting Pkg and REPL out of the system image makes these packages easier to develop. Having worked on these packages, I find the workflow much easier than in the past. They are just normal packages now, after all. But does it make Julia easier to use for the large majority of users? My sense is that the primary beneficiaries are advanced users and developers of Julia rather than the majority of users.

The problem is that it depends on how you use Julia. If you primarily use Julia by starting the REPL, then the changes as committed right now add some latency to that startup every single time. For these users, the experience will be less smooth than it is now. We did come up with some mitigations above, such as asynchronous loading of Pkg.jl at Julia or REPL startup. This was tremendously useful to me, as it prompted me to refactor the new Pkg REPLExt to make it compatible with asynchronous loading.
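The idea behind asynchronous loading can be sketched at user level: start loading Pkg on a background task as soon as the session begins, and only block when Pkg is first needed. This is a minimal sketch of the general technique, not how Julia or the Pkg REPLExt actually implement it; `PKG_LOAD_TASK` and `ensure_pkg` are hypothetical names.

```julia
# Start loading Pkg on a background task right away, so the prompt does not
# have to wait for it. `Base.require` returns the loaded module but, unlike
# `using`, does not bind the name `Pkg` in `Main`.
const PKG_LOAD_TASK = Threads.@spawn Base.require(Main, :Pkg)

# Hypothetical hook called the first time Pkg is actually needed (e.g. when
# entering pkg mode). It blocks only if the background load is still running.
ensure_pkg() = fetch(PKG_LOAD_TASK)::Module

# ... the user could already be typing at the prompt here ...

pkg = ensure_pkg()
println(nameof(pkg))
```

Overlap with the user's typing is only real when the load does not compete with the REPL for the same thread, so the benefit depends on how many threads Julia was started with.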

For Julio, the packages will load as they did before, but each of these jobs will now take an additional 500 ms to load. I think Julio is a sophisticated user, so I could advise him to build a custom system image that includes Pkg.jl, but I'm not sure I will be able to do the same for all my colleagues. Across many distributed jobs and runs, this additional latency can add up.
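To make "this can add up" concrete, here is a back-of-the-envelope sketch. The 500 ms per process comes from the post; the worker and run counts are hypothetical and chosen only for illustration.

```julia
"""
    added_latency_seconds(workers, runs; per_process = 0.5)

Hypothetical estimate of the total extra time spent loading Pkg when every
worker process of every run pays a fixed overhead (in seconds).
"""
function added_latency_seconds(workers::Integer, runs::Integer;
                               per_process::Real = 0.5)  # ~500 ms per process
    return workers * runs * per_process
end

# Illustrative numbers only: 100 workers, 1_000 runs.
total = added_latency_seconds(100, 1_000)
println(total, " s ≈ ", round(total / 3600; digits = 1), " h")
```

With these numbers the overhead is 50,000 process-seconds, about 13.9 hours of aggregate compute spent only on loading.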

Also, if the distributed workflow were running on a serverless platform such as AWS Lambda, which bills by execution time, this latency could easily add up to a significant cost. At my institution, these kinds of delays forced a rewrite of a significant amount of Java code in a language with lower latency.

If your primary concern is the size of the system image, then the modular Pkg.jl and REPL.jl changes will be of great help to you. The system image will be smaller to begin with. It will also be more modular than it is now. Pkg.jl will no longer have the REPL as a strong dependency.

My sense is that many new users of Julia will start with the REPL. I did. If we proceed as we are now, their Julia 1.11 experience will be less smooth than the Julia 1.10 experience. What I would like is the best of both worlds: 1) a fast, responsive REPL and package manager, and 2) a small deployment profile. This is possible, but perhaps not with a single system image. A monolithic system image with everything I need will be faster than a modular one. We can hopefully narrow the difference, but there will be a difference. At the end of the day, a modular Julia gives us options, but we need to exercise them.

Recall from the Julia survey that the second most common complaint is time to first X (TTFX) [1]. Half a second of additional latency is very noticeable in interactive usage. While not terrible, I do not think we should brush off this regression so easily.

As Stephan outlined, having multiple system images is possible and can be completely transparent to the end user. The changed core Julia workflow is as follows.

  1. Are we running interactively? (This step already exists)
  2. If yes, check whether “.julia/sysimages/isys.so” exists at the predefined location and, if so, load it by default. (This step partially exists due to the -J option.)
  3. If we are not running interactively, or if “isys.so” does not exist, load the normal “sys.so”.
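
The three steps can be expressed as a small pure function. This is a hypothetical sketch for clarity only: in the proposal the decision happens in C before any system image is loaded, while here the interactive flag (step 1) is simply passed in and the depot defaults to `first(DEPOT_PATH)`, which is normally `~/.julia`.

```julia
"""
    select_sysimage(interactive; depot, default) -> String

Hypothetical helper applying the proposed rule: interactive sessions prefer
`<depot>/sysimages/isys.so` if it exists; everything else uses `default`.
"""
function select_sysimage(interactive::Bool;
                         depot::AbstractString = first(DEPOT_PATH),
                         default::AbstractString = unsafe_string(Base.JLOptions().image_file))
    # Step 2: interactive sessions look for the interactive image first.
    candidate = joinpath(depot, "sysimages", "isys.so")
    if interactive && isfile(candidate)
        return candidate
    end
    # Step 3: non-interactive runs, or no isys.so, fall back to sys.so.
    return default
end
```

The returned path is what a launcher would pass to `julia -J`.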

Outside of core Julia, we then need a package or two to manage downloading registered system images as artifacts and either symlinking them into the designated location or copying them there. Maybe it would be better to have a config file, but that is an implementation detail. When someone posts, “Why is Julia so laggy?”, we can respond “] add REPLSystemImage” instead of “Step 1: Build a list of cryptic precompile statements or assemble a comprehensive workflow script…”. Maybe later there will be a SystemImageManager or we can update JuliaManager.jl.
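The symlink-or-copy step such a package would perform can be sketched as follows. `install_sysimage` and its `mode` keyword are hypothetical names; in a real package, `src` would be the path of a downloaded artifact, and symlinks may require extra privileges on Windows.

```julia
"""
    install_sysimage(src; depot = first(DEPOT_PATH), mode = :symlink) -> String

Hypothetical helper that places `src` at `<depot>/sysimages/isys.so`, either
as a symlink (cheap, follows artifact updates) or as a copy (self-contained).
"""
function install_sysimage(src::AbstractString;
                          depot::AbstractString = first(DEPOT_PATH),
                          mode::Symbol = :symlink)
    destdir = joinpath(depot, "sysimages")
    mkpath(destdir)
    dest = joinpath(destdir, "isys.so")
    if mode === :symlink
        rm(dest; force = true)          # replace any previous file or link
        symlink(abspath(src), dest)
    elseif mode === :copy
        cp(src, dest; force = true)     # overwrite any previous image
    else
        throw(ArgumentError("mode must be :symlink or :copy, got $mode"))
    end
    return dest
end
```

Symlinking keeps the installed image in sync with whatever version of the artifact the package currently points to; copying survives the artifact being garbage-collected.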

Anyway, I thought it would be good to start thinking about this early in the 1.11 release cycle rather than later. I'm willing to work on it.

[1] Julia User & Developer Survey 2023

Is there a binary-thinking trap here?

Is it possible for REPL and Pkg to be separate packages, that just happen to be bundled into the sysimages and yet can still be upgraded independently? Or was the goal of extracting them really to produce a smaller sysimage?

No. In this case, to upgrade the packages, one would have to compile a new system image. Although, if the system image were provided by a package, then perhaps the system image could be upgraded by upgrading the package.

Yes, in part, among the other benefits.

I wrote a prototype for system image switching. In the prototype, we attempt to load isys.so rather than sys.so if we are running interactively. If isys.so cannot be loaded, we fall back to sys.so.

Here’s a demonstration of how this currently works.

Initially, isys.so does not exist, so Julia will load sys.so.

~/src/julia$ ./julia --banner=no
julia> sysimg = Base.JLOptions().image_file |> unsafe_string
"/home/mkitti/src/julia/usr/lib/julia/sys.so"

julia> cp(sysimg, replace(sysimg, "sys.so" => "isys.so"))
"/home/mkitti/src/julia/usr/lib/julia/isys.so"

julia> exit()

We copied sys.so to isys.so. Now isys.so exists, so we should load it in interactive mode.

~/src/julia$ ./julia --banner=no
julia> sysimg = Base.JLOptions().image_file |> unsafe_string
"/home/mkitti/src/julia/usr/lib/julia/isys.so"

If we are not running in interactive mode, then the regular sys.so is loaded despite isys.so existing.

~/src/julia$ ./julia -e "Base.JLOptions().image_file |> unsafe_string |> println"
/home/mkitti/src/julia/usr/lib/julia/sys.so

One complication is that Julia’s interactive status is partially determined by Julia code that runs after the system image is loaded.

If you run stock julia with no arguments, Base.JLOptions().isinteractive will be 0 (false) while isinteractive() will be true.

julia> Base.JLOptions().isinteractive
0

julia> isinteractive()
true

Julia is interactive if the following conditions are met.

  1. The -i or --interactive flag is provided at the command line, OR
  2. No -e or -E expressions are provided AND no non-switch arguments (such as a script path) are provided AND STDIN is a TTY.

To alleviate this, I wrote a C routine, jl_is_interactive.
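
For illustration, the two rules listed above can be mirrored in a short Julia function. This is a hypothetical sketch, not the C routine: it assumes that switches taking a value other than -e/-E are written in `--opt=value` form (for example `--sysimage=path` rather than `-J path`, `--threads=4` rather than `-t 4`), since otherwise the value would be mistaken for a script path.

```julia
"""
    would_be_interactive(args; stdin_is_tty = stdin isa Base.TTY) -> Bool

Hypothetical helper that applies the two interactivity rules to the
command-line arguments that follow the `julia` executable.
"""
function would_be_interactive(args::AbstractVector{<:AbstractString};
                              stdin_is_tty::Bool = stdin isa Base.TTY)
    forced   = false  # rule 1: saw -i / --interactive
    has_eval = false  # saw -e / -E (or --eval / --print)
    has_file = false  # saw a non-switch argument, i.e. a script path
    i = 1
    while i <= length(args)
        arg = args[i]
        if arg in ("-i", "--interactive")
            forced = true
        elseif arg in ("-e", "-E", "--eval", "--print")
            has_eval = true
            i += 1        # skip the expression that follows the switch
        elseif !startswith(arg, "-")
            has_file = true
            break         # the remaining arguments belong to the script
        end
        i += 1
    end
    # Rule 2: no expression, no script, and stdin is a terminal.
    return forced || (!has_eval && !has_file && stdin_is_tty)
end
```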

Would Pkg.jl and REPL.jl be pinned by default if users have the corresponding system image? If we end up with “two Julia flavors”, I suspect this will become a common source of errors…

I guess we all want a short time-to-first-status. However, it could be educational for a broad base of users if Pkg.jl and REPL.jl were not included by default and, instead, very good tools were provided to add them quickly. The tools for dealing with sysimages might then gain more widespread use over time, which could be great for the many people who are currently unaware of sysimages.

Thank you Mark for explaining the situation with Distributed jobs and the expected additional delay per process. That is something to consider carefully in the context of parallel cloud services.

If the binary is split into two, it would be great if the official installation instructions had a section dedicated to HPC clusters, written for system administrators who have never heard of Julia.

Back when I was working at IBM, I had to request the installation of the language on an HPC cluster, and the person responsible for the installation didn’t know where to start. I can imagine that this split will make sysadmins’ lives harder as well. We need official guidance on how to install Julia on HPC clusters, clouds, … beyond the single-user, single-process installation.

I also believe that this change could be bundled into Julia v2.0 to reduce the impact on the user base, which is hopefully increasing with the v1.x series of improvements.
