Would it be possible to build the logic for running PackageCompiler.jl into Juliaup?
Like when you first switch to a new version of Julia, you’re asked if you’d like to compile all default packages to make Julia snappier?
I agree with some sentiments here that REPL startup time isn’t the biggest deal breaker for me personally, but as more and more libraries are decoupled, the payoff for building a system to automate PackageCompiler would be larger.
I suggest one change at a time (REPL and Pkg leave system image) and
track what the feedback is after the release (like we always do).
Introducing extra complexity when there is no evidence how any emerging
problem would show, how users would respond, and how best to address
the problem seems to be asking for trouble. Besides, isn’t resolving issues that
come up exactly what the minor releases are for?
Personally, I would keep everything the same and then see if I (as a representative
user) even notice.
Just compared startup times between 1.10rc2 and 1.11.0-DEV.1065. I can’t tell the difference.
M2 MacBook Air. Latest OS.
Why and when? I support the smaller sysimage (why I was working on it), and quicker startup in non-interactive mode.
Those modes were quick since they were built into the sysimage, but if those packages are precompiled, then they should be as quick when loaded additionally at startup (or later).
I’m not sure what the issue is. Is it that they are not precompiled, and so slower on first use (could they be distributed precompiled?)? I would be OK with them not even being precompiled if that’s handled on first use. The latency of loading Pkg should seemingly be paid when starting the REPL, not later when invoking the Pkg prompt; that might be annoying.
You could bundle an alternative sysimage, bloating the download, or since this is for Pkg that needs internet access anyway, letting it download it. Could it switch to it, maybe only for interactive?
Could that sysimage even be part of VS Code Julia extension?
Thanks for working on this.
I see this, but not the old trick of lowering optimization. Would that help (maybe even all the way down to compile=min, if using the interpreter is OK?), or is it outdated and redundant with precompilation:
That’s a misunderstanding: you can still use Pkg and REPL with the smaller sysimage, just with a bit higher latency. The alternative sysimage would be more like the current one.
One way is to use the sysimage from the older 1.10. I’m not even sure that’s possible (I think not; could it be supported?), but if not, then by using julia +1.10 for development. I don’t see it as a huge deal for interactive use. Getting 1.11 faster for REPL or Pkg does not seem a huge priority to me. Getting packages to load faster in general is, i.e. why is it slower when they are not part of the sysimage? Is it only about invalidations? I would want to fix that problem once and for all and avoid/eliminate recompilation, even with less optimized code. I think it’s all about inlining, i.e. it’s too aggressive. Maybe some @noinline in Pkg would help?
How much would it bloat the download if both were shipped by default?
If you ship both, then can you just add a special value for the --sysimage command line flag, perhaps something like --sysimage=slim, which allows usage of the trimmed down version?
Not only; some of those scripts are about winning (more) benchmarks. It’s impossible for Julia to win some benchmarks with the current startup delay. We are already winning some benchmarks, just not showing at the top because of the extra overhead.
I like two system images, but only if the choice is automatic, i.e. that should be the default (for non-interactive use). Julia knows early on, before the sysimage is loaded (or potentially could know), whether it is starting for interactive use, so it can choose dynamically.
If people had to choose manually, they would not; they would not know it is needed. I would, however, like a non-default --sysimage=Julia2… There seems to be movement on that front: the 2.0 milestone, adding potential as a prefix of the name, and the proposed juliax command has some changes, so far minimal, and I suggested 2.0 for it.
Have you considered loading Pkg automatically in interactive sessions? That way, the latency is paid upfront, not when hitting ].
Maybe it could even be loaded in the background after showing the prompt? That would make the loading time unnoticeable for all practical purposes.
That is a good idea. Show the prompt and start loading Pkg.jl. It will be ready to go by the time the user actually submits a command.
One trick is that you can type commands while Julia is loading, which I’ve found makes latency much less painful. This works in general, but also in this specific case.
I’m not totally sure how to describe this other than: after pressing ] you can immediately type your command (+ enter) without waiting for the prompt to change.
Here is a recording showing the latency and typing status after waiting for the prompt:
And here is a version where I start typing status immediately after pressing ] without waiting for the prompt to change:
I do this all the time, e.g. when waiting for a package to load after using, I just start typing whatever commands I’d want to run once it’s loaded, and the REPL will catch up and run them. However when screen-sharing etc I’ve found that folks often are surprised that this works, so I thought I’d mention it here.
I realize there is a misconception here. pkgimages always have to be loaded and verified. Thus it will never be quite the same as if they were in the system image. Working on precompilation will help.
I just remembered how to do this.
# Start loading Pkg on a background task once the REPL has been initialized.
launch_pkg(_) = Threads.@spawn @eval using Pkg
Base.atreplinit(launch_pkg)
I’m not sure if Threads.@spawn is better than @async or not for this purpose.  You could add this to your .julia/config/startup.jl now actually with Julia 1.9.
Any thoughts?
Would the same idea be good for other packages in startup.jl?
Revise? PyPlot? IJulia? …
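On the question of other packages: the same hook can in principle take a list of names. What follows is only a sketch, not something tested against Revise, PyPlot, or IJulia. `BACKGROUND_PACKAGES` and `preload_in_background` are hypothetical names, and the two standard libraries in the list are placeholders for whatever you would actually want preloaded.

```julia
# Hypothetical list of packages to preload; replace with your own.
const BACKGROUND_PACKAGES = [:Pkg, :Dates]

# Load each listed package into Main on a background task.
# The argument is the REPL object passed by atreplinit; it is unused here.
function preload_in_background(_repl)
    Threads.@spawn for name in BACKGROUND_PACKAGES
        try
            Core.eval(Main, :(using $name))
        catch err
            # A failed load should not take down the REPL session.
            @warn "Background load failed" package = name exception = err
        end
    end
end

# Register the hook; it only runs when an interactive REPL starts.
atreplinit(preload_in_background)
```

Whether this helps probably depends on the package. Some packages are meant to be loaded before anything else, in which case a plain `using` at the top of startup.jl may still be the better choice.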
You mean packages have a verification step which is absent for the sysimage, since preapproved? Pkg and REPL are also preapproved, and the verification could be disabled (at least in theory, not currently)?
Should there be some kind of option to turn it off for all packages? It does not seem to be needed each time. Maybe it could even be automated; why is it even there?
One way to automate it, I guess, is using PackageCompiler.jl. I believe that even when making apps, it’s implemented by compiling everything into the sysimage, and that’s why it’s fast. I just didn’t realize that even with pkgimages we are never very close to it.
Pkgimages are for each package, not for a group? Not even for one package and all its dependencies?
I personally think this added layer of complexity will end up confusing a lot of newcomers.
Many users only use Julia in interactive mode (REPL, Pluto, Jupyter, VSCode), myself included. Making their experience a bit slower is a big downside imo. I wonder what is the use case that is driving this decision (is it just writing shell scripts like someone mentioned?)? And is it that popular of a use case?
If anything I would make the default download be the interactive version (like it has always been) and have the extra burden of downloading a different “slim” image on those (presumably more advanced) users.
One question is which image would juliaup download?
Slimming down the system image is an important part of compiling apps, deploying containers for cloud compute, working on low resource hardware, and many other things that have been listed elsewhere on this forum. It’s really important work! But I agree that finding a way to limit the trade-offs for interactive users is an important piece!
Thanks for those examples! Those are indeed important applications and I’m glad there are people working towards it! I still think the extra burden of downloading the non-standard image (if any, maybe they find a way to make it automatic) should be on the “slim” package, but that’s just my own biased opinion 
I am currently not yet worried about the latency of loading Pkg. If we can’t find the necessary speedups during the stabilization period of 1.11, there may be some solutions like asynchronously loading Pkg early (not ideal, since this holds the loading lock and thus if you do using AnythingElse the Pkg latency will slow you down), or even just giving some visual feedback that something is happening (“perceived latency” vs real latency); a simple “Loading Pkg…” will reduce the perceived latency.
I don’t think that shipping multiple sysimages will be a satisfying solution. The overarching goal of removing standard libraries from the sysimage is to make them upgradeable. This would allow bugfixes and features to be shipped much faster to users, unshackling standard libraries from the core Julia release schedule and making them behave “just like normal Julia packages” but with a copy being shipped with the Julia installation.
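To make the “perceived latency” idea concrete, here is a minimal sketch of printing a message around the load. It is not how the REPL implements anything; `load_pkg_with_feedback` is a hypothetical helper, and where it would be called from (for example on the first press of ]) is left open.

```julia
# Hypothetical helper: print a notice, load Pkg into Main, report the time taken.
function load_pkg_with_feedback(io::IO = stderr)
    print(io, "Loading Pkg…")
    elapsed = @elapsed Core.eval(Main, :(using Pkg))
    println(io, " done (", round(elapsed; digits = 2), " s)")
    return elapsed
end
```

The load itself is no faster; the only change is that the user sees something happen straight away.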
I just tried loading Matlab R2022a and using the count “one thousand, two thousand, …” technique, it took 15 seconds. I vote for the KISS approach.
Maybe we just need a better pre-rendered splash screen like MATLAB.
What is making me worry is that as I try to optimize towards the objective of making Pkg.jl load faster by disentangling it from the REPL,
$ time ../julia/julia --startup-file=no -e "@time using Pkg" # Pkg.jl master
  0.958765 seconds (607.97 k allocations: 38.725 MiB, 5.44% compilation time)
real    0m1.251s
user    0m1.092s
sys     0m0.346s
$ time ../julia/julia --startup-file=no --project -e "@time using Pkg" # Pkg.jl#3724
  0.625933 seconds (393.66 k allocations: 25.822 MiB, 4.23% compilation time)
real    0m1.007s
user    0m0.902s
sys     0m0.291s
I seem to be de-optimizing the case when you actually do want both Pkg and REPL loaded.
$ time ../julia/julia --startup-file=no -e "@time using Pkg, REPL" # Pkg.jl master
  0.977272 seconds (608.22 k allocations: 38.747 MiB, 5.63% compilation time)
real    0m1.279s
user    0m1.155s
sys     0m0.311s
$ time ../julia/julia --startup-file=no --project -e "@time using Pkg, REPL" # Pkg.jl#3724
  1.293214 seconds (721.07 k allocations: 44.176 MiB, 9.66% gc time, 4.07% compilation time)
real    0m1.605s
user    0m1.408s
sys     0m0.383s
I’m purposely doing this on an older computer, so these timings are likely much longer than you would experience.
What I’m wondering is if we can actually optimize both the load times of individual modules and the aggregate at the same time.
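For anyone who wants to repeat this kind of comparison, a small harness along these lines might help. It is a sketch under assumptions: `time_load` is a hypothetical helper, it uses whichever `julia` binary is currently running rather than a specific Pkg.jl branch, and it reports wall-clock time for the whole process (comparable to `real` above, not to the `@time` line).

```julia
# Hypothetical helper: wall-clock time of a fresh Julia process that loads `mods`.
function time_load(mods::AbstractString; julia = Base.julia_cmd())
    cmd = `$julia --startup-file=no -e "using $mods"`
    return @elapsed run(cmd)
end

# Compare each module on its own against the aggregate.
for mods in ("Pkg", "REPL", "Pkg, REPL")
    t = time_load(mods)
    println(rpad(mods, 12), round(t; digits = 3), " s")
end
```

Running each case several times and taking the minimum would give steadier numbers, since a single run also picks up disk cache effects.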