Roadmap for a faster time-to-first-plot?

Having unintentionally “thrown shade” and performed a “kvetchfest” in another thread, let me first apologise. I’m very happy with the improvements made in Julia, especially the interpreter and debugger, and I’m very excited about future developments.

That said, perhaps it would help ease frustration to have a “roadmap” for when the time-to-first-plot issues are planned to be addressed. It actually seems like it’s already a solved problem with PackageCompiler.jl, if that were wrapped in an easy-to-use framework.

7 Likes

Not sure there is much more to say.

1 Like

Yes, I know the post, but that was six months ago, and a lot has changed since then.

Yes, some issues at the top of the list have been addressed. Since “the time-to-first-plot problem” was not at the top of the list, I assume it comes later.

That said, as documented in that thread, for example, compile times improved markedly between 1.1 and what will be 1.2.

Also, note that a lot of action happened in

https://github.com/JuliaDebug

with the new debugger. Given the small number of people who have the skillset to contribute to these things, it is nothing short of amazing.

14 Likes

No one person can perform a “kvetchfest”: it’s a collective activity that happens when everyone decides it’s a good time to complain about the same thing, which in this case is compiler latency, something that is already well known to be a problem. Unfortunately I’m not really the one to say more about it, but I have asked @jeff.bezanson to weigh in. The nature of compiler work is that you don’t see much surface change, but that does not mean a lot of hard work isn’t being done. Many packages have seen an order-of-magnitude reduction in startup time, but not all of them; these things are a trade-off, and some have instead gotten a bit slower.

12 Likes

Thanks for the update, I look forward to future progress. Just to clarify, I’m less worried about “compile time” than the following two things:

  1. using time (the second time; precompile time is not a concern). This effectively limits the “stack depth” of package dependencies, as at some point I’d rather wait a year for Julia v1.7 than 20 seconds every time. PackageCompiler.jl already solves this, though I haven’t figured out how to incorporate it into my workflow when packages change. Having a timeline would let me know whether to invest energy in learning PackageCompiler.jl or to just wait it out.
  2. Type-inference bugs that trigger minute-long compiles or sometimes crashes (see for example this recent issue). This has the effect of pushing tests over the time limit on Travis, or of hours spent trying random changes to work around the bug. Unlike item 1, there is no clear workaround other than to wait, but of course Jeff has limited time.
2 Likes

The outline from Stefan’s post linked above is still broadly accurate. Our priority at the moment is multithreading, and as soon as that mostly works we will return to focusing on latency. Looking at the timeline of all this, I can’t help but agree that progress has been slower than I expected.

Latency is a difficult, multi-faceted issue. You are quite right to specify which particular kinds of latency matter to you, because there are a few separate, mostly-unrelated sources of it: (1) package loading (consisting mostly of method table merging, and a bit of re-compilation), (2) the general speed of type inference, (3) type inference bugs or quasi-bugs that cause it to run an exceptionally long time, (4) front end (parsing and lowering; not the biggest issue right now), and (5) LLVM optimizations. Again, these are all mostly unrelated and different packages or workflows can hit different ones.

While there have been a few modest commits to master that chip away at this, there is an iceberg underneath of things we have tried, experiments run, and of course more things we are planning to try. Some things we try don’t work, or have no effect, or have a much smaller effect than hoped. Some things give nice improvements, at the expense of e.g. worse type information. Because of this it is very hard to promise “X% improvement by Y date”. (Side note: in case anybody still doesn’t believe that return_type is a bad idea, using it means we can’t speed up the compiler without breaking your code. :microphone: :arrow_down:)

In the hopefully near future we will be trying things like multi-threading code generation, tiered compilation (running code in an interpreter first and gradually transitioning to the compiler), various changes to type inference, etc.

In the meantime there are a couple tricks to try to work around latency issues:

  1. Try running with -O0 or -O1
  2. Try running with --compile=min
  3. Try applying this patch:
diff --git a/base/compiler/params.jl b/base/compiler/params.jl
index 8f87feb734..499b44a9f6 100644
--- a/base/compiler/params.jl
+++ b/base/compiler/params.jl
@@ -59,7 +59,7 @@ struct Params
                    #=inlining, ipo_constant_propagation, aggressive_constant_propagation, inline_cost_threshold, inline_nonleaf_penalty,=#
                    inlining, true, false, 100, 1000,
                    #=inline_tupleret_bonus, max_methods, union_splitting, apply_union_enum=#
-                   400, 4, 4, 8,
+                   400, 1, 4, 8,
                    #=tupletype_depth, tuple_splat=#
                    3, 32)
     end

which can significantly cut inference time.
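For the first two tricks, here is a minimal sketch of how one might wrap the flags when spawning a fresh process. The helper name `low_latency_cmd` is made up for illustration; the `-O<level>` and `--compile=min` flags themselves are real Julia command-line options.

```julia
# Build a reduced-latency `julia` command using the flags suggested above.
# `low_latency_cmd` is a hypothetical helper name, not part of any package.
function low_latency_cmd(; opt_level::Int = 1, compile_min::Bool = false)
    flags = String["-O$(opt_level)"]
    compile_min && push!(flags, "--compile=min")
    # Interpolating a vector into backticks splices each flag as its own argument.
    return `$(Base.julia_cmd()) $(flags)`
end

# e.g. run(`$(low_latency_cmd(opt_level = 0)) -e 'println("hello")'`)
```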

36 Likes

Perhaps an alternative take on this is “compiling absolutely everything instantly is really, really hard, so it should be as easy as possible to ‘keep’ compiled code”.

Perhaps it would make sense to prioritize PackageCompiler more? Things have certainly improved, but the process of swapping system images is pretty fiddly right now and, at least in my experience, things often go wrong when trying to use PackageCompiler. It seems to me that if there were a really slick stdlib PackageCompiler, where people could simply call save_image and load_image (or something similarly simple), the bulk of everyone’s plotting stuff would already be compiled anyway: you’d just put a load_image in your startup.jl and forget about it.

I don’t really know what I’m talking about, but making compilation blazing fast sounds like an incredibly difficult problem to me, period. Stashing compiled code, on the other hand, seems like it should be far easier.

18 Likes

In an ideal world, a superprecompile command in the package manager would run PackageCompiler on my current environment, and then whenever I switch environments and there is a custom sysimage for that environment, it would be used automatically. Bonus points if I could configure the package manager to run superprecompile automatically whenever I make a change to my current environment (like up, or add, or something like that). The key would really be that the package manager manages these sysimages for me.

I’m pretty positive that would go a long way to solve the main pain points I have right now with latency.
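As a rough sketch of what that could look like, here is one way an environment might be mapped to a cached image. Everything in this snippet is hypothetical: the cache layout, the helper names, and the assumption that a tool like the proposed superprecompile would have populated the files via PackageCompiler.

```julia
# Hypothetical per-environment sysimage lookup. Nothing here builds an
# image; it only sketches how the package manager could pick one.
sysimage_path(env::AbstractString) =
    joinpath(DEPOT_PATH[1], "sysimages",
             string(hash(abspath(env)), base = 16) * ".so")

function julia_cmd_for_env(env::AbstractString)
    so = sysimage_path(env)
    # Use the environment's custom image when it exists,
    # otherwise fall back to the default system image.
    return isfile(so) ? `$(Base.julia_cmd()) --sysimage=$so --project=$env` :
                        `$(Base.julia_cmd()) --project=$env`
end
```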

16 Likes

I’m all for making it easier to build and use system images, but many people incrementally develop and change package code, necessitating recompilation at some point. Better ways to cache native code are something we will look at, though.

An auto environment-to-sysimg converter is a great idea. Of course, the package manager is itself written in julia, so we’d either need to pass system images to julia manually, or else load a minimal system image, call the package manager to find the right real image, and then restart.

9 Likes

I think a combination of Revise and a compiled system image could be very powerful… You compile the diffs with Revise on startup, until that takes long enough to justify a system image rebuild :wink:

11 Likes

I think a good PackageCompiler experience would also need community effort. Complex packages need to test themselves with PackageCompiler. I recently made PyCall PackageCompiler-compatible (Make PyCall.jl AOT-compilable by tkf · Pull Request #651 · JuliaPy/PyCall.jl · GitHub), but it was not super easy, since PyCall has a highly non-trivial __init__ etc. Packages with mutable global state are harder to make work with PackageCompiler. Packages that use ccall heavily also need testing with PackageCompiler (as a bonus, you may be able to catch ccall misuse: `Invalid bitcast` and `LLVM ERROR: Broken function found, compilation aborted!` when including PyCall in system image · Issue #31473 · JuliaLang/julia · GitHub). Also, I think such complex packages need CI to detect regressions in their own code, in PackageCompiler, or in Julia.

12 Likes

From my point of view solving the latency problem for non-package developers first and worrying about package developers later would be an entirely reasonable strategy.

I consistently run into the biggest problems with this latency issue when I bring new users to julia. They come from R or Python, and they just don’t see why they should wait so long for their first plot. 95% of those users will never, ever develop a package. The ones who eventually do will by that point have seen all the other benefits of julia, and I think they will be OK making a compromise on compile time. But I really strongly believe that the biggest problem right now is for “normal” users who just want to use julia to get some science done.

Here is another way to think about this: for package devs, how would the experience look on other platforms? As far as I can tell, even with the compile latencies, julia is one of the smoothest environments for developing packages. Heck, we are competing with platforms where you have to run a C compiler if you want to produce a fast package. So from my point of view, some latency issues for package devs are really not the end of the world.

But for end users they are, because they can just do their plot and data analysis in R or Python, and it will most likely be faster for them.

And I think a clever combination of sysimages that are integrated into the package manager and operate on a per environment level could essentially solve this 90% for the “casual” user.

52 Likes

Sure, that is all completely fine and true. I’m in no way arguing that users shouldn’t have bigger precompiled system images to reduce latency for their needs. I’m just responding to this post: Roadmap for a faster time-to-first-plot? - #7 by dlfivefifty which specifically called out different kinds of latency.

AFAICT, all you need to do is put using Plots (etc.) in userimg.jl; perhaps some tooling could make that even easier. That is of course a totally different kind of thing from making the compiler faster, and can be worked on in parallel.
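For anyone curious what that looks like concretely (this assumes a source checkout of Julia; the package choice is just an example):

```julia
# base/userimg.jl in a Julia source tree: anything loaded here is compiled
# into the default system image the next time `make` is run from the repo root.
using Plots
```

After the (long, one-time) build finishes, loading the baked-in packages should be essentially instant.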

3 Likes

When I was reading this post, this was not the conclusion I was expecting. Everything before this seemed to me to support tiered compilation. The thing is that even with fully pre-compiled packages and environments, it’s really hard (impossible, actually) to predict what a user is going to do, which means that you cannot realistically have it all compiled in advance. You could reasonably do that for an application that you can trace beforehand, but not for arbitrary user input. And especially a first-time user isn’t going to be in a position to run code and create a trace or even make an environment before wanting to plot something or load a data frame. As far as I can tell, the only strategy that will provide the experience you describe of interpreter-like latency for first-time users is using an interpreter and doing compilation in the background.

8 Likes

Your comment is basically saying “Julia does not and should not solve the two language problem”, which seems to go against the mantra of Julia.

I assume Julia Computing is prioritising whatever their paying customers are asking for, as they should. But I really hope they still intend to work on truly solving the two-language problem, by making Julia a viable replacement for Python/R/Matlab for interactive use as well as HPC.

13 Likes

This, I don’t get.

What if the professor and other members in the group use Julia? Should naive users develop their own codebase in R, even if no-one else is using R, and they themselves don’t know it?

I’m not clamoring for the core devs to devote their time to this, but it seems strange to basically imply that Julia is only (and should only be) for ‘sophisticated’ users.

14 Likes

It’s probably easy, but I think the issue is, for many (and also for me): Where should I type make? I know, on the keyboard. But where do I put the letters on the computer screen?

This is not a real question on my own behalf. I can figure this out by reading some docs. (But at this moment, I would have no idea how to accomplish this.) I wanted to get across that for many users (even people who are quite familiar with programming in general, and Julia in particular), the statement “put something in userimg.jl and type make” hides a large amount of required knowledge and understanding.

20 Likes

I’m just going to expand slightly on my previous post.

It’s possible to be reasonably sophisticated in some aspects of programming, and yet be a ‘total rube’ in others. For myself, I’m comfortable with parametric types, type stability, generic programming, etc. But I’m very uncomfortable with concepts such as package management, integrated testing, git wrangling, and userimg.jl.

So even though someone might in some cases be literally looking at the screen and going “you have to show me which button to push”, they could still be capable of exploiting many of the strengths of Julia, and in fact be chafing against limitations in other languages.

Perhaps I’m going off-topic here (sorry about that). The relevance to this thread might be to highlight that even some “sophisticated” users can still use some hand-holding or need certain tools to be easier to use.

16 Likes

If the PackageCompiler path is a solution, then ideally it should be made dead simple to use; having to know what userimg is, or how to build Julia from source, won’t do it for most people (for example, I don’t know any of these things, and I’m a relatively experienced user).

But from some of the comments above it looks like it’s not even clear whether PackageCompiler is a good solution. Has someone made systematic benchmarks? For example, compile Plots, DataFrames, Distributions, etc., and compare the timings for common use cases.

2 Likes