Roadmap for a faster time-to-first-plot?

I often wonder about this. Once I run a function, it gets compiled, and theoretically we could save the compiled version and reload it. The problem, of course, is that when you cache every function the image becomes so bloated.

If you look at the list from Nov 2018, the top two items are done, sort of, so the emphasis will shift to time-to-first-plot soon. Then it will be things like package compilers.

This is such an interesting aspect of Julia. Julia seems unique amongst languages in that we need to talk about trade-offs in compilation vs run time like that!

Just a comment.

4 Likes

I’d like to emphasize that even though threads here sometimes focus on the negative side of things (i.e. time to first plot is still slow), the fact that the compiler team had such an ambitious priority list and managed to deliver on the first half already is impressive. At least in my experience, compiler bugs were somewhat common in Julia 1.0.0 (I hit some doing simple things, esp. with Missing) and I haven’t encountered any in a while in later releases. Similarly, multithreading made amazing progress in 1.3. Let’s hope that the work on the remaining items is equally successful!

29 Likes

I’m not saying this advice is wrong, but it is a little concerning. If this is just some advice to follow until things are further optimized, that’s fine. But if this is fundamental to compiling, I’d say this is some rather core functionality.

Same with using dictionaries.

I believe he’s referring to dictionaries that store multiple types. This is problematic for inference. This may be a point where the compiler should “give up”.
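A minimal sketch of why this is hard for inference (the `getval` function and variable names here are just for illustration, not from the discussion above): a `Dict` holding values of several types gets the abstract value type `Any`, so anything read out of it infers as `Any`, while a concretely typed `Dict` stays fully inferable.

```julia
# A Dict storing multiple value types is typed Dict{String, Any}, so the
# compiler cannot infer what lookups return; a uniformly typed Dict can.
mixed = Dict("a" => 1, "b" => "two")   # Dict{String, Any}
typed = Dict("a" => 1, "b" => 2)       # Dict{String, Int}

getval(d, k) = d[k] + 1  # inferred return type depends on the Dict's value type

# `@code_warntype getval(mixed, "a")` flags the result as Any;
# `@code_warntype getval(typed, "a")` infers a concrete Int.
valtype(mixed)  # Any
valtype(typed)  # Int
```

Inference then has to assume `Any` everywhere the mixed dictionary's values flow, which is exactly the kind of spot where "giving up" (minimal compilation) might be the right call.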

These are very interesting observations. I started a question here:

and

and

and never got the answer that type inference itself is the problem. Would it be possible to take @ChrisRackauckas’s suggestions and put them into the “Performance Tips” section of the manual?

2 Likes

That’s an interesting suggestion. Most (all?) of the Performance Tips are about runtime performance, and hints about compilation performance could be useful. However, it is mostly a moving target.

4 Likes

As is runtime performance (-> small union optimization). I understand perfectly that the focus was on runtime in the past, but code patterns that help the compiler would be very very helpful. My Gtk application needs more than 60 seconds to load and clicking a button the first time requires more than 15 seconds. So getting some hints is extremely appreciated.

8 Likes

A big issue is that this is very difficult to debug. It also glosses over other issues, such as large union types (e.g. StridedArray) which cause significant compile time issues, even if type-stable.

Compile-time optimisation feels a lot like Matlab optimisation, where there is no intuitive “mental model” of what makes code fast, and so one is left with a bag-of-tricks. This is one of the reasons I always avoided Matlab and why I like Julia.

Compile-caching seems like a better place to focus effort. If it were pulled off, time spent on compile-time improvements would be largely moot. Of course this won’t help with non-compiled code, but occasional compiling is much better than constant compiling.

9 Likes

I agree:

IMO, compile time itself does not (really) matter. For interactive REPL work, there’s Revise and the Juno cycler, and waiting a few seconds here and there even helps you take a step back, collect your thoughts, and take a sip of coffee to relax.

In “production setups”, things get a bit more complicated. There are many scenarios where you just want to execute a script (with arguments) or run a single command via julia ... without maintaining a “worker pool”, i.e. Julia sessions that have run through a massive startup.jl, which one would have to set up manually and manage/call via e.g. tmux.

Sad to hear this… really like the basic idea of the package, esp. compile_incremental and derived work such as Fezzik.jl. For me, this actually solves the “two-language problem”. Because when I have to apply at least one other language / tool / framework / … to make Julia’s runtime performance applicable in production scenarios (or to share my executable stuff with non-Julians), I do have (at least) two “languages”.

Could there be a way of “dumping” the entire state of a REPL (automatically or on the user’s demand), like a snapshot? I seem to remember having read somewhere that this is not easily done in Julia, for several reasons though…

3 Likes

In my case actual compile time is a critical component of user responsiveness in a production setting. Our query system uses transparent structures for performance, and hence the first time a particular query is encountered it causes a compilation. Since queries are constructed dynamically on the client side via a user interface, most queries will be unique. One workaround is a parameterized query mechanism, but in my nominal customer use case this isn’t an option. I hope this explains.

1 Like

Wait… You allow users to execute arbitrary functions?.. Building an arbitrary expression isn’t a problem if all the underlying functions that get called are already compiled.

Anything that allows anonymous functions will require fresh compilation. Many packages have use cases like this: Plots, ApproxFun, … Sometimes the issue can be mitigated by wrapping the function in another type.
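The "wrapping the function in another type" mitigation can be sketched like this (the `Callback` struct and `run_twice` are hypothetical names, not from any of the packages mentioned). Every anonymous function has its own unique type, so any method specialized on it recompiles per closure; hiding closures behind one concrete wrapper type avoids that.

```julia
# Each anonymous function has a unique type; wrapping it in a single concrete
# type means downstream code compiles once for Callback, not once per closure.
struct Callback
    f::Function   # abstract field: callers won't specialize on the closure's type
end
(c::Callback)(x) = c.f(x)

run_twice(c::Callback, x) = c(c(x))

# Both calls reuse the same compiled run_twice(::Callback, ::Int)
# instead of triggering a fresh specialization for each new closure.
run_twice(Callback(x -> x + 1), 0)  # 2
run_twice(Callback(x -> 2x), 3)     # 12
```

The trade-off is that the call through `c.f` becomes a dynamic dispatch, so this swaps a little runtime speed for less recompilation; packages like FunctionWrappers.jl refine the same idea with typed call signatures.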

Maybe we can recap the state of things. There are four ways forward to help with this issue:

  1. Static compilation. PackageCompiler.jl and Fezzik.jl are great proofs of concept, but fail for some packages. If these were refined and incorporated into Base, that would probably cover a large number of users’ use cases, for example writing command-line scripts, but there would still be compile time any time a new type or function is involved. This also doesn’t help when actively developing a package.
  2. Faster compiling. The compiler could apparently be made multithreaded; if this sped up compiling by 4x, most complaints would be dropped (e.g. if BandedMatrices.jl took 1.5s to load instead of 5.5s). However, there are also compiler “bugs” where things take up to 30s to load, and command-line scripts would still be too slow.
  3. Refining current code. By making code type-stable, compile time is usually around 1s, which is fine. But this is hard to debug, and some code is inherently type-unstable. Macros telling the compiler to minimally compile / interpret certain blocks would help.
  4. Interpreter. In principle code could be interpreted on the first run, with compilation done in a background thread, as JITs in interpreted languages do. This would help package developers and script writers a lot, though I imagine it’s difficult to ensure high-performance code is not left interpreted.

Personally I’d be very happy if any of 1,2 or 4 were completed, though other users have different needs so will probably have stronger preferences. In an ideal world we’d have all 4 but that’s probably being greedy.

3 Likes

Can you share an example of a multithreaded compiler?

I know nothing about compilers, I thought I remember that being suggested but perhaps I misunderstood.

The ORCJIT backend of LLVM is able to be multithreaded, and this is somewhat of a recent change.

ORCJIT allows speculative compilation, so it can be set up to try to compile some calls before they happen (on a separate thread).

Even in these cases, a lot can be cached. The majority of Plots can be cached: as long as you don’t inline all of the calls into the one that takes an anonymous function, the majority of the code shouldn’t have to recompile. This is one thing that Plots.jl does get right: only the first plot is slow, and then if you change to a different plot call the recompiles are fine, because the majority of the functions are just weird dictionary handling and the like.

3 Likes

Just to be clear, we are using “compile time” to mean the whole operation from hitting Enter, which I believe consists of the following stages:

  1. Parsing.
  2. Lowering to LLVM IR. This includes type inference.
  3. LLVM compiling the IR to machine code.
  4. Running the code.

IIRC the actual LLVM compile time (3) is a small fraction of the total, with most of the “compile” time being type inference (2). Speaking with zero experience/knowledge on the matter, it does seem like there is an opportunity for multithreading here.
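A rough way to see this split in practice (a sketch only; the function `f` is hypothetical, and serious measurement would use something like SnoopCompile): the first call to a method pays for stages 1–3, while the second call only pays for stage 4.

```julia
# First call pays parsing + inference + LLVM codegen; the second call hits
# the cached compiled method and measures (essentially) pure runtime.
f(x) = sum(abs2, x)
data = rand(10^3)

t_first  = @elapsed f(data)   # includes compilation of f(::Vector{Float64})
t_second = @elapsed f(data)   # compiled code is cached; runtime only
t_first > t_second            # true in a fresh session
```

In a fresh session the first timing is typically orders of magnitude larger than the second, and most of that gap is inference plus codegen rather than LLVM machine-code emission alone.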

3 Likes

Could you elaborate on this? Just can’t imagine such a scenario, where the backend has no idea even about the structure of the potential queries so it would have to JIT-compile its reaction.

I’d go for 1 (static comp.) + 4 (interpreter). When developing (prototyping), use the interpreter (JIT-compilation “in the background” would be awesome, but not mandatory). When runtime speed is crucial, enable the JIT for the final touches of development and optimisations. Then statically compile everything that’s known in advance, and leave dynamic one-off stuff at runtime to the interpreter (optionally with JIT-compilation in the background).

I think compilation can never reach the interactivity and the responsiveness for one-off calls of an interpreter.

1 Like

For Plots it’s mostly Julia type inference time, but in most cases it’s the other way around. But yes, I assume type inference should be able to do similar things? But that does mean that incorporating the ORCJIT changes won’t fix the Plots.jl issue.