Roadmap for a faster time-to-first-plot?

PackageCompiler is very often mentioned as a solution to these kinds of problems, but it really is not.

  1. The issue page (https://github.com/JuliaLang/PackageCompiler.jl/issues) is very long and full of reports where it is not working.
  2. The author of PackageCompiler.jl made it very clear in his JuliaCon talk that he is not interested in maintaining the repository (understandably).

Altogether, I don’t think this can be considered a solution to anything. It can somewhat mitigate your problems if you are lucky and willing to spend some effort solving the problems that arise from its application, but the TTFP-problem would still benefit greatly from the planned compiler latency work. Effort going into the latency improvements will also have an immediate impact on all code for all users :heart_eyes:

For now, the best mitigation I’ve found is going to great lengths to make sure you do not have to restart Julia, together with the Juno cycler to keep an instance ready when Julia does crash.

6 Likes

I find myself hoping that a package compiler style solution is found instead of putting all the focus on just reducing compile times in general. My (possibly unfounded) concern is that if we as a community constantly complain about compile times, it might lead the devs to make runtime performance sacrifices in the name of compiler latency.

Personally, I’d rather have the opposite and would gladly take significantly longer compile times in exchange for a tiny boost in runtime speed. If the compiler devs can make both faster, then great. However, I know that if the two come into conflict I’d take runtime improvements any day of the week. Given that, I’d much rather have an approach that lets me re-use or even share compile work, so that I run into the compiler less often and it wouldn’t be so painful to crank up compile times.

4 Likes

The analysis above already describes exactly what the tradeoff is. Type-stable code is fine for both run time and compile time. When things start to not be type stable, the compiler seems to have to work a lot harder to find the smallest union you could want, and whether that could be used to get any little optimizations. If the compiler just gave up and went to Any, in many cases there would be no run time cost, and it would definitely make the compiler faster. But some users might think there’s a regression, since some code that was partially inferred would just fall back to Any. IMO that is the right behavior, but I can see the downsides.
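A minimal sketch of the type-stability distinction described above. The function names `stable` and `unstable` are made up for illustration, and `Base.return_types` is used only to display what inference concludes for a given signature.

```julia
# Hypothetical functions, for illustration only.

# Type-stable: both branches return a Float64.
stable(x::Float64) = x > 0 ? x : 0.0

# Type-unstable: returns a Float64 or an Int depending on the runtime value.
unstable(x::Float64) = x > 0 ? x : 0

# Ask inference what it concludes for a Float64 argument.
stable_ret = Base.return_types(stable, (Float64,))[1]
unstable_ret = Base.return_types(unstable, (Float64,))[1]

println("stable:   ", stable_ret, " (concrete: ", isconcretetype(stable_ret), ")")
println("unstable: ", unstable_ret, " (concrete: ", isconcretetype(unstable_ret), ")")
```

The second function infers to a small union rather than a concrete type. Code with more branches or deeper call chains can widen further, which is where the extra inference work mentioned in the post comes in.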

In my view, I would like macros that act like pragmas to control how much the compiler should try, defaulting towards faster compile in this case.

5 Likes

Will it be solved when/if this PR is landed?

(Provided there are enough precompile calls, presumably generated via SnoopCompile.jl)
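For context, a precompile call looks roughly like the sketch below. The function `first_plot_stub` is a hypothetical stand-in for package code, not something from SnoopCompile.jl or any real package.

```julia
# Hypothetical function standing in for package code.
first_plot_stub(x::Vector{Float64}) = sum(abs2, x) / length(x)

# Ask Julia to compile the method for this concrete signature without
# running it. `precompile` returns `true` if compilation succeeded.
ok = precompile(first_plot_stub, (Vector{Float64},))

println("precompiled: ", ok)
println(first_plot_stub([1.0, 2.0, 3.0]))
```

In a package these calls would sit at module top level so the work happens during package precompilation. How much of the result is kept across sessions depends on the Julia version.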

2 Likes

That’s definitely something that would be needed to fix this more directly. Without it, you need to develop around it.

1 Like

The important measure for me is “time to first response for a lambda container”. As Julia is finding more and more use in a web-service computational setting, it’s important that we have a handle on the startup latency for bringing new computational back ends on-line. Precompilation may help somewhat, but in my case each new query also causes compilation (which is why it runs fast). I’d rather have a slow start than a slow finish… but a faster start would be excellent.

5 Likes

I often wonder about this. Once I run a function, it gets compiled. Theoretically we could save the compiled version and reload it. The problem, of course, is that when you have every function saved, the image becomes very bloated.

If you look at the list from Nov 2018, the top two items are done, sort of, so the emphasis will be on TTFP soon. Then it will be things like package compilers.

This is such an interesting aspect of Julia. Julia seems unique amongst languages in that we need to talk about trade-offs in compilation vs run time like that!

Just a comment.

4 Likes

I’d like to emphasize that even though threads here sometimes focus on the negative side of things (i.e. time to first plot is still slow), the fact that the compiler team had such an ambitious priority list and managed to deliver on the first half already is impressive. At least in my experience, compiler bugs were somewhat common in Julia 1.0.0 (I hit some doing simple things, esp. with Missing) and I haven’t encountered any in a while in later releases. Similarly, multithreading made amazing progress in 1.3. Let’s hope that the work on the remaining items is equally successful!

29 Likes

I’m not saying this advice is wrong, but it is a little concerning. If this is just some advice to follow until things are further optimized, that’s fine. But if this is fundamental to compiling, I’d say this is some rather core functionality.

Same with using dictionaries.

I believe he’s referring to dictionaries that store multiple types. This is problematic for inference. This may be a point where the compiler should “give up”.
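A small sketch of why a dictionary holding several value types is hard on inference. The variable names and the helper `get_b` are hypothetical.

```julia
# A dictionary whose values have several types must use an abstract value type.
mixed = Dict{Symbol,Any}(:a => 1, :b => 2.0, :c => "three")

# A dictionary with a single concrete value type.
uniform = Dict{Symbol,Float64}(:a => 1.0, :b => 2.0)

# Hypothetical accessor; what inference knows about its result depends
# entirely on the dictionary's value type.
get_b(d) = d[:b]

mixed_ret = Base.return_types(get_b, (typeof(mixed),))[1]
uniform_ret = Base.return_types(get_b, (typeof(uniform),))[1]

println("mixed lookup infers as:   ", mixed_ret)
println("uniform lookup infers as: ", uniform_ret)
```

Everything downstream of a lookup in `mixed` has to be handled for an unknown type, which is the situation the post above describes.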

These are very interesting observations. I started a question here:

and

and

and never got the answer that type inference itself is the problem. Would it be possible to take @ChrisRackauckas’s suggestions and put them into the “Performance Tips” section of the manual?

2 Likes

That’s an interesting suggestion. Most (all?) of the Performance Tips are about runtime performance, and hints about compilation performance could be useful. However, it is mostly a moving target.

4 Likes

As is runtime performance (-> small union optimization). I understand perfectly that the focus was on runtime in the past, but code patterns that help the compiler would be very, very helpful. My Gtk application needs more than 60 seconds to load, and clicking a button the first time requires more than 15 seconds. So getting some hints would be extremely appreciated.

8 Likes

A big issue is that this is very difficult to debug. It also glosses over other issues, such as large union types (e.g. StridedArray) which cause significant compile time issues, even if type-stable.

Compile-time optimisation feels a lot like Matlab optimisation, where there is no intuitive “mental model” of what makes code fast, and so one is left with a bag-of-tricks. This is one of the reasons I always avoided Matlab and why I like Julia.

Compile-caching seems like a better place to focus effort. If pulled off, spending any time on compile time improvements will be rendered useless. Of course this won’t help with non-compiled code, but occasional compiling is much better than constant compiling.

9 Likes

I agree:

IMO, compile time itself does not (really) matter. For interactive REPL work, there’s Revise and the Juno cycler, and waiting some seconds here and there even helps you take a step back, collect your thoughts, and take a sip of coffee to relax.

In “production setups”, things get a bit more complicated. There are many scenarios where you just want to be able to execute a script (with arguments) or just execute a single command by julia ... without the need to maintain a “worker pool” (i.e. Julia sessions having run through a massive startup.jl which one would have to set up manually), managed / called via e.g. tmux.

Sad to hear this… really like the basic idea of the package, esp. compile_incremental and derived work such as Fezzik.jl. For me, this actually solves the “two-language problem”. Because when I have to apply at least one other language / tool / framework / … to make Julia’s runtime performance applicable in production scenarios (or to share my executable stuff with non-Julians), I do have (at least) two “languages”.

Could there be a way of “dumping” the entire state of a REPL (automatically or on user’s demand), like a snapshot? I seem to remember having read somewhere that this is not easily done in Julia for several reasons though…

3 Likes

In my case actual compile time is a critical component of user responsiveness in a production setting. Our query system uses transparent structures for performance, and hence the first time a particular query is encountered it will cause a compilation. Since queries are constructed dynamically on the client side via a user interface, most queries will be unique. One workaround is to have a parameterized query mechanism, but in my nominal customer use case, this isn’t an option. I hope this explains.

1 Like

Wait… You allow users to execute arbitrary functions? Building arbitrary expressions isn’t a problem if all the underlying functions that get called are already compiled.

Anything that allows anonymous functions will require fresh compilation. Many packages have use cases like this: Plots, ApproxFun, … Sometimes the issue can be mitigated by wrapping the function in another type.
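A rough sketch of the point above. Each anonymous function has its own type, so generic code that takes it as an argument is specialized again for every new one. The wrapper type `Wrapped` is a hypothetical illustration of "wrapping the function in another type", not the approach of any particular package.

```julia
# Two textually identical anonymous functions still have distinct types.
f = x -> x + 1
g = x -> x + 1
println(typeof(f) == typeof(g))

# Hypothetical wrapper with an abstractly typed field: every wrapped
# function now shares one type, so code receiving a `Wrapped` is compiled
# once. The inner call becomes a dynamic dispatch, and the body of each
# anonymous function still has to be compiled the first time it runs.
struct Wrapped
    f::Function
end
(w::Wrapped)(x) = w.f(x)

wf = Wrapped(f)
wg = Wrapped(g)
println(typeof(wf) == typeof(wg))
println(wf(1))
```

The design trade is the usual one: fewer specializations and less compilation in exchange for a dynamic call at the wrapper boundary.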

Maybe we can recap the state of things. There are four ways forward to help with this issue:

  1. Static compilation. PackageCompiler.jl and Fezzik.jl are great proofs of concept, but fail for some packages. If these were refined and incorporated into Base, that would probably solve a large number of users’ use cases, for example writing command line scripts, but there would still be compile time any time a new type or function is involved. This also doesn’t help when actively developing a package.
  2. Faster compiling. The compiler could apparently be made multithreaded; if this sped up compiling by 4x, most complaints would be dropped (e.g. if BandedMatrices.jl took 1.5s to load instead of 5.5s). However, there are also compiler “bugs” where things take up to 30s to load, and command line scripts would still be too slow.
  3. Refining current code. By making code type stable, compile time is usually 1s, which is fine. But this is hard to debug, and some code is inherently type unstable. Macros telling blocks to minimally compile / interpret would help.
  4. Interpreter. In principle code could be interpreted on the first run with compiling done in a background thread, as in JIT in interpreted languages. This would help package developers and script writers a lot, though I imagine it’s difficult to ensure high performance code is not interpreted.

Personally I’d be very happy if any of 1, 2 or 4 were completed, though other users have different needs and so will probably have stronger preferences. In an ideal world we’d have all 4, but that’s probably being greedy.
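On point 3, one macro in this spirit already exists in Base: `@nospecialize` hints to the compiler that a method should not be specialized separately for each type of the marked argument. A minimal sketch, with a hypothetical helper `describe`:

```julia
# Hypothetical helper: the argument is only inspected, not used in hot code,
# so a separate specialization per concrete argument type buys little.
function describe(@nospecialize(x))
    return string("got a ", typeof(x))
end

println(describe(1))
println(describe("text"))
```

This only covers argument specialization. It is not the block-level "compile less here" control the list asks for.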

3 Likes

Can you share an example of a multithreaded compiler?