Roadmap for a faster time-to-first-plot?

baggepinnen · August 25, 2019, 12:03am

PackageCompiler is very often mentioned as a solution to these kinds of problems, but it really is not.

The issue page is very long (https://github.com/JuliaLang/PackageCompiler.jl/issues), full of issues where it is not working.
The author of PackageCompiler.jl made it very clear in his JuliaCon talk that he is not interested in maintaining the repository (understandably).

Altogether, I don’t think this can be considered a solution to anything. It can somewhat mitigate your problems if you are lucky and willing to spend some effort solving the problems that arise from its application, but the TTFP problem would still benefit greatly from the planned compiler latency work. Effort going into the latency improvements will also have an immediate impact on all code for all users.

For now, the best mitigation I’ve found is going to great lengths to make sure you do not have to restart Julia, together with the Juno cycler to keep an instance ready when Julia does crash.

Mason · August 25, 2019, 12:45am

I find myself hoping that a package compiler style solution is found instead of putting all the focus on just reducing compile times in general. My (possibly unfounded) concern is that if we as a community constantly complain about compile times, it might lead the devs to make runtime performance sacrifices in the name of compiler latency.

Personally, I’d rather the opposite and would gladly take significantly longer compile times in exchange for a tiny boost in runtime speed. If the compiler devs can make both faster, then great. However, I know that if the two come into conflict I’d take runtime improvements any day of the week. Given that, I’d much rather an approach that just lets me re-use or even share compile work so that I run into the compiler less often and then it wouldn’t be so painful to crank up compile times.

ChrisRackauckas · August 25, 2019, 1:00am

The analysis above already describes exactly what the tradeoff is. Type-stable code is fine for both run time and compile time. When things start to not be type stable, the compiler seems to have to work a lot harder to find the smallest union you could want, and whether that could be used to get any little optimizations. If the compiler just gave up and went to Any, in many cases there would be no run time cost, and it would definitely make the compiler faster. But some users might think there’s a regression, since some code that was partially inferred would just shoot to Any. IMO that is the right behavior, but I can see the downsides.

In my view, I would like macros that act like pragmas to control how much the compiler should try, defaulting towards faster compile in this case.
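To make the type-stability point concrete, here is a minimal sketch. The functions `stable` and `unstable` are hypothetical and exist only for illustration, and `Base.return_types` is an internal reflection helper rather than a documented stable API, so treat the exact output as version-dependent.

```julia
# Hypothetical functions, for illustration only.

# Type-stable: every branch returns a Float64.
stable(x) = x > 0 ? 1.0 : -1.0

# Type-unstable: the return type depends on the *value* of x,
# so inference has to track a union of the two possibilities.
unstable(x) = x > 0 ? 1 : 1.0

# Ask inference what it concluded for an Int argument.
stable_rt = Base.return_types(stable, (Int,))[1]
unstable_rt = Base.return_types(unstable, (Int,))[1]

println("stable:   ", stable_rt)
println("unstable: ", unstable_rt)
```

In the unstable case the compiler carries a small union through every caller, which is the extra inference work described above.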

tkf · August 25, 2019, 2:38am

Will it be solved when/if this PR is landed?

github.com/JuliaLang/julia

RFC: allow precompile to associate a MethodInstance with a module

JuliaLang:master ← JuliaLang:teh/force_precompiles

opened 04:48PM - 24 Mar 19 UTC

timholy

Suppose we have a package that does this:

```julia
module MyPkg
struct T en…d
if ccall(:jl_generating_output, Cint, ()) == 1
    precompile(setindex!, (Dict{T,Int}, Int, T))
end
end
```

Unfortunately, the `precompile` does nothing useful. I dove into this in some detail. It appears that the reason is that no `setindex!` methods are defined in `MyPkg`, and consequently the corresponding binding `b` has `b->value == NULL`. As a consequence it doesn't get added to the `*.ji` cache file. This is unfortunate, because inferring `setindex!` for Dicts is quite expensive.

This PR aims to fix that by allowing package authors to write

```julia
precompile(MyPkg, setindex!, (Dict{T,Int}, Int, T))
```

Note the module argument in the first slot. This adds it to a list of MethodInstances that should be associated with `MyPkg`.

I've verified it doesn't break anything, but it doesn't yet work. I tried the instructions [here](https://docs.julialang.org/en/latest/devdocs/debuggingtips/#Debugging-precompilation-errors-1) but gdb complains

```sh
(gdb) attach -w -n julia-debug
Illegal process-id: -w -n julia-debug.
```

Perhaps someone who knows more about this than me can offer some helpful tips.

I am not sure this alone will suffice, but I suspect that perhaps in conjunction with just a few other tweaks (and good ways to measure time spent on inference, see #31444 and [this SnoopCompile branch](https://github.com/timholy/SnoopCompile.jl/tree/teh/inference_timing)) we could dramatically cut latencies for certain packages. CC @SimonDanisch, who I know is very interested in the topic.

(Provided there are enough precompile calls, presumably generated via SnoopCompile.jl)
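For readers who have not used `precompile` directly, here is a small sketch of the existing two-argument form (not the module-argument form the PR proposes). The `Key` type is hypothetical and stands in for `T` from the PR description; whether the compiled code ends up in a package cache is exactly the open question of the PR, so this only shows the call itself.

```julia
# Hypothetical key type, mirroring `T` from the PR description.
struct Key end

# Ask Julia to compile setindex! for this signature ahead of the first call.
# The argument order is (collection, value, key).
ok = precompile(setindex!, (Dict{Key,Int}, Int, Key))
println("precompile returned: ", ok)

# The first real call with that signature.
d = Dict{Key,Int}()
d[Key()] = 1
println(d[Key()])
```

Tools such as SnoopCompile.jl are meant to generate lists of calls like this one so that they do not have to be written by hand.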

ChrisRackauckas · August 25, 2019, 2:50am

That’s definitely something that would be needed to fix this more directly. Without it, you need to develop around it.

cce · August 25, 2019, 9:04am

The important measure for me is “time to first response for a lambda container”. As Julia is finding more and more use in a web-service computational setting, it’s important that we have a handle on the startup latency for bringing new computational back ends on-line. Precompilation may help somewhat, but in my case each new query also causes compilation (which is why it runs fast). I’d rather have a slow start than a slow finish… but a faster start would be excellent.

xiaodai · August 25, 2019, 9:09am

I often wonder about this: once I run a function once, it gets compiled. Theoretically we could save the compiled version and reload it. The problem, of course, is that when you have every function, the image becomes very bloated.

If you look at the list from Nov 2018, the top two items are done, sort of, so the emphasis will be on TTFP soon. Then it will be things like package compilers.

xiaodai · August 25, 2019, 9:13am

This is such an interesting aspect of Julia. Julia seems unique amongst languages in that we need to talk about trade-offs in compilation vs run time like that!

Just a comment.

piever · August 25, 2019, 12:32pm

I’d like to emphasize that even though threads here sometimes focus on the negative side of things (i.e. time to first plot is still slow), the fact that the compiler team had such an ambitious priority list and managed to deliver on the first half already is impressive. At least in my experience, compiler bugs were somewhat common in Julia 1.0.0 (I hit some doing simple things, esp. with Missing) and I haven’t encountered any in a while in later releases. Similarly, multithreading made amazing progress in 1.3. Let’s hope that the work on the remaining items is equally successful!

pistacliffcho · August 25, 2019, 4:18pm

I’m not saying this advice is wrong, but it is a little concerning. If this is just some advice to follow until things are further optimized, that’s fine. But if this is fundamental to compiling, I’d say this is some rather core functionality.

Same with using dictionaries.

Zach_Christensen · August 25, 2019, 5:03pm

I believe he’s referring to dictionaries that store multiple types. This is problematic for inference. This may be a point where the compiler should “give up”.
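A small sketch of why mixed-type dictionaries are hard on inference, assuming that is indeed what was meant. The dictionaries and the `lookup` helper are hypothetical, and `Base.return_types` is an internal reflection helper.

```julia
# Hypothetical data, for illustration only.
mixed = Dict{Symbol,Any}(:width => 600, :title => "demo", :alpha => 0.5)
uniform = Dict{Symbol,Int}(:width => 600, :height => 400)

# A trivial lookup helper; its inferred return type depends on the value type.
lookup(d, k) = d[k]

mixed_rt = Base.return_types(lookup, (typeof(mixed), Symbol))[1]
uniform_rt = Base.return_types(lookup, (typeof(uniform), Symbol))[1]

println("Dict{Symbol,Any}: ", mixed_rt)
println("Dict{Symbol,Int}: ", uniform_rt)
```

Everything downstream of a lookup in the `Any`-valued dictionary has to be handled without knowing the value's type.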

tobias.knopp · August 26, 2019, 6:25am

ChrisRackauckas:

The solution is shown to more concretely be:

Get rid of the KW nonsense

Concretizing the argument handling has to happen earlier, otherwise you already take a 7 second hit

Concretization of the types should happen by the display function, since 13 seconds of it is type inference into the (hard coded and definitely not runtime loaded) gr_display function. A large part of this is because of the superlinear behavior of the compiler.

Get rid of Measures.jl. It has too much type information and causes issues. If anything, replace it with values and associated symbols, and hardcode branches instead of over dispatching. Units are just not a good idea for compile time… this would likely be helpful for Gadfly as well.

Throw a few declarations on the types in the structs that cannot be well inferred, but you know what they are by the time you dig a few functions down. This should only be a few pieces, not every argument that comes out of a KW.

These are very interesting observations. I started a question here:

and

and never got the answer that type inference itself is the problem. Would it be possible to take @ChrisRackauckas’s suggestions and put them into the “Performance Tips” section of the manual?
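As one possible reading of the "throw a few declarations on the types" suggestion, here is a sketch using type assertions on values pulled out of an untyped keyword container. The function names and the `Dict{Symbol,Any}` layout are assumptions for illustration, not how Plots.jl is actually written.

```julia
# Hypothetical keyword container, as a plotting package might build from kwargs.
kw = Dict{Symbol,Any}(:width => 4.0, :height => 2.5)

# Without declarations, inference knows nothing about w and h.
function area_untyped(kw::Dict{Symbol,Any})
    w = kw[:width]
    h = kw[:height]
    return w * h
end

# With type assertions, everything after the lookup is concretely typed.
function area_declared(kw::Dict{Symbol,Any})
    w = kw[:width]::Float64
    h = kw[:height]::Float64
    return w * h
end

untyped_rt = Base.return_types(area_untyped, (Dict{Symbol,Any},))[1]
declared_rt = Base.return_types(area_declared, (Dict{Symbol,Any},))[1]

println("untyped:  ", untyped_rt)
println("declared: ", declared_rt)
```

The assertion throws a `TypeError` if the stored value is not a `Float64`, so it only fits places where the type really is known by that point.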

Tamas_Papp · August 26, 2019, 6:49am

That’s an interesting suggestion. Most (all?) of the Performance Tips are about runtime performance, and hints about compilation performance could be useful. However, it is mostly a moving target.

tobias.knopp · August 26, 2019, 7:40am

As is runtime performance (-> small union optimization). I understand perfectly that the focus was on runtime in the past, but code patterns that help the compiler would be very very helpful. My Gtk application needs more than 60 seconds to load and clicking a button the first time requires more than 15 seconds. So getting some hints is extremely appreciated.

dlfivefifty · August 29, 2019, 11:10pm

A big issue is that this is very difficult to debug. It also glosses over other issues, such as large union types (e.g. StridedArray) which cause significant compile time issues, even if type-stable.
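To illustrate what "large union type" means here, a short sketch showing that `StridedArray` is an alias over a union of several array types. `Base.unwrap_unionall` is an internal helper, used only to peek under the alias.

```julia
A = rand(4, 4)

# A dense matrix and a view over a contiguous range are both "strided"...
dense_is_strided = A isa StridedArray
view_is_strided = view(A, 1:2, :) isa StridedArray

# ...but a view indexed by an arbitrary vector of indices is not.
gather_is_strided = view(A, [1, 3], :) isa StridedArray

# Under the alias sits a Union of several array types.
alias_is_union = Base.unwrap_unionall(StridedArray) isa Union

println((dense_is_strided, view_is_strided, gather_is_strided, alias_is_union))
```

A method written against `StridedArray` is type-stable for any one of these inputs, yet the union is what the compiler has to reason about when matching signatures.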

Compile-time optimisation feels a lot like Matlab optimisation, where there is no intuitive “mental model” of what makes code fast, and so one is left with a bag-of-tricks. This is one of the reasons I always avoided Matlab and why I like Julia.

Compile-caching seems like a better place to focus effort. If pulled off, spending any time on compile time improvements will be rendered useless. Of course this won’t help with non-compiled code, but occasional compiling is much better than constant compiling.

asprionj · August 31, 2019, 11:04pm

I agree:

IMO, compile time itself does not (really) matter. For interactive REPL work, there’s Revise and the Juno cycler, and waiting some seconds here and there even helps you take a step back, collect your thoughts, and take a sip of coffee to relax.

In “production setups”, things get a bit more complicated. There are many scenarios where you just want to be able to execute a script (with arguments) or just execute a single command by julia ... without the need to maintain a “worker pool” (i.e. Julia sessions having run through a massive startup.jl which one would have to set up manually), managed / called via e.g. tmux…

Sad to hear this… really like the basic idea of the package, esp. compile_incremental and derived work such as Fezzik.jl. For me, this actually solves the “two-language problem”. Because when I have to apply at least one other language / tool / framework / … to make Julia’s runtime performance applicable in production scenarios (or to share my executable stuff with non-Julians), I do have (at least) two “languages”.

Could there be a way of “dumping” the entire state of a REPL (automatically or on the user’s demand), like a snapshot? I seem to remember having read somewhere that this is not easily done in Julia for several reasons, though…

cce · August 31, 2019, 11:40pm

In my case actual compile time is a critical component of user responsiveness in a production setting. Our query system uses transparent structures for performance and hence the first time that a particular query is encountered it will cause a compilation. Since queries are constructed dynamically on the client side via a user interface, most queries will be unique. One workaround is to have a parameterized query mechanism, but in my nominal customer use case, this isn’t an option. I hope this explains.

jling · August 31, 2019, 11:57pm

Wait… You allow users to execute arbitrary functions? Building arbitrary expressions isn’t a problem if all underlying functions that get called are compiled.

dlfivefifty · September 1, 2019, 7:29am

Anything that allows anonymous functions will require fresh compilation. Many packages have use cases like this: Plots, ApproxFun, … Sometimes the issue can be mitigated by wrapping the function in another type.
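A sketch of why anonymous functions force fresh compilation, and of one way to read "wrapping the function in another type". The `Wrapped` struct and `apply` are hypothetical; real packages may use a different wrapper.

```julia
# Two anonymous functions, each with its own unique type.
f1 = x -> x + 1
f2 = x -> x + 2
distinct_types = typeof(f1) != typeof(f2)

# Hypothetical wrapper: the non-parametric field hides the concrete function
# type, so code taking a `Wrapped` is compiled once, not once per function.
struct Wrapped
    f::Function
end

# The call through `w.f` is dispatched at run time.
apply(w::Wrapped, x) = w.f(x)

same_wrapper_type = typeof(Wrapped(f1)) == typeof(Wrapped(f2))

println((distinct_types, same_wrapper_type))
println((apply(Wrapped(f1), 1), apply(Wrapped(f2), 1)))
```

The trade-off is that the call through `w.f` is no longer specialized or inlined, and the body of each anonymous function still has to be compiled the first time it runs.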

Maybe we can recap the state of things. There are four ways forward to help with this issue:

1. Static compilation. PackageCompiler.jl and Fezzik.jl are great proofs of concept, but fail for some packages. If these were refined and incorporated into Base that would probably solve a large number of users’ use cases, for example writing command line scripts, but there would be compile time anytime there’s a new type or function involved. This also doesn’t help when actively developing a package.
2. Faster compiling. The compiler could apparently be made multithreaded; if this sped up compiling by 4x most complaints would be dropped (e.g. if BandedMatrices.jl took 1.5s to load instead of 5.5s). However there are also compiler “bugs” where things take up to 30s to load, and command line scripts would still be too slow.
3. Refining current code. By making code type stable, compile time is usually 1s, which is fine. But this is hard to debug and some code is inherently type unstable. Macros telling blocks to minimally compile / interpret would help.
4. Interpreter. In principle code could be interpreted the first run with compiling done in a background thread, as in JIT in interpreted languages. This would help package developers and script writers a lot, though I imagine it’s difficult to ensure high performance code is not interpreted.

Personally I’d be very happy if any of 1, 2 or 4 were completed, though other users have different needs so will probably have stronger preferences. In an ideal world we’d have all 4 but that’s probably being greedy.
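On the "macros telling blocks to minimally compile" idea in the "refining current code" option: one existing tool in that direction is `@nospecialize`, which hints that a method should not be specialized on a given argument's type. The `describe` function below is a hypothetical example, and the macro is only a hint, not the block-level pragma being asked for.

```julia
# Hypothetical helper: the argument's type does not matter for performance
# here, so ask the compiler not to specialize on it.
function describe(@nospecialize(x))
    return string("value of type ", typeof(x))
end

println(describe(1))
println(describe("a"))
println(describe([1.0, 2.0]))
```

This fits best on code that is called with many different argument types but is not performance critical, such as argument processing or display code.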

lobingera · September 1, 2019, 8:03am

Can you share an example of a multithreaded compiler?
