Roadmap for a faster time-to-first-plot?

KestutisMa · August 16, 2019, 3:23am

Are there any updates on solving the “time to first plot” problem?
At what Julia version could we expect load-time/compiler latency improvements?
Are there any obstacles?
A few months ago many were very optimistic that compile speed could increase by “an order of magnitude”, as written in some posts.
PackageCompiler is nice, but it works well with only a few packages. Those include plotting packages, which makes it a fine workaround for the time-to-first-plot problem, but some other packages do not compile so well, and snooping based on runtests.jl doesn’t cover particular use cases. There is also the Fezzik package, which uses the PackageCompiler API to do more practical snooping based on what the user runs in the REPL, but it doesn’t seem to speed up packages as well as PackageCompiler does (at least in some cases; probably not all functions get compiled AOT?). In my case, the startup time of a simple application using Makie, Blink, and Interact is still about 20s: https://github.com/TsurHerman/Fezzik/issues/4

Karajan · August 16, 2019, 9:07am

According to the compiler priorities, time-to-first-plot is the next thing after multithreading, which is expected to be released in 1.3 (of which we just got an alpha version, and 1.2 is very close to release, so … maybe 4-6 months?). Currently a lot of work is being done on that front, and I can imagine that this might continue in some form even after 1.3. After that, there should be more movement on this issue, but I am in no position to say anything about potential timeframes.

StefanKarpinski · August 16, 2019, 12:30pm

That seems about right. It’s already gotten much better in each release since 1.0 and now it’s the top priority for compiler work.

kevbonham · August 16, 2019, 1:40pm

Not sure if this is orthogonal, but I think TTFP is just a shorthand for start-up times, right? A lot of people in bioinformatics are used to building tools / writing scripts that are invoked from the command line, and this is my one remaining pain point in julia. Time to first plot itself doesn’t really bother me anymore because (a) it’s much faster than it used to be and (b) I’ve embraced the workflow of just leaving my julia session open. But developing a command line script is a bit annoying, since every invocation may take 10-15 sec to even get started.

Even with this issue, I still advocate julia to everyone. Once this gets solved, 98% of the objections I encounter will go away. Really looking forward to it - thanks to everyone adding to the effort!

jdad · August 16, 2019, 3:35pm

For my part I totally agree with the usefulness of scripts having quick startup. As a stopgap I am currently using the Julia command option « --compile=min » inside the script header; it cuts the « using » time somewhat. But I worry about possible side effects, so could somebody comment on whether this is a good or bad idea?
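For reference, one way to put Julia options in a script header is the bash/Julia polyglot header described in the Julia manual's FAQ, since many systems do not pass multiple arguments through `#!/usr/bin/env`. This is only a minimal sketch, assuming `julia` is on the PATH and bash is available; `sum_args` is a hypothetical placeholder for a real script body.

```julia
#!/bin/bash
#=
exec julia --compile=min --startup-file=no "${BASH_SOURCE[0]}" "$@"
=#
# bash treats the first two lines as comments and runs the `exec` line, which
# re-launches this same file under julia with the chosen options.
# Julia in turn sees the whole `#= ... =#` part as a block comment.

# Hypothetical script body: sum the integer command-line arguments.
sum_args(argv) = reduce(+, parse.(Int, argv); init = 0)

println("total = ", sum_args(ARGS))
```

Saved as an executable file, it could be invoked as `./sum.jl 1 2 3` (the file name is made up for illustration).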

rdeits · August 16, 2019, 3:57pm

The side effect is that it makes everything slow, and by a potentially very large factor. Compare the performance of sin with normal Julia:

julia> using BenchmarkTools

julia> @btime sin(x[]) setup = x = Ref(1.0)
  6.217 ns (0 allocations: 0 bytes)
0.8414709848078965

and again with julia --compile=min:

julia> using BenchmarkTools

julia> @btime sin(x[]) setup = x = Ref(1.0)
  18.946 μs (54 allocations: 1.05 KiB)
0.8414709848078965

With --compile=min, computing sin(x) is ~3000 times slower.
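As a quick check of that figure, here is the arithmetic on the two timings reported above. It is a sketch only: the numbers are the `@btime` results from this post, and a single `sin` call is not representative of a whole script.

```julia
normal_ns = 6.217          # default compilation, in nanoseconds
min_ns = 18.946 * 1_000    # --compile=min: 18.946 μs converted to nanoseconds

slowdown = min_ns / normal_ns
println("slowdown ≈ ", round(Int, slowdown), "x")   # prints: slowdown ≈ 3047x
```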

jdad · August 16, 2019, 4:10pm

Yes I understand it - but for scripts with very short run times and small datasets (in the 100-1000 range) the « using » time takes up half of the time (about 25secs); with « compile=min » it goes down to half of that, so much better. I understand that with larger computations it would be much worse. My concern was about consequences for precision and/or precompilation issues - you get the Recompiling message in the REPL, but not with a batch script … And what if it should be recompiled and is not, or is recompiled in « min » mode?

rdeits · August 16, 2019, 5:03pm

Ah, I see. As far as I know, the answers you get should be precisely the same, so if using compile=min helps in your case, then I don’t see any problem with it.

kevbonham · August 16, 2019, 5:04pm

That’s useful to know, esp for development (where I’m often re-running stuff on test data over and over).

OT, it looks like you’re trying to quote code using «». Instead, try using back-ticks (`):

`like this`

Gives you like this.

jdad · August 16, 2019, 5:10pm

Thank you for the reassurance. So, except for possible precompilation woes (I happen to also use the same modules in the REPL, so without min), it seems ok for dev/quick scripts.

jdad · August 16, 2019, 5:10pm

Thank you for the quoting tip then.

purplishrock · August 16, 2019, 6:58pm

yup. i develop things as command line tools taking command line arguments. Startup time is about 2x my run time. A bit painful when debugging.

However this seems to indicate my debugging method is probably flawed. My reading of this thread makes it sound like maybe i should

open a julia shell, using Revise.
Put together a “command line” as an intermediate function taking direct arguments, and run that …
hoping Revise will keep recompiles to a minimum.

Does that sound about right ?

kevbonham · August 16, 2019, 7:22pm

I’ve done something similar in the past - I mostly use ArgParse.jl for getting the arguments, and it essentially returns a dictionary, so I just hard-code the test dictionary and do essentially what you describe.
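The workflow described here might look roughly like the following sketch. All names (`run_tool`, `main`, the `"name"` and `"count"` keys) are hypothetical, and the argument handling is hand-rolled so the example has no dependencies; with ArgParse.jl, the `Dict{String,Any}` would come from `parse_args` instead.

```julia
# All real work lives in a plain function that takes a Dict shaped like the
# one an argument parser would return, so it can be called from the REPL.
function run_tool(args::Dict{String,Any})
    name = args["name"]
    count = args["count"]
    return join(fill(name, count), " ")
end

# Thin command-line entry point: turn ARGS into the Dict, then delegate.
function main(argv = ARGS)
    args = Dict{String,Any}("name" => argv[1], "count" => parse(Int, argv[2]))
    println(run_tool(args))
end

# Only run `main` when the file is executed as a script, not when it is
# included from a REPL session.
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```

In a long-lived REPL session (optionally with Revise tracking the file), the hard-coded test dictionary replaces the command line: `run_tool(Dict{String,Any}("name" => "ab", "count" => 2))`. The startup cost is then paid once rather than on every invocation.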

affans · August 16, 2019, 9:14pm

Can someone explain why the plotting library needs to be written in Julia? Why not consider building a plotting library in C (optimized for use in Julia) or using an already available library (which I believe is what GR.jl does)? If features are missing, the Julia community can try to fill in the gaps?

davidanthoff · August 16, 2019, 9:33pm

VegaLite.jl is an example of a plotting package that wraps a plotting library written in a different language (JavaScript). It has pretty good startup times and a huge feature space (because there is a whole team working on the underlying plotting library). I think there are similar packages for other third party plotting packages.

affans · August 16, 2019, 9:35pm

That’s right. I usually use VegaLite.jl; however, I find the problem with VegaLite is the initial data processing to get it into a “long” format, as well as the plot setup. Sometimes this formatting takes longer than the 30 seconds of using Plots.
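To illustrate what the “long” format step involves, here is a minimal sketch in plain Julia with no packages. The helper `to_long` and the column names are hypothetical; it turns a wide layout (one column per series) into one row per observation, stored as a vector of named tuples.

```julia
# Wide layout: one column per series.
wide = (x = [1, 2, 3], a = [10, 20, 30], b = [5, 15, 25])

# Reshape to long layout: one row per (id, series, value) triple.
function to_long(wide::NamedTuple, id::Symbol, series)
    return [(id = wide[id][i], series = s, value = wide[s][i])
            for s in series for i in eachindex(wide[id])]
end

long = to_long(wide, :x, (:a, :b))
foreach(println, long)
```

Each row now carries its series name, which is the shape that grammar-of-graphics style plotting generally expects when mapping a column to color or facet.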

dlfivefifty · August 16, 2019, 9:37pm

I think building a new plotting library is about 1000x more complicated than just getting compilation caching working…

Existing libraries like GR are ok but still not Matlab quality. (PyPlot is pretty good but horrendously slow.) I think it’s just not realistic that there will ever be an open source C based plotting library, it’s too tedious to develop so only makes sense if a company is funding it.

Makie seems much more promising. It’s very rough around the edges but on the high performance side it feels light years ahead of more well established open source plotting frameworks. If people have time and energy to throw at fast plotting it seems to me best to use it for (1) compilation caching and (2) improving Makie.

affans · August 16, 2019, 9:42pm

Also how come packages like ggplot in R are so fast and powerful? Is it because R is an interpreted language (which means the library for ggplot is already compiled)? Sorry for my dumb questions.

Daniel_Berge · August 16, 2019, 9:47pm

It’s more that R is interpreted so it doesn’t have to compile anything ahead of time, and ggplot is written in C, so it has good performance.

davidanthoff · August 16, 2019, 9:52pm

Would it help if you could directly pass arrays in, say like @vlplot(:point, x=[1,2,3]), instead of having to format it as a table first? I’ve been wanting to add that feature for a while.

Another idea is to create another package like SimpleVega.jl that essentially implements the old API from Vega.jl (Vega.jl: A Julia package for generating visualizations using Vega), or something like it, but uses the infrastructure from VegaLite.jl.

Might make sense to split this discussion into a separate thread, given that this is really more about VegaLite.jl than compile time.
