This is so exciting. This will have a major impact on Julia if it becomes reality. When using Julia in production, I constantly have to make excuses for how slow it is.
I'm curious why Julia doesn't compile functions only after they have been executed several times in interpreter mode. Is there a non-obvious downside, is this somehow incompatible with the language itself, or is it just too difficult to implement?
Well, one reason might be that the interpreter has only been available for a few months…
The other issue is probably that the function being interpreted (possibly while it is compiled in the background) could be massive_neural_network_train() or similar, which you'll call only once. It could take days even if it were compiled (and compiling it would have taken less than a second), and once it has started executing you can't really switch to the compiled version. If most of the processing is done in other functions that are called repeatedly it wouldn't be so bad, but that becomes another performance gotcha, in which a small difference makes the program thousands of times slower.
You could have a flag like @compile that forces the compiler to never interpret such functions, and/or use one of the heuristics mentioned, such as checking whether the function has loops, but it would still require a reasonably fast and reliable interpreter.
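To make the trade-off concrete, here is a minimal, language-neutral sketch in Python. It is not how Julia or its interpreter actually works, and every name in it is hypothetical: a dispatcher interprets a function until the call count reaches `threshold`, then switches to a compiled version. A `force_compile` flag stands in for the suggested @compile, and `has_loops` stands in for the loop heuristic. Note that the tier is chosen only when a call starts, which is exactly the limitation described above: a function called once stays interpreted for its whole run.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TieredFunction:
    """Hypothetical tiered dispatcher: interpret first, compile once the function is hot."""

    interpret: Callable[..., object]               # stand-in for the slow interpreter path
    compiler: Callable[[], Callable[..., object]]  # stand-in for the (costly) compile step
    threshold: int = 3            # call number that triggers compilation (assumed value)
    force_compile: bool = False   # plays the role of the suggested @compile flag
    has_loops: bool = False       # the "does the function contain loops?" heuristic
    calls: int = 0
    compiled: Optional[Callable[..., object]] = None

    def __call__(self, *args):
        self.calls += 1
        # Compile up front when forced or when the loop heuristic fires;
        # otherwise compile once the call count reaches the threshold.
        if self.compiled is None and (
            self.force_compile or self.has_loops or self.calls >= self.threshold
        ):
            self.compiled = self.compiler()
        # The tier is picked only on entry: a call that is already running in
        # the interpreter (e.g. a days-long training loop) never switches midway.
        if self.compiled is not None:
            return self.compiled(*args)
        return self.interpret(*args)


# Usage: record interpreter runs and compile events.
log = []

def interp(x):
    log.append("interpreted")
    return x * 2

def make_compiled():
    log.append("compiled")
    return lambda x: x * 2

f = TieredFunction(interpret=interp, compiler=make_compiled, threshold=3)
results = [f(i) for i in range(4)]
```

With `threshold=3`, the first two calls go through the interpreter and the third triggers compilation. Setting `force_compile=True` skips the interpreter entirely, at the cost of paying compile latency even for code that runs only once.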
There has been progress on better precompilation, which was recently merged into Plots.jl master.
I should also point out that if all you want is some basic plot types supported directly by GR.jl, you can already get a first plot within 5 seconds (and less with precompile statements).
The new release is qualitatively faster and nicer to use.
Indeed, Plots 0.27.1 is the first release with the latency improvements (and more to come!)
So pumped to try this tomorrow!
UnicodePlots is also very fast if all you need is bar plots or simple line plots.
Edit: I previously claimed that PackageCompilerX is abandoned, but I was totally wrong; it is under active development. Since I am not a compiler specialist, please disregard what I said and forgive my ignorance.
C++ is AOT-compiled, so this is simply solved by compiling first and then running the binary. And compile times for template-heavy C++ code are pretty bad.
Please consider whether you really have the depth of knowledge to lay out the situation here. Claiming that a package is abandoned without actually knowing so is highly counterproductive (you claim PackageCompilerX is abandoned, which it definitely isn't: the kc/wip branch was last committed to 15 hours ago, and on the contrary, it represents one of our best shots at real progress here). And concluding comments such as "I am sceptical of this approach" and "But I think we can't gain much from this method" add no new information and do not help the discussion.
No, he is correct; I did claim "I think PackageCompilerX is abandoned". I am really sorry about that, and I hope my claim didn't mislead anyone.
While checking the latest news, I saw this video about the new 32-core AMD CPU:
They compare how much the compile times of Android, Chromium, and a game engine drop as more cores are added.
@ChrisRackauckas pointed out that Julia's compiler is currently serial, but we could use the ORCJIT features of LLVM to parallelize it.
I guess this should be considered in the roadmap.
This was also discussed here: New LLVM JIT Features
Actual LLVM compile time is not the major issue for many packages; rather, it's type inference or other issues with actually generating the LLVM code. So a multithreaded LLVM compiler is unlikely to help.
Not just LLVM.
It might seem impossible now, but I feel that even type inference could be done in parallel for different functions (I don't know the implementation details).
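One way to picture the idea: a function can be inferred or compiled once everything it calls is done, so all functions that are "ready" at the same moment are independent of one another and could be processed concurrently. Below is a minimal Python sketch of that scheduling, assuming an acyclic call graph (real code has recursion, which `graphlib` rejects with a `CycleError`) and a placeholder `compile_one`; the function names in the example graph are made up. It only illustrates the ordering, not real speedup, and it waits for each whole batch before releasing the next, which is simpler but less efficient than starting each function the moment its callees finish.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def compile_in_parallel(callees, compile_one, max_workers=4):
    """Compile each function only after everything it calls has been compiled.

    callees: dict mapping a function name to the set of names it calls
             (a hypothetical, acyclic call graph).
    compile_one: stand-in for inferring/compiling a single function.
    """
    sorter = TopologicalSorter(callees)  # callees act as predecessors
    sorter.prepare()
    batches = []  # the groups of functions that were compiled together
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())    # mutually independent functions
            list(pool.map(compile_one, ready))    # compile them concurrently, wait for all
            batches.append(ready)
            sorter.done(*ready)                   # unlocks their callers
    return batches


# Usage with a made-up call graph: "measure" is shared by two callers.
call_graph = {
    "plot": {"layout", "series"},
    "layout": {"measure"},
    "series": {"measure"},
    "measure": set(),
}
compiled = []
batches = compile_in_parallel(call_graph, compiled.append)
```

Here `measure` is compiled first, then `layout` and `series` together, then `plot`.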
Well, IIRC Plots' LLVM time is about 3 seconds. Leon's latest work gets us down to 8.2 seconds, and our goal is to get to 5. When it was 26 seconds, LLVM time didn't matter, but now it's interesting again. Even so, it's still not the majority of the time, and as the total amount of compilation is cut down, LLVM time shrinks automatically as well, so it may never be a major factor, but it is always a small gain that could be had.
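A quick Amdahl's-law estimate shows why this is only a small gain. Assuming (this is not stated in the post) that the ~3 s of LLVM time is part of the 8.2 s total and that only the LLVM part would be multithreaded, the wall time with `workers` threads is the serial remainder plus the LLVM share divided by the thread count:

```python
def amdahl_time(total, parallel_part, workers):
    """Wall time if only `parallel_part` seconds of `total` scale across `workers`."""
    serial_part = total - parallel_part  # inference, codegen, etc. stay serial
    return serial_part + parallel_part / workers


# Rough figures from the post: 8.2 s total, ~3 s of it in LLVM (assumed to be included).
for workers in (1, 2, 4, 8, 32):
    print(f"{workers:>2} threads: {amdahl_time(8.2, 3.0, workers):.2f} s")
```

Under these assumptions, even unlimited threads only approach 8.2 - 3 = 5.2 s, still above the 5 s goal, so the remaining work has to come from cutting the non-LLVM part.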
Fixing this issue will be a multi-prong approach:
- Better precompilation of packages
- Static compilation of things when necessary?
- Less method invalidation
- A "grown sysimage" for users that pools precompilation of their most used packages (this is a vast simplification of something from recent in-person discussions)
- The compiler giving up earlier on things that are not inferrable anyway
- Using the interpreter in cases where things aren't inferred anyway (this "give up" case)
- Multithreading the compilation so that long compiles are handled better (large functions have O(n^2) compilation behavior in LLVM, IIRC)
And so on. The nice thing is that about 20 things have been identified that could help. It's just a matter of trying a bunch of them, and in conglomeration they will resolve the issue. You won't see a single silver bullet take a standard package from 28 seconds to 1 second of compile time (other than maybe static compilation), but as these approaches become available, you'll see each one take a slice out of it incrementally.
Later on, please consider writing up a summary of the various approaches and the speedups they gave, possibly as a blog post. I think it would be very instructive to other developers.
I think Leon's work so far has mostly been to come up with a set of tasks for Plots to execute and then run SnoopCompile on them. In other words, this magic can be widely copied.
Indeed, but it was coupled with ways to instrument the compiler and find out precisely what's taking the most time to compile, some dives into what's causing those times to grow, and discussions of method invalidation. It would be good to write this down, but we're really just getting started.