New to Julia from R and so far very disappointed with the interactive experience. I get that compilation takes time, but it seems like everything is slow. I don’t get why something like “using mycoolpackage” or “add anothercoolpackage” or “iris = dataset("datasets", "iris")” or “plot(iris, x=:Species, y=:PetalLength, Geom.boxplot)” takes so darn long. I came to Julia for speed and performance, and I don’t just want that after the code has been compiled. I want high performance across the board. Besides the irritation factor, the sluggish development experience means it would take substantially longer to develop data products and analyses in Julia.
Is there anything I can do to make the REPL/interactive performance better? It’s just atrocious in every IDE I’ve tried: Juno/JuliaPro, VS Code, and JupyterLab.
It’s usually called the “time to first plot” problem. It is being worked on, but there is no timeline for when it will get better. I agree that it is annoying, in fact the most annoying bit of Julia in my opinion. Nonetheless, the pros of Julia far outweigh this flaw, IMO.
There is no solution yet, but two things help:
- The above-mentioned Revise.jl reduces the number of times you have to restart the REPL and pay the first-compilation time. This package works very reliably and should, IMO, be installed and activated by everyone.
- PackageCompiler.jl can fully compile packages. This is more experimental and YMMV.
The reason Julia is fast is because it is compiled, and compilation does take time.
Indeed, as @mauro3 says, this is one of the most common pain points of Julia, particularly coming from other languages that either hand off your computation to compiled libraries or just do everything in an interpreted way.
In Julia, that first compilation generates the optimized code that lets the actual computation fly. If your computation is already very quick, or you are doing some simple scripting or interactive exploration, it is painful, since we’re talking about adding anywhere from 0.1s to a few seconds to most commands. If you are doing a serious computation that takes more than a few seconds, the compilation time is peanuts. So that’s the downside of the Julia model at the moment. (The upsides are more than just speed, but they take a bit to fully understand.) Once we can fully cache compilations between sessions (essentially the aim of PackageCompiler.jl), the downside will likely disappear.
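To see this first-call cost in a session, one can time the same call twice. This is only a minimal sketch: `sumsq` is a hypothetical function made up for the illustration, and the actual timings will depend on the machine and the Julia version.

```julia
# Hypothetical function, used only to illustrate first-call latency.
sumsq(xs) = sum(x -> x^2, xs)

data = rand(1_000)

# The first call compiles sumsq for Vector{Float64} before running it.
t_first = @elapsed sumsq(data)

# The second call with the same argument types reuses the compiled code.
t_second = @elapsed sumsq(data)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The first number is typically dominated by compilation and the second by the computation itself, which is why the overhead hurts most for short interactive commands.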
Apart from the tips above, note that you can also “deactivate” most of the Julia compilation magic in any given session by launching it with julia --compile=min. It will not run very fast (it will be essentially like interpreted Python, I think), but it will probably be snappier. Check it out and let us know how you find it! I never use this because I’ve gotten used to Revise, which is really fantastic (although it still has its own set of drawbacks).
As mentioned, use Revise and keep your Julia session open as long as possible. Then it is fast for most common commands, and compilation latency only shows up when you develop your own functions.
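One common way to have Revise active in every session is to load it from the startup file. A sketch, assuming Revise is already installed and that the startup file is at the default location, ~/.julia/config/startup.jl:

```julia
# Assumed location: ~/.julia/config/startup.jl (the default startup file).
# Load Revise at the start of every session; if it is not installed,
# warn instead of aborting startup.
try
    using Revise
catch err
    @warn "Could not load Revise" exception = (err, catch_backtrace())
end
```

With Revise loaded first, edits to packages loaded afterwards, or to scripts loaded with includet, should be picked up without restarting the REPL.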
While it won’t give you R+ggplot2 speeds, the combo of VegaLite.jl and QueryTables (careful, that package is not yet released, so things might change between now and whenever I release it!) can give you much better times than some of the other Julia packages in this space:
I completely agree! BUT, I just don’t get PackageCompiler. The documentation is too sparse for me. I’m reading that README and I just don’t know what I’m supposed to do to try things out.
Let’s say I have PackageCompiler in my main default env. Then let’s say I have a custom env, which I activated and added VegaLite#master to. I guess I then run compile_incremental(:VegaLite)? And what do I do after that? And how do I precompile all the packages for a custom env? I suppose I should call compile_incremental(toml_path::String, snoopfile::String), but where do I get a snoopfile from?
I think the “force” might swap out your current system image. Possibly a bad idea, but it also makes life easier for tool integration, since the command line arguments are no longer necessary.
Allow me to promote Fezzik once more. It uses PackageCompiler as a backend and is intended to solve the “time for first anything” problem by building a sysimg based on your actions.
And since we are measuring plot times, here is PGFPlotsX:
julia> @time begin
using PGFPlotsX; p = Plot(Coordinates([1,2], [2,3])); display(p)
end
4.680410 seconds (7.18 M allocations: 364.789 MiB, 5.14% gc time)
Good catch! That was counter-intuitive. BTW, those are some good numbers for PGFPlotsX.
julia> @time begin
using Gaston;p=plot([1,2],[2,3]);printfigure(outputfile="t.pdf")
end
4.993548 seconds (7.86 M allocations: 390.720 MiB, 3.03% gc time)
This time includes both displaying the plot and saving it as PDF. Without saving:
julia> @time begin
using Gaston;p=plot([1,2], [2,3]);
end
2.909761 seconds (3.10 M allocations: 155.544 MiB, 1.31% gc time)
Alright, if we are comparing things without loading a table package, this is what I get for VegaLite.jl (no custom sysimage):
julia> @time using VegaLite
2.549538 seconds (4.57 M allocations: 248.798 MiB, 4.68% gc time)
julia> @time using VegaLite
0.831583 seconds (2.26 M allocations: 108.544 MiB, 4.28% gc time)
I’m pretty optimistic that I can get the using part down to something like 1.6 seconds. I played with a branch yesterday where I got to something like that.
BUT, I won’t be even close to Gaston for saving as PDF. I think that is where things get much slower on my end (without having really measured it).