Ways to make slow/sluggish REPL/interactive development experience faster?

New to Julia from R and so far very disappointed with the interactive experience. I get that compiling takes time, but it seems like everything is slow. I don’t get why something like using mycoolpackage, add anothercoolpackage, iris = dataset("datasets", "iris"), or plot(iris, x=:Species, y=:PetalLength, Geom.boxplot) takes so darn long. I came to Julia for speed and performance, and I don’t just want that after the code has been compiled. I want high performance across the board. Besides the irritation factor, it would take substantially longer to develop data products and analyses in Julia given the slow/sluggish development experience.

Is there anything I can do to make the REPL/interactive performance better? It’s just atrocious in every IDE I’ve tried: Juno/JuliaPro, VS Code, and JupyterLab.

Please help!

To a certain extent the issue is known. OTOH it is hard to quantify what “so darn long” means for you.

Please tell us what version of Julia you are using (1.1 is much better than 1.0), and some examples and their timings.

Also, are you aware of Revise.jl?

It’s usually called the “time to first plot” problem. It is being worked on, but there is no timeline for when it will get better. I agree that it is annoying; in fact, it is the most annoying part of Julia in my opinion. Nonetheless, the pros of Julia far outweigh this flaw, IMO.

There is no solution yet, but two things which help:

  • above-mentioned Revise.jl reduces the number of times you have to restart the REPL and pay the first-compilation time. This package works very reliably and should, IMO, be installed & activated by everyone.
  • there is PackageCompiler.jl which can fully compile packages. This is more experimental and YMMV.
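For reference, a minimal sketch of how Revise is commonly switched on for every session. It assumes Revise is installed in the default environment and that the startup file lives at ~/.julia/config/startup.jl (the default location). This form follows the Revise documentation for recent Julia versions; older versions such as the 1.1 discussed here needed a slightly different hook, so check the Revise docs for your version.

```julia
# Contents of ~/.julia/config/startup.jl (assumed default location).
# Load Revise at the start of every session; warn instead of failing
# if the package is not installed in the default environment.
try
    using Revise
catch e
    @warn "Error initializing Revise" exception=(e, catch_backtrace())
end
```

After that, packages loaded with using are tracked automatically, and a standalone script can be tracked with includet("mycode.jl") instead of include (the file name is hypothetical).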

Julia is fast because it is compiled, and compilation takes time.

Indeed, as @mauro3 says, this is one of the most common pain points of Julia, particularly coming from other languages that either hand off your computation to compiled libraries or just do everything in an interpreted way.

In Julia that first compilation is generating the optimized code for the actual computation to fly. If your computation is very quick already or you are doing some simple scripting or interactive exploration, it is painful, since we’re talking about adding from 0.1s to a few seconds to most commands. If you are doing a serious computation that takes more than a few seconds, the compilation time is peanuts. So that’s the downside of the Julia model at the moment. (The upsides are more than just speed, but take a bit to fully understand). Once we can fully cache compilations between sessions (i.e. essentially the aim of PackageCompiler.jl) the downside will likely disappear.
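A minimal sketch of that effect, using a made-up function (mean_square is not from any package): the first call with a given argument type pays for compilation, while the second call reuses the compiled code. The actual numbers depend on your machine and Julia version.

```julia
# A small function; the first call with a new argument type triggers compilation.
mean_square(x) = sum(abs2, x) / length(x)

data = rand(1_000)

t_first = @elapsed mean_square(data)    # includes compilation for Vector{Float64}
t_second = @elapsed mean_square(data)   # reuses the already compiled code

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For a function this small, the first call is usually dominated by compilation rather than by the computation itself, which is the situation described above for quick interactive commands.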

Apart from the tips above, note that you can also “deactivate” most of the Julia compilation magic in any given session by launching it with julia --compile=min. It will not run very fast (essentially like interpreted Python, I think), but it will probably feel snappier. Check it out and let us know how you find it! I never use this because I’ve gotten used to Revise, which is really fantastic (although it still has its own set of drawbacks).
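One way to check whether the flag helps on your machine is to time the same one-liner in two fresh processes. This is only a sketch: the workload string is a hypothetical stand-in for whatever you usually run first, and which variant wins depends on that workload.

```julia
# Command for the currently running Julia binary.
julia = Base.julia_cmd()

# Hypothetical workload, passed to each child process via -e.
snippet = "x = rand(100); println(sum(abs2, x) / length(x))"

# Time a fresh process with default settings and one with --compile=min.
t_default = @elapsed run(`$julia --startup-file=no -e $snippet`)
t_min = @elapsed run(`$julia --startup-file=no --compile=min -e $snippet`)

println("default:       ", t_default, " s")
println("--compile=min: ", t_min, " s")
```

Both timings include process startup, so the difference between them is the interesting number, not either value on its own.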

PS: see also Fully Interpreted Julia
PS2: another related package to save compilation results: https://github.com/TsurHerman/Fezzik

As mentioned, use Revise and keep your Julia session open as long as possible. Then most common commands are fast, and compilation latency only shows up when you develop your own functions.

Wow, just tried R+ggplot2. I had forgotten how bad this time to first plot was in Julia!

While it won’t give you R+ggplot2 speeds, the combo of VegaLite.jl and QueryTables (careful, that package is not yet released, so things might change between now and whenever I release it!) can give you much better times than some of the other Julia packages in this space:

julia> @time using VegaLite, QueryTables
  4.059855 seconds (8.87 M allocations: 456.455 MiB, 5.33% gc time)

julia> @time display(DataTable(a=rand(100), b=rand(100)) |> @vlplot(:point, x=:a, y=:b))
  2.304952 seconds (9.57 M allocations: 460.422 MiB, 5.56% gc time)

vs Gadfly with DataFrames:

julia> @time using Gadfly, DataFrames
[ Info: Loading DataFrames support into Gadfly.jl
 11.335112 seconds (24.21 M allocations: 1.275 GiB, 4.80% gc time)

julia> @time display(plot(DataFrame(a=rand(100), b=rand(100)), x=:a, y=:b, Geom.point))
 28.536965 seconds (85.57 M allocations: 4.272 GiB, 7.75% gc time)

vs StatsPlots with DataFrames:

julia> @time using StatsPlots, DataFrames
 11.701202 seconds (28.08 M allocations: 1.502 GiB, 6.22% gc time) 

julia> @time display(@df DataFrame(a=rand(100), b=rand(100)) scatter(:a, :b))
 21.729051 seconds (64.91 M allocations: 3.176 GiB, 7.97% gc time)

Sure would be great to have PackageCompiler up and running with those to speed it up even more!

Which, by the way, works well with Plots.jl and GR.

I completely agree! BUT, I just don’t get PackageCompiler. The documentation is too sparse for me, I’m reading that README and I just don’t know what I’m supposed to do to try things out.

Let’s say I have PackageCompiler in my main default env. Then let’s say I have a custom env, I activated it and added VegaLite#master to it. I guess then I run compile_incremental(:VegaLite)? And what do I do then? And how do I precompile all the packages for a custom env? I suppose I should call compile_incremental(toml_path::String, snoopfile::String), but where do I get a snoopfile from?

It sure is… Both sparse and full of noise for the simplest use cases. Check out this line in the Dockerfile of the Binder package-compiling example, which is pretty straightforward: https://github.com/arnavs/compiled-binder-example/blob/master/Dockerfile#L57

A variation on this is what @arnavsood implemented in https://github.com/jupyter/repo2docker/issues/686#issuecomment-494665724

Alright, I got PackageCompiler to work! Here are times for VegaLite and QueryTables with a custom sysimage:

julia> @time using VegaLite, QueryTables
  0.596350 seconds (936.60 k allocations: 45.327 MiB, 1.69% gc time)

julia> @time display(DataTable(a=rand(100), b=rand(100)) |> @vlplot(:point, x=:a, y=:b))
  0.579261 seconds (800.87 k allocations: 40.629 MiB, 2.03% gc time)

That seems fine to me :) So now we just need to get PackageCompiler tech into base and we can call it a day.

Here are the steps I had to do to get to this point:

  1. PackageCompiler doesn’t seem to work with custom environments, so everything is in the default env.
  2. pkg> add VegaLite#master QueryTables PackageCompiler
  3. using PackageCompiler; new_sysimage_path, _ = compile_incremental(:VegaLite, :QueryTables)
  4. Note down the path that is stored in new_sysimage_path somewhere
  5. Start a new julia instance from the command line with julia -J path_to_new_sysimage where I substituted the path in
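Collected into one script, the steps above might look like the sketch below. It assumes the PackageCompiler version used in this thread, where compile_incremental is available; newer PackageCompiler releases expose a different entry point (create_sysimage), so check the documentation of the version you have installed. It also needs network access, and QueryTables may not be installable since it was unreleased at the time.

```julia
using Pkg

# Step 2: install into the default environment (VegaLite from its master branch).
Pkg.add(PackageSpec(name="VegaLite", rev="master"))
Pkg.add(["QueryTables", "PackageCompiler"])

# Step 3: build a sysimage that includes both packages.
using PackageCompiler
new_sysimage_path, _ = compile_incremental(:VegaLite, :QueryTables)

# Steps 4 and 5: print the command that starts Julia with the new sysimage.
println("julia -J ", new_sysimage_path)
```

Copy the printed command into a shell to start a session that uses the custom sysimage.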

So, that is pretty cool! I am also at a loss as to why a simple example like this is not the first thing in the PackageCompiler README…

Yeah, it is so easy to use… When it works. Speaking of which, the problems I had were on your canvas branch. But maybe that was just me

I think the “force” might swap out your current system image. Possibly a bad idea, but it also makes life easier for tool integration since the command line arguments are no longer necessary

Allow me to promote Fezzik once more. It uses PackageCompiler as a backend and is intended to solve the “time for first anything” problem by building a sysimg based on your actions.

(I tried using Fezzik, something tripped it up, repeatedly printing …Quadmath not (something)…)

I can’t beat your PackageCompiler numbers, but using Gaston#master:

julia> @time using Gaston; plot(rand(10000)); printfigure(outputfile="test.pdf")
  2.036927 seconds (2.15 M allocations: 106.887 MiB, 1.22% gc time)

This will be released as Gaston v0.10 as soon as I can finish updating the docs.

I think you are only measuring the using Gaston time.

julia> @time sleep(1); sleep(1)
  1.013370 seconds (4.78 k allocations: 254.120 KiB)
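The semicolon ends the expression that the macro sees, so only the first statement is timed and the rest runs untimed. A small sketch of the difference, using sleep as a stand-in workload and @elapsed so the results can be compared:

```julia
# Only the first sleep is measured; the second runs after the macro call.
t_first_only = @elapsed sleep(0.2); sleep(0.2)

# Wrapping both statements in begin ... end measures them together.
t_both = @elapsed begin
    sleep(0.2)
    sleep(0.2)
end

println("without begin/end: ", t_first_only, " s")
println("with begin/end:    ", t_both, " s")
```

The same applies to @time: wrap everything you want measured in a begin ... end block.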

And since we are measuring plot times, here is PGFPlotsX:

julia> @time begin 
           using PGFPlotsX; p = Plot(Coordinates([1,2], [2,3])); display(p)
       end
  4.680410 seconds (7.18 M allocations: 364.789 MiB, 5.14% gc time)

Good catch! That was counter-intuitive. BTW, those are some good numbers for PGFPlotsX.

julia> @time begin
           using Gaston;p=plot([1,2],[2,3]);printfigure(outputfile="t.pdf")
       end
  4.993548 seconds (7.86 M allocations: 390.720 MiB, 3.03% gc time)

This time includes both displaying the plot and saving it as PDF. Without saving:

julia> @time begin
           using Gaston;p=plot([1,2], [2,3]);
       end
  2.909761 seconds (3.10 M allocations: 155.544 MiB, 1.31% gc time)

Alright, if we are comparing things without loading a table package, this is what I get for VegaLite.jl (no custom sysimage):

julia> @time using VegaLite
  2.549538 seconds (4.57 M allocations: 248.798 MiB, 4.68% gc time)

julia> @time using VegaLite
  0.831583 seconds (2.26 M allocations: 108.544 MiB, 4.28% gc time)

I’m pretty optimistic that I can get the using part down to something like 1.6 seconds, I played with a branch yesterday where I got to something like that.

BUT, I won’t be even close to Gaston for saving as PDF. I think that is where things get much slower on my end (without having really measured it).

Yes. For GMT it errors at the end with

┌ Info: activating new environment at C:\Users\j\.julia\packages\PackageCompiler\oT98U\packages\Project.toml.
└ @ Pkg.API C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Pkg\src\API.jl:524
  Updating registry at `C:\Users\j\.julia\registries\General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
fatal: error thrown and no exception handler available.
ErrorException("Task cannot be serialized")
...