The improvements in Julia v1.5.1, tools like SnoopCompile.jl, and the recent posts by Tim Holy have motivated me to see if I can extract further performance from my plotting package Gaston. Being fairly new to this kind of optimization work in Julia, and since the tools are not always trivial to use, I’d like to ask those with more experience for advice on how to get started and where I am likely to find the greatest gains.
Currently I have these timings in Julia 1.5.1. I’m using Base.Experimental.@optlevel 1 and __precompile__(true).
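For readers who have not used these two settings, here is a minimal sketch of where they would sit in a package file. The module name MyPlots and the function describe are hypothetical stand-ins, not Gaston's actual source; only the placement of the two settings is the point.

```julia
# Declares the file as precompilable. Normally this line sits at the very top
# of the package's main source file, before the module definition.
__precompile__(true)

# Hypothetical module; a stand-in for the real package, not Gaston's code.
module MyPlots

# Ask the compiler to spend less effort optimizing code in this module.
# This is an experimental macro, available since Julia 1.5.
Base.Experimental.@optlevel 1

export describe

# Trivial placeholder function so the module has something to call.
describe(x::AbstractVector) = string(length(x), " points")

end # module

using .MyPlots
println(describe(1:10))
```

The optimization level applies per module, so a package split into submodules would presumably need the macro in each one it wants to affect.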
$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.1 (2020-08-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> @time using Gaston
2.533726 seconds (9.71 M allocations: 540.790 MiB, 9.06% gc time)
julia> @time plot(1:10)
0.415968 seconds (151.52 k allocations: 7.849 MiB, 1.98% gc time)
julia> @time plot(1:10)
0.044536 seconds (16.50 k allocations: 822.115 KiB)
julia> @time plot(1:10)
0.000114 seconds (108 allocations: 7.953 KiB)
julia> @time plot(1:10)
0.000142 seconds (108 allocations: 7.953 KiB)
julia> @time plot(1.1:1.1:9.9)
0.155749 seconds (39.43 k allocations: 2.102 MiB)
julia> @time plot(1.1:1.1:9.9)
0.000131 seconds (134 allocations: 15.688 KiB)
Some thoughts:
- I’ve tried different values of optlevel, and it doesn’t seem to make much difference.
- The time to load the package seems high to me, considering that it has fewer than 1400 lines and the code is (I think) fairly straightforward. The number of allocations also seems too large to me.
- The time and allocations required by the first plot tell me that the code was not precompiled in a useful way.
- I don’t understand why the second plot still takes longer and requires many allocations, and why things settle down at the third plot. (I’m super happy with the microsecond timings, though.)
- Changing the arguments to float triggers a recompilation, again with a huge number of allocations.
- In the case of float, the timing settles down to ~100 µs with the second plot (instead of the third).
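To make the recompilation point concrete, here is a small sketch of what I understand to be going on. The two calls pass arguments of different concrete types, so each needs its own compiled specialization. The function myplot is a hypothetical stand-in for a plotting entry point (it is not Gaston's plot), and the precompile directives at the end are only one possible approach, shown as an assumption rather than a known fix.

```julia
# Hypothetical stand-in for a plotting entry point; this is NOT Gaston's code.
myplot(x::AbstractVector{<:Real}) = sum(x) / length(x)

# The two calls in the timings pass arguments of different concrete types,
# so Julia compiles a separate specialization for each one.
int_range = 1:10
float_range = 1.1:1.1:9.9
println(typeof(int_range))    # a UnitRange of Int
println(typeof(float_range))  # a StepRangeLen with Float64 elements

# Explicit precompile directives, one per argument signature. In a package
# they would go at the end of the top-level module, asking Julia to compile
# these specializations while the package itself is being precompiled.
ok_int = precompile(myplot, (typeof(int_range),))
ok_float = precompile(myplot, (typeof(float_range),))
println((ok_int, ok_float))
```

As far as I understand, in Julia 1.5 such directives cache the results of type inference but not native code, so they may reduce the first-call cost without eliminating it.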
I’d appreciate any advice and pointers.