.== performance regression


Apologies if this has already been addressed:

In 0.5.2
@time [1,2,3] .== 2
0.000013 seconds (9 allocations: 4.594 KB)

In 0.6.0
0.028732 seconds (6.24 k allocations: 344.770 KiB)

I know 0.6.0 changed the way the ‘dot’ syntax works, but this performance regression is large. Am I missing something obvious?

julia> using BenchmarkTools

julia> @btime $[1,2,3] .== 2
  810.962 ns (4 allocations: 4.33 KiB)

but in 0.5:

julia> @btime $[1,2,3] .== 2
  2.091 μs (4 allocations: 4.33 KiB)

When working at the REPL, you have to be careful with what you’re profiling. Your @time is probably measuring mostly just the dispatch time, and the time to create the vector. @btime is generally much more accurate. The dollar sign tells it to evaluate [1,2,3] before starting to measure the time.
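For what it's worth, here is a rough sketch of the idea behind that kind of measurement: build the input ahead of time, call once so compilation is out of the way, then repeat and keep the fastest sample. The names `iseq2` and `min_time` are made up for illustration, and this is not how BenchmarkTools is actually implemented.

```julia
# Hypothetical helper: wrap the dot expression in a function so it is compiled once.
iseq2(v) = v .== 2

# Hypothetical helper: warm up, then keep the fastest of `samples` timed calls.
function min_time(f, x; samples = 1000)
    f(x)                      # warm-up call; pays the compilation cost
    best = Inf
    for _ in 1:samples
        t = @elapsed f(x)     # seconds taken by one call
        best = min(best, t)
    end
    return best
end

v = [1, 2, 3]                 # built before timing starts, like $[1,2,3] with @btime
result = iseq2(v)
fastest = min_time(iseq2, v)
println("fastest of 1000 runs: ", fastest, " seconds")
```

The absolute number will differ from what @btime prints, but it should no longer be dominated by compilation or by constructing the vector.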


rookie mistake, thanks!


It is still weird that @time [1,2,3] .== 2 results in 6.24 k allocations.


Just compilation


I don’t think so:

julia> @time [1,2,3] .== 2
  1.098644 seconds (126.88 k allocations: 6.651 MiB)
3-element BitArray{1}:

julia> @time [1,2,3] .== 2
  0.022890 seconds (6.25 k allocations: 344.973 KiB)
3-element BitArray{1}:

julia> @time [1,2,3] .== 2
  0.033313 seconds (6.25 k allocations: 345.020 KiB)
3-element BitArray{1}:

Only the first expression needs compilation.


No, they all need compilation, due to the anonymous function generated by the dot syntax.


Could you expand a little bit?

So the dot syntax generates an anomalous function that is compiled every time? That seems hard to accept from a performance standpoint.


All Julia functions are compiled the first time they are called. No anomalies there :). The REPL scope is dynamic, so it compiles before running each time. That’s why it’s a performance tip to write functions.



Thank you. This is indeed true:

julia> function test()
           [1, 2, 3] .== 2
       end
test (generic function with 1 method)

julia> @time test()
  0.138285 seconds (112.70 k allocations: 5.578 MiB)

julia> @time test()
  0.000008 seconds (9 allocations: 4.594 KiB)



But no. The dot expression needs compilation for every appearance in the code, the same as everything else, so running the same source code location repeatedly (as opposed to the same source code text) does not need recompilation.
It might be an improvement if the compiler could recognize and cache results that can be reused, but that is unrelated to runtime performance.
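
To illustrate the difference between "same source code text" and "same source code location", here is a small sketch using plain anonymous functions. It assumes, as stated earlier in the thread, that the 0.6 dot syntax generates an anonymous function per appearance; the exact lowering may differ between Julia versions, and `make_eq2` is just a made-up name.

```julia
# Two anonymous functions with identical text, written at two different places.
f1 = x -> x == 2
f2 = x -> x == 2

# One source location, evaluated twice.
make_eq2() = x -> x == 2
g1 = make_eq2()
g2 = make_eq2()

same_text_same_type = typeof(f1) == typeof(f2)
same_location_same_type = typeof(g1) == typeof(g2)

println("same text, different location: ", same_text_same_type)      # false
println("same location, run twice:      ", same_location_same_type)  # true

# All of them compute the same thing; only their types differ.
println(map(f1, [1, 2, 3]) == map(g1, [1, 2, 3]))                     # true
```

Since Julia specializes compiled code on the function's type, anything that receives `f1` and later `f2` is compiled twice, while repeated calls through `make_eq2` reuse the same compiled code.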