"@." can induce poor performance sometimes


#1

The @. macro puts a . after every function call, including user-defined functions, even when they are applied to a scalar, so bmask(7) becomes bmask.(7).

Interestingly, this causes a lot of memory allocation, resulting in poor performance.

julia> bss = collect(0:1<<16-1)
julia> @benchmark @. (bss & bmask(7)) << 4
BenchmarkTools.Trial: 
  memory estimate:  2.98 MiB
  allocs estimate:  162832
  --------------
  minimum time:     13.145 ms (0.00% GC)
  median time:      13.418 ms (0.00% GC)
  mean time:        13.716 ms (1.65% GC)
  maximum time:     53.083 ms (73.50% GC)
  --------------
  samples:          365
  evals/sample:     1

julia> @benchmark (bss .& bmask(7)) .<< 4
BenchmarkTools.Trial: 
  memory estimate:  512.17 KiB
  allocs estimate:  6
  --------------
  minimum time:     62.952 μs (0.00% GC)
  median time:      75.028 μs (0.00% GC)
  mean time:        95.000 μs (13.37% GC)
  maximum time:     43.102 ms (99.42% GC)
  --------------
  samples:          10000
  evals/sample:     1

why?


#2

Firstly, can you supply the bmask function? I cannot find it anywhere.

Secondly, remember to always interpolate global variables with $ when using BenchmarkTools, like this:

@benchmark @. ($bss & bmask(7)) << 4

The difference here should be between bmask(7) and bmask.(7), so you should just benchmark those two.


#3

One more thing: you should almost never collect ranges like this. It wastes memory, is usually slower, and is unnecessarily verbose. Just use

julia> bss = 0:1<<16-1
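
To illustrate the memory point, here is a quick sketch using Base.summarysize (the variable names are just for illustration): a range stores only its endpoints, while collect materializes all 65536 elements.

```julia
r = 0:1<<16-1            # lazy range: stores only start and stop
v = collect(r)           # dense Vector{Int64}: stores all 65536 elements

Base.summarysize(r)      # a handful of bytes, independent of the length
Base.summarysize(v)      # roughly 512 KiB (65536 elements of 8 bytes each)
```

Both objects index and broadcast the same way, so nothing is lost by keeping the range lazy.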

#4

This is just part of the broadcasting specification. All functions can be “dotted” and all dotted functions participate in the fusion. That means that Julia will only loop once over the output and will call all the dotted functions on the elements. This is even handy at times – for example you can use A .+ rand.() and the random addition will be different for each element.

Note that you can prevent a particular function from participating in the @. fusion with a $.
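To see this dotting happen explicitly, you can inspect the macro expansion. A small sketch (f here is an illustrative stand-in for bmask):

```julia
f(ibit) = 1 << (ibit - 1)   # stand-in for bmask, for illustration

# @macroexpand shows what @. produces, without evaluating anything:
ex = @macroexpand @. (x & f(7)) << 4
# ex is roughly :((x .& f.(7)) .<< 4); note that f(7) has become f.(7)
# even though 7 is a scalar, because @. dots every call it sees
```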


#5

Thanks,
1).

DInt = Int64
bmask(ibit::Int) = one(DInt) << (ibit-1)

Remark: if I do not alias Int64 as DInt, using bmask. is OK; otherwise, using bmask. causes allocation, while bmask without the dot has no allocation.

2).

julia> @benchmark ($bss .& bmask.(7)) .<< 4
BenchmarkTools.Trial: 
  memory estimate:  2.49 MiB
  allocs estimate:  130580
  --------------
  minimum time:     20.303 ms (0.00% GC)
  median time:      20.535 ms (0.00% GC)
  mean time:        20.986 ms (0.48% GC)
  maximum time:     29.362 ms (0.00% GC)
  --------------
  samples:          239
  evals/sample:     1

3). Your suggestion about not using collect really helps a lot.


#6

You also want const DInt = Int64, see https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1.

julia> const DInt = Int64
Int64

julia> bmask(ibit::Int) = one(DInt) << (ibit-1)
bmask (generic function with 1 method)

julia> bss = 0:1<<16-1
0:65535

julia> using BenchmarkTools

julia> @benchmark ($bss .& bmask.(7)) .<< 4
BenchmarkTools.Trial: 
  memory estimate:  512.08 KiB
  allocs estimate:  2
  --------------
  minimum time:     44.799 μs (0.00% GC)
  median time:      260.575 μs (0.00% GC)
  mean time:        287.050 μs (13.04% GC)
  maximum time:     3.557 ms (85.02% GC)
  --------------
  samples:          10000
  evals/sample:     1
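
The link above explains why: a non-const global can be rebound to a different type at any time, so one(DInt) cannot be inferred. A quick sketch of how to see this (the _nc names are illustrative, not from the thread):

```julia
DInt_nc = Int64                                   # non-const global binding
bmask_nc(ibit::Int) = one(DInt_nc) << (ibit - 1)  # same definition as above

bmask_nc(7)        # still computes 64, i.e. only bit 7 set

# @code_warntype bmask_nc(7) would typically show the return type
# inferred as Any, which is what forces the per-element allocations
# seen in the earlier benchmarks
```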

#7

You are right; their performance is now close.


#8

Wow, that’s cool!

julia> z = zeros(1,5);

julia> @. z + rand()
1×5 Array{Float64,2}:
 0.838248  0.40499  0.246746  0.979163  0.194097

julia> @. z + $rand()
1×5 Array{Float64,2}:
 0.142921  0.142921  0.142921  0.142921  0.142921

Makes sense, but I never thought about interpolating into a dot broadcast before.