1115
May 26, 2018, 5:25pm
1
The @. macro puts a . behind a user-defined function even when it is applied to a scalar, where no broadcast is needed. Interestingly, this causes a lot of memory allocation, leading to poor performance:
julia> bss = collect(0:1<<16-1)
julia> @benchmark @. (bss & bmask(7)) << 4
BenchmarkTools.Trial:
memory estimate: 2.98 MiB
allocs estimate: 162832
--------------
minimum time: 13.145 ms (0.00% GC)
median time: 13.418 ms (0.00% GC)
mean time: 13.716 ms (1.65% GC)
maximum time: 53.083 ms (73.50% GC)
--------------
samples: 365
evals/sample: 1
julia> @benchmark (bss .& bmask(7)) .<< 4
BenchmarkTools.Trial:
memory estimate: 512.17 KiB
allocs estimate: 6
--------------
minimum time: 62.952 μs (0.00% GC)
median time: 75.028 μs (0.00% GC)
mean time: 95.000 μs (13.37% GC)
maximum time: 43.102 ms (99.42% GC)
--------------
samples: 10000
evals/sample: 1
why?
DNF
May 26, 2018, 5:39pm
2
Firstly, can you supply the bmask function? I cannot find it anywhere.
Secondly, remember to always interpolate global variables with $ when using BenchmarkTools, like this:
@benchmark @. ($bss & bmask(7)) << 4
The difference here should be between bmask(7) and bmask.(7), so you should just benchmark those two.
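For example, once bmask is defined in the session, that comparison is just (a minimal sketch):
julia> using BenchmarkTools
julia> @benchmark bmask(7)
julia> @benchmark bmask.(7)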
DNF
May 26, 2018, 5:49pm
3
One more thing: you should almost never collect ranges like this. It wastes memory, is usually slower, and is unnecessarily verbose. Just use
julia> bss = 0:1<<16-1
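A range is lazy and only stores its endpoints, so it is far cheaper than the collected vector. A quick check (assuming 64-bit Int, so the element type is Int64):
julia> sizeof(0:1<<16-1)            # UnitRange stores just start and stop
16
julia> sizeof(collect(0:1<<16-1))   # Vector stores all 65536 elements
524288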
1 Like
This is just part of the broadcasting specification. All functions can be “dotted” and all dotted functions participate in the fusion. That means that Julia will only loop once over the output and will call all the dotted functions on the elements. This is even handy at times; for example, you can use A .+ rand.() and the random addition will be different for each element.
Note that you can prevent a particular function from participating in the @. fusion with a $.
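If you want to see exactly what @. does to an expression, @macroexpand shows the rewritten form (the printed expression may differ slightly between Julia versions):
julia> @macroexpand @. (bss & bmask(7)) << 4    # every call gets dotted, including bmask(7) -> bmask.(7)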
2 Likes
1115
May 26, 2018, 7:48pm
5
Thanks!
1).
DInt = Int64
bmask(ibit::Int) = one(DInt) << (ibit-1)
Remark: if I do not alias Int64 as DInt, using bmask. is fine; otherwise, using bmask. causes allocations while bmask without the dot does not.
2).
julia> @benchmark ($bss .& bmask.(7)) .<< 4
BenchmarkTools.Trial:
memory estimate: 2.49 MiB
allocs estimate: 130580
--------------
minimum time: 20.303 ms (0.00% GC)
median time: 20.535 ms (0.00% GC)
mean time: 20.986 ms (0.48% GC)
maximum time: 29.362 ms (0.00% GC)
--------------
samples: 239
evals/sample: 1
3). Your suggestion on not using collect really helps a lot.
You also want const DInt = Int64; see https://docs.julialang.org/en/stable/manual/performance-tips/#Avoid-global-variables-1 .
julia> const DInt = Int64
Int64
julia> bmask(ibit::Int) = one(DInt) << (ibit-1)
bmask (generic function with 1 method)
julia> bss = 0:1<<16-1
0:65535
julia> using BenchmarkTools
julia> @benchmark ($bss .& bmask.(7)) .<< 4
BenchmarkTools.Trial:
memory estimate: 512.08 KiB
allocs estimate: 2
--------------
minimum time: 44.799 μs (0.00% GC)
median time: 260.575 μs (0.00% GC)
mean time: 287.050 μs (13.04% GC)
maximum time: 3.557 ms (85.02% GC)
--------------
samples: 10000
evals/sample: 1
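For what it's worth, the allocations with the non-const alias come from type instability: a non-const DInt is an untyped global, so one(DInt) cannot be inferred and each call to bmask inside the fused broadcast returns a boxed value. A quick way to check, in a fresh session (a sketch, output elided):
julia> DInt = Int64                               # non-const global
julia> bmask(ibit::Int) = one(DInt) << (ibit-1)
julia> @code_warntype bmask(7)                    # return type is inferred as Any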
1 Like
1115
May 26, 2018, 8:27pm
7
You are right; now their performance is close.
Elrod
May 26, 2018, 9:47pm
8
Wow, that’s cool!
julia> z = zeros(1,5);
julia> @. z + rand()
1×5 Array{Float64,2}:
0.838248 0.40499 0.246746 0.979163 0.194097
julia> @. z + $rand()
1×5 Array{Float64,2}:
0.142921 0.142921 0.142921 0.142921 0.142921
Makes sense, but I never thought about interpolating into a dot broadcast before.
1 Like