I wonder why there is such a large performance difference between these two functions. The core functions are tt and tt1: in tt1 I use @. to broadcast the result inside the function, while in tt I write everything as scalars and broadcast at the call site with tt.(...).
Interestingly, the number of allocations and the total memory allocated are the same in both cases.
The speed difference does not appear if I only have one vector to wrap in Ref (i.e., if the functions do not take Z as an argument). In my real application, X and Z are custom structs with many fields, so breaking them down into individual scalar arguments would not be practical.
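To make the struct case concrete, here is a minimal sketch of the pattern I mean. The Params type, its fields a and b, and the kernel function are made up for illustration and are not my real types; the inputs are small only so that the result stays finite.

```julia
# Hypothetical stand-in for the real structs; name and fields are invented.
struct Params
    a::Float64
    b::Float64
end

# Scalar kernel: takes two structs and a single number.
kernel(p::Params, q::Params, y::Real) = exp(p.a * 2 + p.a^p.b + q.a^q.b) + y

p = Params(0.5, 2.0)
q = Params(0.25, 3.0)
ys = [0.0, 1.0, 2.0]

# Ref(...) makes broadcast treat each struct as a single scalar-like value,
# so only ys is iterated over.
out = kernel.(Ref(p), Ref(q), ys)
```

This has the same shape as tt.(Ref(X), Ref(Z), Y) in the example that follows, just with struct fields instead of vector indexing.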
using BenchmarkTools

function test(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    for k = 1:1000
        ret[:, k] = tt.(Ref(X), Ref(Z), Y)
    end
    return ret
end

function test1(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    for k = 1:1000
        ret[:, k] = tt1(X, Z, Y)
    end
    return ret
end

function tt(X, Z, Y)
    exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
end

function tt1(X, Z, Y)
    @. exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
end

Y = rand(10000);
@btime test([2.0, 5.0], [5.0, 10.0], Y);  # 2.137 s (2004 allocations: 152.66 MiB)
@btime test1([2.0, 5.0], [5.0, 10.0], Y); # 28.084 ms (2004 allocations: 152.66 MiB)
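As a sanity check that the two styles are meant to compute the same thing, here is a small sketch comparing them on a tiny input. The names f_scalar, f_fused and f_manual are only for this illustration, and f_manual is my understanding of what @. expands the body to (every call and operator gets a dot, while the indexing expressions such as X[1] stay scalar), not something I have verified against the lowered code. The inputs are deliberately small so that exp does not overflow.

```julia
# Scalar version, broadcast from the call site with Ref (same body as tt).
f_scalar(X, Z, y) = exp(X[1] * 2 + X[1]^X[2] + Z[1]^Z[2]) + y

# Version that broadcasts internally with @. (same body as tt1).
f_fused(X, Z, Y) = @. exp(X[1] * 2 + X[1]^X[2] + Z[1]^Z[2]) + Y

# Assumed manual expansion of the @. version: dotted calls and operators,
# plain scalar indexing.
f_manual(X, Z, Y) = exp.(X[1] .* 2 .+ X[1] .^ X[2] .+ Z[1] .^ Z[2]) .+ Y

X = [0.5, 2.0]
Z = [0.25, 3.0]
Y = [0.0, 1.0, 2.0]

a = f_scalar.(Ref(X), Ref(Z), Y)  # call-site broadcast
b = f_fused(X, Z, Y)              # broadcast inside the function
c = f_manual(X, Z, Y)             # hand-written dotted form

println(a ≈ b && b ≈ c)
```

So the results agree; my question is only about where the difference in run time comes from.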