I wonder why there is such a large difference in performance between these two functions. The core functions are `tt` and `tt1`: in `tt1` I use `@.` to broadcast inside the function, while in `tt` I write everything as scalars and call it with `tt.`.

Interestingly, the number of allocations and the total memory allocated are the same in both cases.

The speed difference does not appear if I only have one vector to wrap in `Ref` (i.e., if the functions don't take `Z` as an argument). In my real application, `X` and `Z` are custom structs with many fields, so breaking them down into individual scalar arguments would not be practical.
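To be explicit about what I mean by wrapping a vector in `Ref`: in a dotted call, `Ref(v)` is treated as a scalar, so the function receives the whole vector on every call instead of one element at a time. A minimal sketch, where the helper `g` and the inputs `v` and `ys` are names made up purely for illustration:

```julia
# Hypothetical helper, only to illustrate Ref in a broadcast call
g(v, y) = v[1] + y

v = [1.0, 2.0]
ys = [10.0, 20.0, 30.0]

# Ref(v) acts as a scalar in the broadcast, so this evaluates
# g(v, 10.0), g(v, 20.0), g(v, 30.0) and collects the results
with_ref = g.(Ref(v), ys)
```

This is the same pattern I use for `X` and `Z` in the benchmark.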

```
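using BenchmarkTools

# Scalar kernel `tt`, broadcast at the call site over Y (X and Z wrapped in Ref)
function test(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    for k = 1:1000
        ret[:, k] = tt.(Ref(X), Ref(Z), Y)
    end
    return ret
end

# Kernel `tt1` broadcasts internally via `@.`, so it is called on the whole vector Y
function test1(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    for k = 1:1000
        ret[:, k] = tt1(X, Z, Y)
    end
    return ret
end

# Scalar version: Y is a single number
function tt(X, Z, Y)
    exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
end

# Vectorized version: `@.` dots every function call and operator, Y is a vector
function tt1(X, Z, Y)
    @. exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
end

Y = rand(10000);
@btime test([2.0, 5.0], [5.0, 10.0], Y);   # 2.137 s (2004 allocations: 152.66 MiB)
@btime test1([2.0, 5.0], [5.0, 10.0], Y);  # 28.084 ms (2004 allocations: 152.66 MiB)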
```
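To narrow this down, a single call of each form can be timed without the outer loop and the `ret[:,k]` assignment. The following is only a sketch: it redefines `tt` and `tt1` as one-liners so that it runs on its own, interpolates the arguments with `$` so that global-variable access is not part of the measurement, and introduces `Xs`, `Zs` and `same` as illustrative names that are not part of my real code. I have left out timings on purpose, since they will depend on the machine and Julia version.

```julia
using BenchmarkTools

# Same kernels as above, written as one-liners
tt(X, Z, Y)  = exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
tt1(X, Z, Y) = @. exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y

X = [2.0, 5.0]
Z = [5.0, 10.0]
Y = rand(10_000)

# With the inputs above, 5.0^10.0 == 9765625.0, so exp(...) overflows to Inf
# in Float64 and every output is Inf. The smaller inputs below are an
# assumption, used only so the equality check compares finite numbers.
Xs = [2.0, 1.5]
Zs = [1.5, 2.0]
same = tt.(Ref(Xs), Ref(Zs), Y) ≈ tt1(Xs, Zs, Y)

# Time one call of each form; `$` interpolates the values into the benchmark
@btime tt.(Ref($X), Ref($Z), $Y);
@btime tt1($X, $Z, $Y);
```

If `same` is `true` and the gap is still there for a single call, the difference comes from the two broadcast forms themselves rather than from the loop in `test` and `test1`.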