Defining function as scalar vs fusing with two Ref's, significant speed difference

Elrod · September 16, 2021, 6:26pm

LLVM should hoist X[1] ^ X[2] and Z[1] ^ Z[2] out of the broadcast loop in the test1 example, hence evaluating them only once per call to tt1.

Try marking tt with @inline to expose its contents to the compiler, using tuples instead of arrays for X and Z, and finally using broadcasted assignment .=:

julia> function test3(X,Z, Y)
           ret = Array{Float64,2}(undef,length(Y), 1000)
           for k = 1:1000
               ret[:,k] .= tt.(Ref(X),Ref(Z),Y)
           end
           return ret
       end
test3 (generic function with 1 method)

julia> @inline function tt(X, Z, Y)
           exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y
       end
tt (generic function with 1 method)

julia> @btime test3((2.0, 5.0), (5.0, 10.0), $Y);
  56.652 ms (2 allocations: 76.29 MiB)

This is faster for me than test1:

julia> @btime test([2.0, 5.0],[5.0, 10.0],Y); #2.137 s (2004 allocations: 152.66 MiB)
  1.093 s (4004 allocations: 152.66 MiB)

julia> @btime test1([2.0, 5.0],[5.0, 10.0],Y); #28.084 ms (2004 allocations: 152.66 MiB)
  66.859 ms (2004 allocations: 152.63 MiB)

Of course, it should also be possible to lift the whole exp part out of the loop, since it does not depend on Y, but the compiler doesn’t do that.
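If the compiler won't lift the exp term out, it can be hoisted by hand. The following is only a rough sketch, not something I benchmarked: test3_hoisted is a hypothetical name, and the small X, Z and the random Y are assumed inputs chosen so that exp stays finite (the thread does not show how Y is built).

```julia
# Per-element kernel from the post: the exp(...) term does not depend on Y.
@inline tt(X, Z, Y) = exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2]) + Y

# Version from the post, kept here for comparison.
function test3(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    for k = 1:1000
        ret[:, k] .= tt.(Ref(X), Ref(Z), Y)
    end
    return ret
end

# Hypothetical variant: evaluate the loop-invariant term once per call,
# then only broadcast the addition.
function test3_hoisted(X, Z, Y)
    ret = Array{Float64,2}(undef, length(Y), 1000)
    c = exp(X[1]*2 + X[1]^X[2] + Z[1]^Z[2])  # computed once, outside both loops
    for k = 1:1000
        ret[:, k] .= c .+ Y
    end
    return ret
end

X = (1.5, 2.0)   # assumed small inputs so exp(...) stays finite
Z = (0.5, 3.0)
Y = rand(100)    # assumed input vector
```

Both versions should agree up to floating-point rounding, so test3_hoisted(X, Z, Y) ≈ test3(X, Z, Y) is a quick sanity check before timing them with @btime.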
