I have a performance issue I’d like to solve. I need to reference function arguments stored in a Vector{Any}. Most of the time this incurs an expensive allocation. However, when I call an explicitly typed function, the allocation doesn’t happen and the code runs much faster. I’ve written a MWE to demonstrate the issue. My problem is that unfortunately I cannot constrain the input types to the degree required to avoid the allocation, so I’m wondering why it happens in some cases, and whether it can be avoided.
Here’s the MWE, and the output it produces:
using BenchmarkTools
struct TypeA{T<:Real}
    a::T
end

struct TypeB{T<:Real}
    a::T
    b::T
end

function myfun1(a, b)
    return a.a * b.a + b.b
end

function myfun2(a::TypeA, b::TypeB)
    return a.a * b.a + b.b
end

function myfun3(a::TypeA{T}, b::TypeB{T}) where T <: Real
    return a.a * b.a + b.b
end

function myfun4(a::TypeA{Float64}, b::TypeB{Float64})
    return a.a * b.a + b.b
end
a = TypeA(1.)
b = TypeB(1.,2.)
v = [a, b]
@btime myfun1($a, $b)
@btime myfun1($v[1], $v[2])
@btime myfun2($v[1], $v[2])
@btime myfun3($v[1], $v[2])
@btime myfun4($v[1], $v[2])
# Added after post from mikmoore:
@btime myfun1($v[1]::TypeA, $v[2]::TypeB)
@btime myfun1($v[1]::TypeA{Float64}, $v[2]::TypeB{Float64})
  2.125 ns (0 allocations: 0 bytes)
  17.869 ns (1 allocation: 16 bytes)
  19.079 ns (1 allocation: 16 bytes)
  18.120 ns (1 allocation: 16 bytes)
  3.333 ns (0 allocations: 0 bytes)
  18.788 ns (1 allocation: 16 bytes)
  2.416 ns (0 allocations: 0 bytes)
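My current understanding of where the allocation comes from, sketched below (the names f_dynamic, f_annot, and measure are mine, not from the MWE): indexing a Vector{Any} yields values whose types the compiler cannot infer, so the call is dispatched at runtime and its Float64 result is boxed on the heap, which I believe is the 16-byte allocation in the timings above. Annotating the element types at the call site restores inference:

```julia
struct TypeA{T<:Real}
    a::T
end

struct TypeB{T<:Real}
    a::T
    b::T
end

myfun1(a, b) = a.a * b.a + b.b

# Arguments pulled from a Vector{Any} have unknown types: the call is
# dispatched at runtime and the Float64 result is boxed on the heap.
f_dynamic(v) = myfun1(v[1], v[2])

# Type-asserting the elements at the call site restores inference,
# matching the zero-allocation result in the last timing above.
f_annot(v) = myfun1(v[1]::TypeA{Float64}, v[2]::TypeB{Float64})

# Measure inside a function so globals don't confound @allocated.
function measure(v)
    f_dynamic(v); f_annot(v)                     # warm up (compile first)
    (@allocated f_dynamic(v)), (@allocated f_annot(v))
end

dyn, ann = measure(Any[TypeA(1.0), TypeB(1.0, 2.0)])
println((dyn, ann))    # dynamic call allocates; annotated call does not
```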
What’s also strange is that in my actual use case, parameterizing the arguments as in myfun3 above is sufficient to avoid the allocations. Even so, that signature is too constraining for what I need.
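For what it’s worth, here is a sketch of two container choices that keep the element types visible to the compiler, under the assumption that the set of possible element types is small and known (g_tuple, g_union, and measure are illustrative names, not part of the MWE):

```julia
struct TypeA{T<:Real}
    a::T
end

struct TypeB{T<:Real}
    a::T
    b::T
end

myfun1(a, b) = a.a * b.a + b.b

a = TypeA(1.0)
b = TypeB(1.0, 2.0)

# A heterogeneous Tuple carries each element's concrete type in the
# Tuple's own type, so indexing with a constant is fully inferred.
t = (a, b)
g_tuple(t) = myfun1(t[1], t[2])

# A small-Union element type lets the compiler union-split the call
# instead of boxing through Any.
u = Union{TypeA{Float64}, TypeB{Float64}}[a, b]
g_union(u) = myfun1(u[1], u[2])

# Measure inside a function so globals don't confound @allocated.
function measure(t, u)
    g_tuple(t); g_union(u)                       # warm up (compile first)
    (@allocated g_tuple(t)), (@allocated g_union(u))
end

println(measure(t, u))    # neither call should allocate
```

Neither choice requires constraining the function signatures themselves; the type information lives in the container instead.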