Understanding allocs/constant propagation issue?

Consider this example:

using LinearAlgebra, BenchmarkTools

function comu_f!(r, a, b, p, s)
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    return nothing
end

m1, m2, m3 = [rand(ComplexF64, 10, 10) for _ in 1:3]

function txx(r, a, b)
    p = false
    s = 2.0
    comu_f!(r, a, b, p, s)
end

function txx2(r, a, b)
    p = false
    s = 2.0
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    nothing
end

(In real use, p and s would change, and this is called in an inner loop, so I’d like to reduce allocations.)

@btime txx($m1, $m2, $m3)
shows

757.120 ns (1 allocation: 32 bytes)

while @btime txx2($m1, $m2, $m3) shows

745.260 ns (0 allocations: 0 bytes)

Since txx2 is just the body of comu_f! pasted inline, why does txx allocate when the mul! calls themselves do not? Thanks!

I don’t think the constants get propagated through the function call. In general (someone correct me if I’m wrong), you can only rely on type information, not on values, being available to the compiler across a function boundary. An example that fixes the performance:

function comu_f2!(r,a,b,::Val{p},::Val{s}) where {p,s}
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    nothing
end

Using the Val lets you encode values into the type itself to be available at compile time.
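
As a minimal sketch of what "encoding a value into the type" means (the helper name unwrap is hypothetical and used only for illustration): each distinct value wrapped in Val produces a distinct type, and a method can read the value back out of the type parameter.

```julia
# Each value wrapped in Val becomes part of the type itself.
v_s = Val(2.0)      # an instance of the type Val{2.0}
v_p = Val(false)    # an instance of the type Val{false}

# Hypothetical helper: recover the value from the type parameter.
unwrap(::Val{x}) where {x} = x

println(typeof(v_s))   # the value 2.0 appears in the printed type
println(unwrap(v_s))   # the value is read back from the type parameter
```

Because the value lives in the type, the compiler can specialize a method such as comu_f2! on it, at the cost of compiling one specialization per distinct value.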

You can change the function to

function txx3(r, a, b)                                       
    p = false
    s = 2.0
    comu_f2!(r, a, b, Val(p), Val(s))
end

For me, this gives basically the same performance as txx2.

I think the compiler behaves this way so that a function such as comu_f!, once compiled for a given set of argument types, can be reused (i.e. stay general) without being recompiled for every different constant. I suspect there is a macro that lets you inline the function and get the constant propagation, but I’m not sure.

@inline function ...

May work, but constant propagation is not guaranteed in any case.

I did try this but it didn’t work with the constant propagation.

Try this?

Base.@constprop :aggressive function comu_f!(r, a, b, p, s) ...
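
Spelled out in full, the suggestion might look like the sketch below. It assumes Julia 1.7 or later (where Base.@constprop is available), and the names comu_cp! and txx4 are made up here so they do not clash with the definitions earlier in the thread. The annotation is a hint to the compiler, so it is worth re-running @btime on your own machine rather than assuming the allocation is gone.

```julia
using LinearAlgebra

# Same body as comu_f!, with a hint asking the compiler to propagate
# constant arguments (here p and s) into this method.
Base.@constprop :aggressive function comu_cp!(r, a, b, p, s)
    mul!(r, a, b, s, p)      # r = s*a*b + p*r
    mul!(r, b, a, -s, true)  # r = -s*b*a + r
    return nothing
end

# Hypothetical caller, mirroring txx from the question.
function txx4(r, a, b)
    p = false
    s = 2.0
    comu_cp!(r, a, b, p, s)
end

r = zeros(ComplexF64, 10, 10)
a = rand(ComplexF64, 10, 10)
b = rand(ComplexF64, 10, 10)
txx4(r, a, b)   # r now holds 2.0 * (a*b - b*a)
```

It can then be benchmarked the same way as the other variants, e.g. with @btime txx4($m1, $m2, $m3).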

I think this only helps if the dispatch values are compile time constants, like you said.

The OP mentioned that in the real application p and s will change too. I think I read somewhere that this then involves dynamic dispatch (because the type Val{p} is only known at run time) and thus adds overhead of its own.
Or can const-prop eliminate that too?
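
To make the concern concrete, here is a small sketch with hypothetical names (scale, scale_literal, scale_runtime). When the wrapped value is a literal, the type Val{2.0} is visible to the compiler; when it comes from a run-time argument, the concrete Val type depends on the value, so the method to call generally has to be selected at run time unless the compiler manages to propagate a constant from the call site.

```julia
# Method specialized on the value carried in the Val type parameter.
scale(::Val{s}, x) where {s} = s * x

# The value is a literal, so the argument type Val{2.0} is known when compiling.
scale_literal(x) = scale(Val(2.0), x)

# The value arrives at run time, so the type of Val(s) depends on s.
scale_runtime(s, x) = scale(Val(s), x)

println(scale_literal(3.0))
println(scale_runtime(2.0, 3.0))
```

Both calls compute the same result; the difference is only in how much the compiler knows ahead of time, which can be inspected with @code_warntype scale_runtime(2.0, 3.0).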

This works great, I hadn’t seen this macro before - thanks!

Thanks! The Base.@constprop :aggressive macro seems to work, nice to know this exists!