Understanding allocs/constant propagation issue?

Egwene_al_Vere · January 4, 2023, 12:16am

Consider this example:

using LinearAlgebra, BenchmarkTools

function comu_f!(r, a, b, p, s)
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    return nothing
end

m1, m2, m3 = [rand(ComplexF64, 10, 10) for _ in 1:3]

function txx(r, a, b)
    p = false
    s = 2.0
    comu_f!(r, a, b, p, s)
end

function txx2(r, a, b)
    p = false
    s = 2.0
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    nothing
end

(In real use, p and s would change, and this code sits in an inner loop, so I’d like to reduce allocations.)

@btime txx($m1, $m2, $m3) shows

757.120 ns (1 allocation: 32 bytes)

while @btime txx2($m1, $m2, $m3) shows

745.260 ns (0 allocations: 0 bytes)

Since txx2 just copy-pastes the body of comu_f!, why does txx allocate when the mul! calls themselves should not? Thanks!

jmair · January 4, 2023, 12:36am

I don’t think the constants get propagated through to the function call. In general (someone correct me if I’m wrong), you can only really rely on type information to get constant propagation through function calls. One way to fix the performance:

function comu_f2!(r,a,b,::Val{p},::Val{s}) where {p,s}
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    nothing
end

Using Val lets you encode values into the type itself, so they are available at compile time.
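A minimal sketch of what "encoding the value in the type" means. The helper `unval` is hypothetical and only here for illustration; it reads the value back out of Val's type parameter, which is why a method dispatching on `Val{x}` sees `x` as a compile-time constant.

```julia
# Hypothetical helper for illustration: return the value stored in the
# type parameter of a Val instance.
unval(::Val{x}) where {x} = x

v = Val(2.0)
println(typeof(v))         # Val{2.0}
println(unval(v))          # 2.0
println(unval(Val(false))) # false
```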

You can then change the calling function to

function txx3(r, a, b)                                       
    p = false
    s = 2.0
    comu_f2!(r, a, b, Val(p), Val(s))
end

For me, this gives basically the same performance as txx2.
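A self-contained sketch to check that the Val version computes the same thing as the original comu_f!. With p = false, both reduce to the scaled commutator s*(a*b - b*a), because 5-argument mul!(C, A, B, α, β) computes A*B*α + C*β. The variable names are chosen for this sketch only.

```julia
using LinearAlgebra

# Original version: p and s are ordinary runtime arguments.
function comu_f!(r, a, b, p, s)
    mul!(r, a, b, s, p)        # r = s*(a*b) + p*r
    mul!(r, b, a, -s, true)    # r = -s*(b*a) + r
    return nothing
end

# Val version: p and s are type parameters, known at compile time.
function comu_f2!(r, a, b, ::Val{p}, ::Val{s}) where {p, s}
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    return nothing
end

a = rand(ComplexF64, 10, 10)
b = rand(ComplexF64, 10, 10)
r1 = zeros(ComplexF64, 10, 10)
r2 = zeros(ComplexF64, 10, 10)

comu_f!(r1, a, b, false, 2.0)
comu_f2!(r2, a, b, Val(false), Val(2.0))
println(r1 ≈ r2)  # true
```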

I think the reason for this is to make sure the function can be reused for the same argument types (i.e. stay generic) without having to be recompiled for every different constant. I think there is a macro that lets you inline the function and get the constant propagation, but I’m not sure.
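The flip side of the Val approach is easy to see from the types alone: every distinct constant produces a distinct concrete argument type, so a method such as comu_f2! is specialized separately for each (p, s) pair it is called with. A small sketch:

```julia
# Argument types that comu_f2! would see for two different values of s.
T1 = typeof((Val(false), Val(2.0)))
T2 = typeof((Val(false), Val(3.0)))

println(T1 == T2)  # false: a new specialization per distinct constant
```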

lmiq · January 4, 2023, 12:44am

@inline function ...

May work, but constant propagation is not guaranteed in any case.
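A sketch of the @inline variant being suggested, assuming the same body as comu_f!. The names comu_f_inl! and txx_inl are hypothetical. Inlining lets the caller's literal constants appear inside the inlined body, but as noted, constant propagation is still not guaranteed.

```julia
using LinearAlgebra

# Same body as comu_f!, but the compiler is asked to inline it into callers.
@inline function comu_f_inl!(r, a, b, p, s)
    mul!(r, a, b, s, p)
    mul!(r, b, a, -s, true)
    return nothing
end

# Hypothetical caller with literal constants, mirroring txx.
function txx_inl(r, a, b)
    p = false
    s = 2.0
    comu_f_inl!(r, a, b, p, s)
end

m1, m2, m3 = [rand(ComplexF64, 10, 10) for _ in 1:3]
txx_inl(m1, m2, m3)
```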

jmair · January 4, 2023, 12:49am

I did try this, but it didn’t give me the constant propagation.

uniment · January 4, 2023, 1:30am

Try this?

Base.@constprop :aggressive function comu_f!(r, a, b, p, s) ...
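A fuller sketch of that suggestion applied to the code from the question. `Base.@constprop :aggressive` asks the compiler to attempt constant propagation into the method even where its heuristics would normally skip it; it needs Julia 1.7 or newer, and it is a hint rather than a guarantee.

```julia
using LinearAlgebra

# Request aggressive constant propagation into this method (Julia 1.7+).
Base.@constprop :aggressive function comu_f!(r, a, b, p, s)
    mul!(r, a, b, s, p)        # r = s*(a*b) + p*r
    mul!(r, b, a, -s, true)    # r = -s*(b*a) + r
    return nothing
end

# Caller from the question: p and s are literal constants here.
function txx(r, a, b)
    p = false
    s = 2.0
    comu_f!(r, a, b, p, s)
end

m1, m2, m3 = [rand(ComplexF64, 10, 10) for _ in 1:3]
txx(m1, m2, m3)
```

Timing it with `@btime txx($m1, $m2, $m3)` as in the question is how one would check whether the allocation goes away on a given Julia version.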

fatteneder · January 4, 2023, 7:57am

I think this only helps if the dispatch values are compile-time constants, like you said.

OP mentioned that in their real application p and s change too. I think I read somewhere that this then involves dynamic dispatch (because the type Val{p} is only known at runtime), which adds its own overhead.
Or can const-prop eliminate that too?
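A small sketch of the concern above, using hypothetical helpers scaled and runtime_scaled. When s is an ordinary runtime Float64, the concrete type Val{s} cannot be known at compile time, so the call is resolved by dynamic dispatch and inference cannot give a concrete result type. `Base.return_types` is used here only as a quick way to inspect what inference concludes.

```julia
# Hypothetical helper: multiplies by a value carried in a Val type parameter.
scaled(x, ::Val{s}) where {s} = s * x

# `s` is a runtime value here, so Val(s) has a type unknown at compile time.
runtime_scaled(x, s) = scaled(x, Val(s))

println(runtime_scaled(3.0, 2.0))  # 6.0

# Inferred return type of the wrapper; it is not a concrete type.
rt = only(Base.return_types(runtime_scaled, (Float64, Float64)))
println(isconcretetype(rt))        # false
```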

jmair · January 4, 2023, 8:06am

This works great. I hadn’t seen this macro before, thanks!

Egwene_al_Vere · January 4, 2023, 2:30pm

Thanks! The Base.@constprop :aggressive macro seems to work, nice to know this exists!
