How does constant propagation impact latency with a Val-based dispatch?

The example here is schematic, and isn’t really an MWE.

Suppose I have a set of functions like

f() = f('A')
Base.@constprop :aggressive f(a, args...) = g(Val(a == 'A'), args...)
Base.@constprop :aggressive g(::Val{true}, args...) = 1
Base.@constprop :aggressive g(::Val{false}, args...) = 2

In this case, @code_typed for f() shows that inference detects g(::Val{false}) to be unreachable and inlines g(::Val{true}), producing

julia> @code_typed f()
CodeInfo(
1 ─     return 1
) => Int64

However, the first call to f() still seems to propagate constants into g(::Val{false}). In my use case, propagating constants into g(::Val{false}) is necessary for type inference within that method, but it adds quite a bit to the latency of f(). I would love it if the method were ignored entirely in this call, since it is never compiled afterwards.

Simplifying the method definition of g(::Val{false}, args...) appears to reduce TTFX for f(), which I had not expected, as this method is not being used subsequently in the call.
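One rough way to observe this is to time the first and second calls in a fresh session. This is only a sketch built on the schematic definitions above: it assumes Julia 1.7 or later (for Base.@constprop), the names t_first and t_second are placeholders, and the absolute numbers depend on the machine and Julia version.

```julia
# Schematic definitions, repeated so the snippet runs in a fresh session.
f() = f('A')
Base.@constprop :aggressive f(a, args...) = g(Val(a == 'A'), args...)
Base.@constprop :aggressive g(::Val{true}, args...) = 1
Base.@constprop :aggressive g(::Val{false}, args...) = 2

# The first call includes inference and compilation (the TTFX).
t_first = @elapsed f()

# The second call reuses the compiled code, so it measures runtime only.
t_second = @elapsed f()

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

Editing the body of g(::Val{false}, args...) and rerunning this in a new session each time gives a crude comparison of how much that method contributes to the first call.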

The actual use case that I’m looking at is in matmul, where the latency in generic_matmul! may be reduced considerably if the last call on line 417 were absent. This line is not reachable in many common cases, but still contributes to the latency of matrix multiplication. The constant-propagation into wrap appears to lead to this latency, and removing the aggressive constprop annotations reduces TTFX. Ideally, I would only want this constprop to happen if the line is reachable.

The connection with a Val-based dispatch is that I was trying to hide this line behind a Val(all(map(in(('N', 'T', 'C')), (tA_uc, tB_uc)))), which may be evaluated at compile time, but this doesn’t seem to impact latency.
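Schematically, the guard I have in mind looks like the sketch below. The names mul_sketch, _mul_sketch and caller are hypothetical stand-ins and not the actual LinearAlgebra code; only the Val(...) expression is taken from what I tried. Whether the Val argument is actually folded to a constant depends on the Julia version, so the last line only prints the typed code for inspection.

```julia
# Hypothetical stand-in for the real multiplication routine.
Base.@constprop :aggressive function mul_sketch(tA::Char, tB::Char)
    tA_uc = uppercase(tA)
    tB_uc = uppercase(tB)
    # true only if both flags are one of 'N', 'T', 'C'
    plain = all(map(in(('N', 'T', 'C')), (tA_uc, tB_uc)))
    return _mul_sketch(Val(plain), tA_uc, tB_uc)
end

# Common case: no wrapping needed.
_mul_sketch(::Val{true}, tA, tB) = :fast_path
# Rare case: stands in for the branch that I would like to skip.
_mul_sketch(::Val{false}, tA, tB) = :wrapped_path

# A caller with literal flags, so the flags are known constants.
caller() = mul_sketch('N', 'T')

# Inspect what inference produces for the caller.
println(code_typed(caller, Tuple{}))
```

If the flags are propagated as constants, the Val{false} method should not show up in the typed code for caller(), which is the same situation as in the f/g example: the branch is pruned from the result, but I would like it to be skipped during inference as well.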