Julia function performance behaving strangely when combining broadcasting, a NamedTuple of parameters, and a function as an argument

I have the following simplified example, taken from real code, that evaluates a function over the differences between a scalar t and a vector of inputs. The function itself is passed in as an argument:

params = (a=2, b=3) # Example parameters
calc(p::NamedTuple, t) = p.a*t^2+p.b*t^3 # Example function


testbroad1(p::NamedTuple, vec::Vector, calc::Function, t) = calc.((p,), t .- vec)

function testbroad2(params::NamedTuple, vec::Vector, calc::Function, t)
	c(t) = calc(params, t) # closure over params, so only t is broadcast
	c.(t .- vec)
end

When testbroad1 and testbroad2 are benchmarked, the following happens:

julia> @btime testbroad1($params, $vec, $calc, 5.0)
  1.057 μs (5 allocations: 8.03 KiB)

julia> @btime testbroad2($params, $vec, $calc, 5.0)
  922.944 ns (1 allocation: 7.94 KiB)

testbroad2 is faster by around 10%. In my more complex real-world code, the testbroad1 variant takes twice as long to finish (not benchmarked, but around 8 seconds instead of 4).

  1. Is that behavior expected or not?
  2. What is the best way to achieve the above? It is absolutely essential for me that params and the function calc are passed as arguments to testbroad, and the performance regression is not acceptable. Finding the (rather trivial) workaround cost me a couple of hours.

I don’t fully understand the rules for this situation, but I know that Julia may not fully specialize on ::Function arguments, in order to avoid potentially expensive recompilation (since every single function is a different type). Forcing Julia to specialize on that argument fixes the issue:

testbroad3(p::NamedTuple, vec::Vector, calc::F, t) where {F <: Function} = calc.((p,), t .- vec)

julia> @btime testbroad1($params, $vec, $calc, 5.0);
  1.038 μs (5 allocations: 8.03 KiB)

julia> @btime testbroad2($params, $vec, $calc, 5.0);
  899.103 ns (1 allocation: 7.94 KiB)

julia> @btime testbroad3($params, $vec, $calc, 5.0);
  883.857 ns (1 allocation: 7.94 KiB)
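
For reference, here is a minimal self-contained sketch of the two signatures side by side. It assumes vec = rand(1000), which is a guess on my part since the input vector is not shown in the question. As far as I understand it, the heuristic kicks in when a Function argument is only passed through to another call (here, the broadcast machinery) rather than called directly in the method body, and a type parameter opts out of it:

```julia
params = (a=2, b=3)
calc(p::NamedTuple, t) = p.a*t^2 + p.b*t^3

vec = rand(1000)  # assumption: the original input vector is not shown

# `calc` is only handed on to broadcast here, so Julia may compile this
# method without specializing on the concrete function type.
testbroad1(p::NamedTuple, vec::Vector, calc::Function, t) = calc.((p,), t .- vec)

# The type parameter F asks for a separate specialization per function type.
testbroad3(p::NamedTuple, vec::Vector, calc::F, t) where {F <: Function} = calc.((p,), t .- vec)

r1 = testbroad1(params, vec, calc, 5.0)
r3 = testbroad3(params, vec, calc, 5.0)
r1 ≈ r3  # both variants compute the same values; only the compiled code differs
```

Timing both with @btime, as in the benchmarks above, is the easiest way to see whether the difference shows up on your machine.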

thanks! That even kinda makes sense