If this level of overhead on an inner loop isn’t good enough, what you fundamentally need to do is push the information about the operators into the type system on a more-outer loop, where you don’t care about the overhead. E.g. here’s a version where the selected operator is fully known to the compiler, because binops is a Tuple{typeof(+),typeof(-),typeof(*)} rather than a Vector{Function}, and because the index is now inside a Val:
binops = (+, -, *)
function listBINOP!(x, y, ::Val{i}, clen, binops) where {i}
    op = binops[i]
    broadcast!(op, x, x, y)
end
@benchmark listBINOP!($x, $x, Val(1), 50, $binops)
This is exactly as fast as the if-version on my machine; I suspect all of the op selection gets compiled away entirely. More or less, you’ll need to do something like this on a more-outer loop. For this problem, it sounds like anything you could do with metaprogramming you could do more cleanly by just giving the Julia compiler the type information it needs, so I’d definitely recommend that.
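To make the "more-outer loop" part concrete, here is a rough sketch of how the pieces might fit together. The names `apply_op!`, `run_program!`, `BINOPS` and the opcode list are made up for illustration and are not from your code. I'm assuming the operator index only becomes known at run time, so the `Val(code)` construction costs one dynamic dispatch per opcode in the outer loop, while the inner kernel is compiled separately for each operator.

```julia
# Hypothetical names throughout; this is a sketch, not the original code.
const BINOPS = (+, -, *)   # a Tuple{typeof(+), typeof(-), typeof(*)}

# Inner kernel: `i` is a type parameter, so `binops[i]` is a known function
# when this method is compiled for a particular Val{i}.
function apply_op!(x, y, ::Val{i}, binops) where {i}
    op = binops[i]
    broadcast!(op, x, x, y)
    return x
end

# Outer loop: `code` is a run-time integer, so `Val(code)` triggers a dynamic
# dispatch here, once per opcode rather than once per array element.
function run_program!(x, y, opcodes, binops)
    for code in opcodes
        apply_op!(x, y, Val(code), binops)
    end
    return x
end

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
run_program!(x, y, [1, 3, 2], BINOPS)   # (x + y), then (* y), then (- y)
# x is now [100.0, 420.0, 960.0]
```

The trade-off is that the dynamic dispatch doesn’t go away, it just moves to a place where it happens rarely compared with the work done per call.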