Avoiding the branch lets the compiler emit SIMD instructions that process several elements of the input arrays at once. The usual way to write this is something like ifelse(B[i], x[i]^2, 0.0). Because ifelse is an ordinary function, both value arguments are evaluated before one of them is selected, so no branch is needed. For straight-line code like this, it’s also worth putting @inbounds on the loop or trying LoopVectorization:
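using LoopVectorization  # provides @avxt (called @tturbo in newer releases)

function f4(x::Array{T,1}, B) where {T}
    n = length(x)
    y = zero(T)
    @avxt for i = 1:n
        # accumulate with +=; a plain `y = ...` would keep only the last element's value
        y += ifelse(B[i], x[i]^2, zero(T))
    end
    return y
end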
julia> @btime f4($x, $B)
369.307 ns (0 allocations: 0 bytes)
0.0
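For comparison, here is a minimal sketch of the @inbounds route mentioned above, using only Base. The name f_simd and the sample inputs xs and mask are made up for illustration and are not from the code above. It assumes B holds Bool values and has the same length as x.

```julia
# Hypothetical Base-only variant: @inbounds plus @simd instead of LoopVectorization.
function f_simd(x::AbstractVector{T}, B::AbstractVector{Bool}) where {T}
    y = zero(T)
    # eachindex(x, B) throws if the two arrays have different axes,
    # so skipping bounds checks inside the loop is safe.
    @inbounds @simd for i in eachindex(x, B)
        # ifelse evaluates both value arguments and selects one, so there is no branch
        y += ifelse(B[i], x[i]^2, zero(T))
    end
    return y
end

xs = [1.0, 2.0, 3.0, 4.0]          # illustrative input
mask = [true, false, true, false]  # illustrative mask
f_simd(xs, mask)  # 1.0^2 + 3.0^2 = 10.0
```

@simd gives the compiler permission to reorder the floating-point additions, so for general inputs the result can differ from a strictly sequential sum in the last bits. Whether this beats the LoopVectorization version depends on the machine and the array sizes, so it is worth benchmarking both with @btime.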