This is type-unstable. Your s starts as an integer but ends up as a floating-point value (for floating-point arrays). (See the performance tips.)
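To make the instability concrete, here is a minimal sketch. The names unstable_sum and stable_sum are hypothetical, and unstable_sum is only my guess at the shape of the original loop (an accumulator initialized with the integer literal 0). It shows that the return type of the unstable version depends on the input's contents, not just on its type:

```julia
# Hypothetical reconstruction: the accumulator starts as an Int.
function unstable_sum(a)
    s = 0                      # Int, regardless of eltype(a)
    for i in eachindex(a)
        s += a[i]              # becomes Float64 after the first iteration
    end
    return s
end

# Same loop, but the accumulator starts with the element type of `a`.
function stable_sum(a)
    s = zero(eltype(a))        # 0.0 for a Vector{Float64}
    for i in eachindex(a)
        s += a[i]
    end
    return s
end

# An empty Float64 array never enters the loop, so the Int leaks out.
println(typeof(unstable_sum(Float64[])))   # Int
println(typeof(stable_sum(Float64[])))     # Float64
```

Running @code_warntype unstable_sum(rand(3)) should also flag s as a Union of the integer and floating-point types.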
If you fix this, it is as fast as sum (actually slightly faster because it avoids the slight overhead of pairwise summation):
julia> using BenchmarkTools  # provides @btime

julia> function mysum(a)
           s = zero(eltype(a))
           @simd for i in eachindex(a)
               s += a[i]
           end
           return s
       end
mysum (generic function with 1 method)

julia> a = rand(1000); @btime sum($a); @btime mysum($a);
  97.765 ns (0 allocations: 0 bytes)
  93.361 ns (0 allocations: 0 bytes)
(The @inbounds is inferred here, though it wouldn’t hurt to add it.)
It probably won’t help if you have the s > a check in every loop iteration, but it will help if you do the loop in chunks.
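One way the chunked approach could look is sketched below. The function name chunked_sum_until, the threshold argument, and the chunk keyword (default 256) are all hypothetical, and I am assuming the goal is to stop summing once the running total exceeds a threshold. The inner loop has no branch, so it can still vectorize, and the comparison runs only once per chunk:

```julia
# Hypothetical helper: sum `a` in chunks, stopping once the running
# total exceeds `threshold`. The check runs once per chunk, not per element.
function chunked_sum_until(a::AbstractVector, threshold; chunk::Int = 256)
    s = zero(eltype(a))                # type-stable accumulator
    i = firstindex(a)
    hi = lastindex(a)
    while i <= hi
        stop = min(i + chunk - 1, hi)  # end of the current chunk
        @simd for j in i:stop          # branch-free inner loop
            @inbounds s += a[j]        # i:stop always lies within the indices of a
        end
        s > threshold && return s      # early exit at a chunk boundary
        i = stop + 1
    end
    return s
end
```

Note that this changes the semantics slightly: the early exit happens at a chunk boundary, so the returned sum may include up to chunk - 1 more elements than a per-element check would. Whether that is acceptable depends on what the early exit is used for, and the best chunk size is something to benchmark.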