You should not use `@simd` if your algorithm depends on how `+`, `*`, and other floating-point operations are associated. Consider the difference between plain left-to-right summation and the same loop with a `@simd` annotation:
```julia
function mysum(v::Vector{Float64})
    t = 0.0
    for x in v
        t += x
    end
    return t
end

function mysum_simd(v::Vector{Float64})
    t = 0.0
    @simd for x in v
        t += x
    end
    return t
end
```
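What `@simd` does here is give the compiler license to reorder and regroup the additions, typically by keeping several independent partial sums (one per vector lane) and combining them at the end. A rough scalar model of that reassociated order (a sketch only; `mysum_4lanes` is a hypothetical name, not the actual generated code):

```julia
function mysum_4lanes(v::Vector{Float64})
    # Four independent accumulators stand in for the lanes of a SIMD register.
    t1 = t2 = t3 = t4 = 0.0
    i, n = 1, length(v)
    while i + 3 <= n
        t1 += v[i]
        t2 += v[i+1]
        t3 += v[i+2]
        t4 += v[i+3]
        i += 4
    end
    t = (t1 + t2) + (t3 + t4)  # combine the lanes
    while i <= n               # scalar remainder loop
        t += v[i]
        i += 1
    end
    return t
end
```

In exact arithmetic this computes the same sum; in floating point the different grouping rounds differently, and that difference is exactly the freedom `@simd` grants.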
You already get slightly different results for most random arrays:
```julia
julia> v = rand(10^6);

julia> mysum(v)
499897.00784308364

julia> mysum_simd(v)
499897.00784311077
```
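Which of the two is closer to the truth? One way to check (a sketch, assuming the same `v`) is to sum in extended precision as a reference; for random data like this, the multi-accumulator order tends to be at least as accurate as strict left-to-right summation:

```julia
# Extended-precision reference: summing BigFloats is effectively exact here.
ref = Float64(sum(big.(v)))

# Absolute error of each variant relative to the reference.
abs(mysum(v) - ref), abs(mysum_simd(v) - ref)
```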
Well, that doesn’t seem terrible. How bad can it get? I addressed this question in this post. In short, it can be as bad as possible: you can get arbitrarily wrong values. Here’s a minor tweak of the `sumsto` function from that post (changing `realmax` to `floatmax` and reversing the order of iteration over the powers), which really shows how bad this can be:
```julia
function sumsto(x::Float64)
    0 <= x < exp2(970) || throw(ArgumentError("sum must be in [0,2^970)"))
    n, p₀ = Base.decompose(x) # integers such that `n*exp2(p₀) == x`
    [floatmax(); [exp2(p) for p in reverse(-1074:969) if iseven(n >> (p-p₀))]
    -floatmax(); [exp2(p) for p in reverse(-1074:969) if isodd(n >> (p-p₀))]]
end
```
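Roughly why this works: summed left to right, the leading `floatmax()` absorbs every power of two that follows it (each is less than half a ulp of `floatmax()`, so those additions round away to nothing), the `-floatmax()` then cancels the total back to exactly zero, and the powers of two that remain are precisely the bits of `x`, which sum to `x` with no rounding at all. Any other association of the `+` operations destroys this carefully staged cancellation.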
```julia
julia> s = rand()*10^6
629436.8129391546

julia> v = sumsto(s)
2046-element Array{Float64,1}:
 1.7976931348623157e308
 4.9896007738368e291
 2.4948003869184e291
 ⋮
 1.862645149230957e-9
 4.656612873077393e-10
 1.1641532182693481e-10

julia> mysum(v)
629436.8129391546

julia> mysum_simd(v)
6.652801031782399e291

julia> abs(mysum_simd(v)-mysum(v))/mysum(v)
1.056945017358796e286
```
Assuming that the result of left-to-right summation is the desired result, that’s a relative error of 10^286! Oops. Julia’s built-in `sum`, which reassociates the `+` operations in yet another way (it uses pairwise summation), gives an entirely different answer:
```julia
julia> sum(v)
0.0
```
Of course, there’s a strong chance that what you really wanted was the true sum, which is 9.9792015476736e291 (rounded to 64-bit precision). None of these algorithms gives that, although `mysum_simd` comes closest. Kahan summation does give the correct answer:
```julia
julia> using KahanSummation

julia> sum_kbn(v)
9.9792015476736e291
```
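For a sense of how it achieves this, here is a minimal sketch of compensated (Kahan) summation; `mysum_kahan` is a hypothetical name, and the package’s `sum_kbn` actually implements the refined Kahan–Babuška–Neumaier variant:

```julia
function mysum_kahan(v::Vector{Float64})
    t = 0.0  # running sum
    c = 0.0  # running compensation: the low-order bits lost so far
    for x in v
        y = x - c        # correct the next term by the lost bits
        s = t + y        # add it; low-order bits of y may be lost here
        c = (s - t) - y  # algebraically zero; in floating point, the lost bits
        t = s
    end
    return t
end
```

The compensation term lets each addition recover roughly the precision a plain accumulator throws away, at the cost of a few extra floating-point operations per element.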
That comes at the cost of being somewhat slower than the built-in `sum` or `mysum_simd` (about 1.4x in the run below):
```julia
julia> @time mysum(v)
  0.000008 seconds (5 allocations: 176 bytes)
629436.8129391546

julia> @time mysum_simd(v)
  0.000005 seconds (5 allocations: 176 bytes)
6.652801031782399e291

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
0.0

julia> @time sum_kbn(v)
  0.000007 seconds (5 allocations: 176 bytes)
9.9792015476736e291
```
Note, however, that Kahan summation is fully correct and still faster than non-SIMD naive summation, so it’s pretty impressive overall.
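(A methodological aside: one-shot `@time` measurements at the microsecond scale are noisy. If you want steadier numbers, the BenchmarkTools package is the usual tool; a sketch, assuming it is installed:)

```julia
using BenchmarkTools

# `$v` interpolates the array so we don't benchmark global-variable access.
@btime mysum($v)
@btime mysum_simd($v)
@btime sum($v)
@btime sum_kbn($v)
```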