When shouldn't we use @simd?

You should not use @simd if your algorithm depends on how + and * and other floating-point operations are associated. Consider the difference between plain left-to-right summation and the same loop with a @simd annotation:

function mysum(v::Vector{Float64})
    t = 0.0
    for x in v
        t += x
    end
    return t
end

function mysum_simd(v::Vector{Float64})
    t = 0.0
    @simd for x in v
        t += x
    end
    return t
end
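Why do the answers differ? @simd gives the compiler license to reassociate the reduction: instead of one running total, it may keep several independent partial sums (one per vector lane) and combine them at the end. Here is a scalar sketch of that reassociation; the lane count of 4 is illustrative only, since the actual vector width and remainder handling depend on your hardware and the compiler:

```julia
# Sketch of the kind of reassociation @simd permits: four independent
# accumulators, combined at the end, with a serial remainder loop.
# (Assumption: 4 lanes; real SIMD width varies by hardware.)
function mysum_4lane(v::Vector{Float64})
    t = zeros(4)                     # four independent partial sums
    n = length(v) - length(v) % 4    # largest multiple of 4
    for i in 1:4:n
        for lane in 0:3
            t[1 + lane] += v[i + lane]
        end
    end
    s = (t[1] + t[2]) + (t[3] + t[4])
    for i in n+1:length(v)           # leftover elements, handled serially
        s += v[i]
    end
    return s
end
```

Each accumulator sees only every fourth element, so the rounding errors accumulate along different partial sums than in the strict left-to-right loop; the two results agree exactly only when the values happen to be exactly representable at every intermediate step.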

You already get slightly different results for most random arrays:

julia> v = rand(10^6);

julia> mysum(v)
499897.00784308364

julia> mysum_simd(v)
499897.00784311077

Well, that doesn’t seem terrible. How bad can it get? I addressed this question in an earlier post.

In short, it can be as bad as possible—you can literally get arbitrarily wrong values. Here’s a minor tweak of the sumsto function from that post (changing realmax to floatmax and reversing the order of the iteration of powers), which really shows how bad this is:

function sumsto(x::Float64)
    0 <= x < exp2(970) || throw(ArgumentError("sum must be in [0,2^970)"))
    n, p₀ = Base.decompose(x) # integers such that `n*exp2(p₀) == x`
    [floatmax(); [exp2(p) for p in reverse(-1074:969) if iseven(n >> (p-p₀))]
    -floatmax(); [exp2(p) for p in reverse(-1074:969) if isodd(n >> (p-p₀))]]
end

julia> s = rand()*10^6
629436.8129391546

julia> v = sumsto(s)
2046-element Array{Float64,1}:
   1.7976931348623157e308
   4.9896007738368e291
   2.4948003869184e291
   ⋮
   1.862645149230957e-9
   4.656612873077393e-10
   1.1641532182693481e-10

julia> mysum(v)
629436.8129391546

julia> mysum_simd(v)
6.652801031782399e291

julia> abs(mysum_simd(v)-mysum(v))/mysum(v)
1.056945017358796e286

Taking the result of left-to-right summation as the reference, that’s a relative error of 10^286! Oops. Julia’s built-in sum gives yet another answer, because it reassociates the + operations in its own way:

julia> sum(v)
0.0
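The built-in sum reassociates because it uses pairwise (divide-and-conquer) summation, which usually has much better error behavior than a naive loop but is still just one particular association of the + operations. A minimal recursive sketch (the real implementation switches to a simple loop below a block-size cutoff for speed, so its exact association differs):

```julia
# Pairwise summation: split the range in half, sum each half
# recursively, then add the two halves. O(log n) error growth
# instead of O(n) for the naive left-to-right loop.
function pairwise_sum(v::Vector{Float64}, lo::Int=1, hi::Int=length(v))
    hi <= lo && return hi < lo ? 0.0 : v[lo]  # empty or single element
    mid = (lo + hi) >>> 1
    return pairwise_sum(v, lo, mid) + pairwise_sum(v, mid + 1, hi)
end
```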

Of course, there’s a strong chance that what you really wanted was the true sum, which is 9.9792015476736e291 (rounded to 64-bit precision). None of these algorithms give that, although mysum_simd comes closest. Kahan summation does give the correct answer:

julia> using KahanSummation

julia> sum_kbn(v)
9.9792015476736e291
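For reference, here is a minimal sketch of the Kahan–Babuška–Neumaier (KBN) compensated-summation idea behind sum_kbn: a second variable accumulates the rounding error lost at each addition and is folded back in at the end. (This is a teaching sketch, not the package's actual implementation.)

```julia
# Kahan–Babuška–Neumaier summation: `c` carries the low-order bits
# that each floating-point addition rounds away.
function kbn_sum(v::Vector{Float64})
    s = 0.0
    c = 0.0                            # running compensation
    for x in v
        t = s + x
        if abs(s) >= abs(x)
            c += (s - t) + x           # low-order bits of x were lost
        else
            c += (x - t) + s           # low-order bits of s were lost
        end
        s = t
    end
    return s + c
end
```

For example, `kbn_sum([1.0, 1e100, 1.0, -1e100])` returns 2.0, whereas a naive left-to-right loop returns 0.0 because both additions of 1.0 are entirely absorbed by 1e100.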

That comes at the cost of being about 2x slower than built-in sum or mysum_simd:

julia> @time mysum(v)
  0.000008 seconds (5 allocations: 176 bytes)
629436.8129391546

julia> @time mysum_simd(v)
  0.000005 seconds (5 allocations: 176 bytes)
6.652801031782399e291

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
0.0

julia> @time sum_kbn(v)
  0.000007 seconds (5 allocations: 176 bytes)
9.9792015476736e291

Note, however, that Kahan summation is fully correct and still faster than non-SIMD naive summation, so it’s pretty impressive overall.
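If what you need is not accuracy but a guaranteed, reproducible association, foldl makes the left-to-right order explicit; it is defined to apply the operator left-associatively, so it will not be reassociated out from under you:

```julia
# foldl(+, v) computes ((v[1] + v[2]) + v[3]) + ... in a fixed order.
v = [1.0, 1e100, 1.0, -1e100]
foldl(+, v)   # same association as the explicit mysum loop above
```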
