When shouldn't we use @simd?

StefanKarpinski · December 5, 2018, 4:33pm

You should not use @simd if your algorithm depends on how + and * and other floating-point operations are associated. Consider the difference between left-to-right summation and left-to-right summation with a @simd annotation:

function mysum(v::Vector{Float64})
    t = 0.0
    for x in v
        t += x
    end
    return t
end

function mysum_simd(v::Vector{Float64})
    t = 0.0
    @simd for x in v
        t += x
    end
    return t
end

You already get slightly different results for most random arrays:

julia> v = rand(10^6);

julia> mysum(v)
499897.00784308364

julia> mysum_simd(v)
499897.00784311077

Well, that doesn’t seem terrible. How bad can it get? I addressed this question in this post:

In short, it can be as bad as possible—you can literally get arbitrarily wrong values. Here’s a minor tweak of the sumsto function from that post (changing realmax to floatmax and reversing the order of the iteration of powers), which really shows how bad this is:

function sumsto(x::Float64)
    0 <= x < exp2(970) || throw(ArgumentError("sum must be in [0,2^970)"))
    n, p₀ = Base.decompose(x) # integers such that `n*exp2(p₀) == x`
    [floatmax(); [exp2(p) for p in reverse(-1074:969) if iseven(n >> (p-p₀))]
    -floatmax(); [exp2(p) for p in reverse(-1074:969) if isodd(n >> (p-p₀))]]
end

julia> s = rand()*10^6
629436.8129391546

julia> v = sumsto(s)
2046-element Array{Float64,1}:
   1.7976931348623157e308
   4.9896007738368e291
   2.4948003869184e291
   ⋮
   1.862645149230957e-9
   4.656612873077393e-10
   1.1641532182693481e-10

julia> mysum(v)
629436.8129391546

julia> mysum_simd(v)
6.652801031782399e291

julia> abs(mysum_simd(v)-mysum(v))/mysum(v)
1.056945017358796e286

Assuming that the result of left-to-right summation is the desired result, that’s a relative error of 10^286! Oops. Julia’s built-in sum also gives an entirely different answer by reassociating the + operations in a different way:

julia> sum(v)
0.0

Of course, there’s a strong chance that what you really wanted was the true sum, which is 9.9792015476736e291 (rounded to 64-bit precision). None of these algorithms give that, although mysum_simd comes closest. Kahan summation does give the correct answer:

julia> using KahanSummation

julia> sum_kbn(v)
9.9792015476736e291

That comes at the cost of being about 2x slower than built-in sum or mysum_simd:

julia> @time mysum(v)
  0.000008 seconds (5 allocations: 176 bytes)
629436.8129391546

julia> @time mysum_simd(v)
  0.000005 seconds (5 allocations: 176 bytes)
6.652801031782399e291

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
0.0

julia> @time sum_kbn(v)
  0.000007 seconds (5 allocations: 176 bytes)
9.9792015476736e291

Note, however, that Kahan summation is fully correct and still faster than non-SIMD naive summation, so it’s pretty impressive overall.

Topic		Replies	Views
Is it ok to iterate inside a `@simd` loop? General Usage performance , simd	8	285	April 25, 2025
How to do SIMD code with wide-register accumulators (@simd vs LoopVectorization.jl vs SIMD.jl) Performance simd	11	2610	June 22, 2021
A simple SIMD.jl loop that is slower than a vanilla `@inbounds @simd` Performance simd	8	1887	June 27, 2021
Why is this @simd loop faster than a while loop even if it has longer assembly? Performance question , simd , loops	6	1622	August 1, 2021
Is the triple `@inbounds @fastmath @simd` necessary for absolute peak performance? Performance	7	493	October 21, 2024

When shouldn't we use @simd?

Related topics