When shouldn't we use @simd?

Hello.

The @simd documentation says:

Warning:
Incorrect use of the @simd macro may cause unexpected results.

* You are asserting it is safe to execute iterations in arbitrary or overlapping order, with special consideration for reduction variables.
* Floating-point operations on reduction variables may be reordered, possibly causing different results than without @simd.
* The object iterated over should be a one-dimensional range.
* The loop must be an innermost loop.
* The loop body must be straight-line code.
* Accesses must have a stride pattern and cannot be "gathers" (random-index reads) or "scatters" (random-index writes).
* The stride should be unit stride.
* There exist no loop-carried memory dependencies.
* No iteration ever waits on a previous iteration to make forward progress.

Could you please provide some examples that violate those restrictions, or where the @simd macro would produce wrong results?

2 Likes

This is difficult in general, since there is no guarantee of wrong results either: they merely may be incorrect, so you should not depend on this behavior.

That said, the above list is pretty clear; which one do you need an example for? E.g. for the second, just consider a classic like

julia> 1 + eps()/2 - 1
0.0

julia> eps()/2 + (1 - 1)
1.1102230246251565e-16

3 Likes

But this is not related to @simd.
Anything more specific to @simd?

1 Like

I wonder if you read what I wrote, and what you quoted.

It just relates to different orders of operations giving different answers, and since @simd may result in your operations being reordered, you might see similar effects.

8 Likes

Hmm, not sure about that specific example, since there’s no reduction variable, and by putting (1 - 1) in parentheses, the compiler will likely just optimize it away. I guess you could imagine something like this:

x = 0.0
@simd for n = 1:10
    global x
    x += 1  # each iteration reads the x left by the previous one,
    x *= 2  # and the add must happen before the multiply
end
x

The answer should be 2046. But if the two operations on x are reordered, the answer will be 1023. So I wouldn’t consider this code @simd-safe. However, I’m not sure whether this would occur in practice. I too would be curious to see actual examples of where @simd gives different results due to points 1 and 2 in the list above.
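
For comparison, here is what that loop computes if you swap the two updates by hand (a hypothetical manual reordering, not something @simd is documented to do here):

x = 0.0
for n = 1:10
    global x
    x *= 2  # multiply first ...
    x += 1  # ... then add: x = 2x + 1 per iteration
end
x  # 2^10 - 1 = 1023.0 instead of 2046.0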

5 Likes

The classic example would be

julia> @noinline function g(a, b)
           @simd for i = 1:length(a)
               @inbounds a[i] += b[i]
           end
           nothing
       end

julia> N = 16; a0 = ones(Int, N); a = view(a0, 2:N); b = view(a0, 1:N-1); g(a, b); a0

I don’t understand why the SIMD code yields the correct results here (it does on my machine). The @simd annotation is wrong unless I personally check whether a and b alias. E.g.

julia> @noinline function g_unroll(a, b)
           for i = 1:(length(a) >> 1)
               a1 = a[2*i-1]
               a2 = a[2*i]
               b1 = b[2*i-1]
               b2 = b[2*i]
               a[2*i-1] = a1 + b1
               a[2*i] = a2 + b2
           end
           nothing
       end

is wrong (i.e. different from g). FWIW, a .+= b detects the aliasing and emulates un-aliased semantics (which is also very different from g).
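
To make that concrete, here is a small sketch of the difference (a hypothetical session reusing g from above, and assuming, as observed, that the non-ivdep loop runs with serial semantics here):

# The serial loop propagates each sum through the alias:
a0 = ones(Int, 8); a = view(a0, 2:8); b = view(a0, 1:7)
g(a, b)
a0  # [1, 2, 3, 4, 5, 6, 7, 8]

# Broadcasting detects the alias and effectively copies b first:
a0 = ones(Int, 8); a = view(a0, 2:8); b = view(a0, 1:7)
a .+= b
a0  # [1, 2, 2, 2, 2, 2, 2, 2]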

Postscript: I should have RTFM first:

!!! note
    The `@simd` does not assert by default that the loop is completely free of loop-carried
    memory dependencies, which is an assumption that can easily be violated in generic code.
    If you are writing non-generic code, you can use `@simd ivdep for ... end` to also assert that:
        * There exists no loop-carried memory dependencies
        * No iteration ever waits on a previous iteration to make forward progress.

And indeed, the following does yield the expected wrong result with N=128:

julia> @noinline function g(a, b)
           @simd ivdep for i = 1:length(a)
               @inbounds a[i] += b[i]
           end
           nothing
       end

PPS: the OP quoted an old version of the docs. I think @simd was recently made much safer, and correspondingly less effective: some @simd annotations on BitArray code in Base or Pkg turned out to be unsound under the old model, leading to hilarious bugs and to the insight that ivdep is a very strong assumption that is hard to use safely. E.g. @simd ivdep for i = 1:length(a); a[i] = f(i); end is wrong for a::BitArray, because setindex! is a read-modify-write in which adjacent bits share a 64-bit word, and hence a loop-carried memory dependency.
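
A sketch of that unsound pattern (with f(i) replaced by isodd(i) for illustration). It may well print the right answer on a given machine; the point is that the ivdep assertion is false, so the compiler is free to miscompile it:

a = falses(128)                # a::BitArray: 64 bits share each UInt64 word
@simd ivdep for i = 1:length(a)
    @inbounds a[i] = isodd(i)  # setindex! read-modify-writes a whole word,
end                            # a loop-carried dependency between iterations
count(a)                       # 64 under serial semantics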

2 Likes

The @simd macro will try to find reduction variables in the loop and apply fast-math reassociation to the reductions on them. I don’t think this will kick in for your case, so the @simd will be a no-op. Instead, LLVM likely inserts run-time checks to see whether SIMD is OK.
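
If you want to see for yourself, you can inspect the generated IR (a hypothetical session using the g defined above): a vectorized loop shows up as a vector.body block, and LLVM’s run-time aliasing checks as a vector.memcheck block before it.

julia> a = ones(Float64, 64); b = ones(Float64, 64);

julia> @code_llvm g(a, b)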

1 Like

The current documentation for v1.0 (Essentials · The Julia Language) has separated out that list of bullet points into things that stymie vectorization and things that can go wrong. It should be much easier to understand.

1 Like

That’s not the kind of reordering that is allowed. What is allowed is reassociation of operations that are mathematically associative but not actually associative in floating point; this includes + and *. So (a + b) + c can be evaluated as a + (b + c) instead, which does not compute the same thing for floating-point values.
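
For example (values chosen so that the two associations round differently):

julia> a, b, c = -1.0, 1.0, eps()/2;

julia> (a + b) + c  # the association as written
1.1102230246251565e-16

julia> a + (b + c)  # reassociated: b + c rounds back down to 1.0
0.0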

6 Likes

You should not use @simd if your algorithm depends on how + and * and other floating-point operations are associated. Consider the difference between left-to-right summation and left-to-right summation with a @simd annotation:

function mysum(v::Vector{Float64})
    t = 0.0
    for x in v
        t += x
    end
    return t
end

function mysum_simd(v::Vector{Float64})
    t = 0.0
    @simd for x in v
        t += x
    end
    return t
end

You already get slightly different results for most random arrays:

julia> v = rand(10^6);

julia> mysum(v)
499897.00784308364

julia> mysum_simd(v)
499897.00784311077

Well, that doesn’t seem terrible. How bad can it get? I addressed this question in this post:

In short, it can be as bad as possible—you can literally get arbitrarily wrong values. Here’s a minor tweak of the sumsto function from that post (changing realmax to floatmax and reversing the order of the iteration of powers), which really shows how bad this is:

function sumsto(x::Float64)
    0 <= x < exp2(970) || throw(ArgumentError("sum must be in [0,2^970)"))
    n, p₀ = Base.decompose(x) # integers such that `n*exp2(p₀) == x`
    [floatmax(); [exp2(p) for p in reverse(-1074:969) if iseven(n >> (p-p₀))]
    -floatmax(); [exp2(p) for p in reverse(-1074:969) if isodd(n >> (p-p₀))]]
end

julia> s = rand()*10^6
629436.8129391546

julia> v = sumsto(s)
2046-element Array{Float64,1}:
   1.7976931348623157e308
   4.9896007738368e291
   2.4948003869184e291
   ⋮
   1.862645149230957e-9
   4.656612873077393e-10
   1.1641532182693481e-10

julia> mysum(v)
629436.8129391546

julia> mysum_simd(v)
6.652801031782399e291

julia> abs(mysum_simd(v)-mysum(v))/mysum(v)
1.056945017358796e286

Assuming that the result of left-to-right summation is the desired result, that’s a relative error of 10^286! Oops. Julia’s built-in sum also gives an entirely different answer by reassociating the + operations in a different way:

julia> sum(v)
0.0

Of course, there’s a strong chance that what you really wanted was the true sum, which is 9.9792015476736e291 (rounded to 64-bit precision). None of these algorithms give that, although mysum_simd comes closest. Kahan summation does give the correct answer:

julia> using KahanSummation

julia> sum_kbn(v)
9.9792015476736e291

That comes at the cost of being about 2x slower than built-in sum or mysum_simd:

julia> @time mysum(v)
  0.000008 seconds (5 allocations: 176 bytes)
629436.8129391546

julia> @time mysum_simd(v)
  0.000005 seconds (5 allocations: 176 bytes)
6.652801031782399e291

julia> @time sum(v)
  0.000005 seconds (5 allocations: 176 bytes)
0.0

julia> @time sum_kbn(v)
  0.000007 seconds (5 allocations: 176 bytes)
9.9792015476736e291

Note, however, that Kahan summation is fully correct and still faster than non-SIMD naive summation, so it’s pretty impressive overall.

18 Likes

Thanks for the clarification! I suspected that this was the case; however, the way the documentation is worded makes it sound like any reordering can take place. Is this documented in more detail somewhere? If I wrote the above code and someone pointed out during code review that it was not safe, referencing the documentation, I would have a very hard time defending it…

I’ve found another "reason" not to use @simd: if there is a break statement inside the loop, it doesn’t work.
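
For example, a sketch like this (with a hypothetical first_negative helper) fails when the macro is expanded, since break makes the body non-straight-line; the exact error message depends on the Julia version:

function first_negative(v)
    idx = 0
    @simd for i in eachindex(v)
        if v[i] < 0
            idx = i
            break  # not allowed: @simd requires a straight-line loop body
        end
    end
    return idx
end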

I do want to add, lest my posts about summation order scare people off, that most of the time when you’re doing arithmetic in a loop like that, @simd is totally fine unless you have reason to care about the specific associativity of operations (e.g. in Kahan summation, which gets optimized away if the compiler is allowed to reassociate operations).

In the case of summation, strict left-to-right summation is actually one of the least accurate algorithms, so putting @simd on the loop makes it faster and more accurate at computing the true sum. But, of course, we can’t know that you didn’t really want to compute the left-to-right sum specifically, so Julia leans conservative and assumes that you meant what you wrote unless you tell it otherwise.
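
For reference, a minimal sketch of the compensated (Kahan) pattern being alluded to (not the KahanSummation package’s actual implementation): the correction only works because the compiler must respect the written association, so allowing reassociation via @simd or fast-math lets (t - s) - y fold to zero and silently degrades this to naive summation.

function kahan_sum(v::Vector{Float64})
    s = 0.0
    c = 0.0                 # running compensation for lost low-order bits
    for x in v              # deliberately NOT @simd
        y = x - c
        t = s + y
        c = (t - s) - y     # the rounding error of s + y, recovered exactly
        s = t
    end
    return s
end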

6 Likes