Why is this loop type not supported by LoopVectorization?

I am trying to understand why I can’t use @turbo in part of my code. I can reduce the issue to the following example:

using LoopVectorization

f = [15, 13, 12, 10, 9, 7, 5, 4, 2, 1]
y0 = zeros(16)
y1 = zeros(16)

for i = 1:10
    y0[f[i]] += 1.0
    y0[f[i] + 1] += 2.0
end

@turbo for i = 1:10
    y1[f[i]] += 1.0
    y1[f[i] + 1] += 2.0
end

I then get:

julia> [y0 y1]
16×2 Matrix{Float64}:
 1.0  1.0
 3.0  2.0
 2.0  2.0
 1.0  1.0
 3.0  2.0
 2.0  2.0
 1.0  1.0
 2.0  2.0
 1.0  1.0
 3.0  2.0
 2.0  2.0
 1.0  1.0
 3.0  2.0
 2.0  2.0
 1.0  1.0
 2.0  2.0

So I am not getting the same output when I use @turbo. Why is it not appropriate to use it in this case? It was not obvious to me from reading about LoopVectorization.jl's limitations, so I suspect I may have misunderstood something. Thanks.

From the @turbo documentation:

It assumes that loop iterations are independent.

Here, the loop iterations are not fully independent (even though they can be executed in any order): different iterations modify the same element of y1, which means they cannot be executed in parallel (SIMD is instruction-level parallelism). And you can see that it is precisely for these elements (the ones with value 3.0 after the scalar loop) that the results differ.
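
To see the mechanism, here is a sketch of one plausible vectorized schedule (not necessarily the exact code @turbo generates; y2, t1, and t2 are just illustrative names). Both gathers are hoisted above both scatters, so the second statement reads stale values, and its store silently wins wherever the two statements hit the same index:

f = [15, 13, 12, 10, 9, 7, 5, 4, 2, 1]
y2 = zeros(16)

# Gather and add for both statements first; neither sees the other's writes.
t1 = y2[f]      .+ 1.0
t2 = y2[f .+ 1] .+ 2.0

# Then scatter. Wherever f[j] + 1 == f[i], the second store
# overwrites the first: a lost update.
y2[f]      .= t1
y2[f .+ 1] .= t2

# y2 now matches the @turbo result y1 above, not the scalar loop's y0.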

What is the definition of independence?

google “loop independence”

If you can rearrange the order of the iterations without changing the result of the calculation.

For example, what happens if you run this many times:

using Random  # shuffle is in the Random standard library

for i = shuffle(1:10)
    y0[f[i]] += 1.0
    y0[f[i] + 1] += 2.0
end

Do you always get the same result?
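
For what it's worth, here is a quick empirical version of that test (a sketch; run_shuffled is just an illustrative helper, not part of any package). It resets the accumulator for every run so the runs are comparable:

using Random

function run_shuffled(f, order)
    y = zeros(16)  # fresh accumulator for each run
    for i in order
        y[f[i]] += 1.0
        y[f[i] + 1] += 2.0
    end
    return y
end

ref = run_shuffled(f, 1:10)
all(run_shuffled(f, shuffle(1:10)) == ref for _ in 1:1000)  # true: order does not matter here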

I would guess that invariance to shuffling is not enough either. The different iterations should be able to execute simultaneously, so check for possible data races.

In this case, yes, the shuffled result is always the same, because addition is commutative. But the iterations still cannot execute in parallel without data races.
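
One concrete way to check for such races is to look for overlap among the indices the loop writes (a sketch; targets and conflicts are just illustrative names). Each iteration i writes to f[i] and f[i] + 1, so any index targeted by more than one iteration is a potential race under parallel execution:

f = [15, 13, 12, 10, 9, 7, 5, 4, 2, 1]
targets = vcat(f, f .+ 1)  # every index the loop writes to
conflicts = sort([t for t in unique(targets) if count(==(t), targets) > 1])
# conflicts == [2, 5, 10, 13]: exactly the elements where y0 and y1 disagree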
