Inconsistent results using LoopVectorization @turbo with linear indexing

I am obtaining incorrect results when I combine @turbo with linear indexing of my array. I have a simple example of the issue below. Results are the same except when I combine @turbo with linear indexing.

using LoopVectorization
using Random

Random.seed!(1234)
H = rand(10, 12, 3);

su = zeros(10, 12);
for k = 1:3
    for j in 1:12
        for i in 1:10
            su[i, j] += H[i, j, k]
        end
    end
end
println(sum(su))

su = zeros(10, 12);
for k = 1:3
    @turbo for j in 1:12
        for i in 1:10
            su[i, j] += H[i, j, k]
        end
    end
end
println(sum(su))

su = zeros(10, 12);
for k = 1:3
    ind = LinearIndices((10, 12, 3))[1, 1, k] - 1
    for ij in 1:120
        su[ij] += H[ij + ind]
    end
end
println(sum(su))

su = zeros(10, 12);
for k = 1:3
    ind = LinearIndices((10, 12, 3))[1, 1, k] - 1
    @turbo for ij in 1:120
        su[ij] += H[ij + ind]
    end
end
println(sum(su))

This gives the following output

184.9400314609572
184.9400314609572
184.9400314609572
184.12871254811873

Why am I getting different results in this latter case?

2 Likes

The numbers look off by 1 with a near-zero garbage element. The discrepancy in the last sum 184.9400314609572 - 184.12871254811873 is 0.811318912838459, which is very close to H[end] == 0.8113189128384447.
In a more minimal example:

julia> H = collect(reshape(1.0:6.0, 3, 2))
3×2 Matrix{Float64}:
 1.0  4.0
 2.0  5.0
 3.0  6.0

julia> su = zeros(3);

julia> for k = 1:1
           ind = 0 # LinearIndices((3, 2))[1, 1] - 1
           for ij in 1:3
               su[ij] += H[ij + ind]
           end
       end

julia> su
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> su = zeros(3);

julia> for k = 1:1
           ind = 0
           @turbo for ij in 1:3
               su[ij] += H[ij + ind]
           end
       end

julia> su
3-element Vector{Float64}:
 6.94274366677856e-310
 1.0
 2.0

so H is being indexed at 0, 1, 2 instead of 1, 2, 3. The effect doesn’t seem to happen on the left hand expression e.g. su[ij + ind] and doesn’t seem to shift more with more additions e.g. H[ij + ind + ind]. It does it correctly if the index was added to a literal 0 instead of ind or if not added:

julia> su = zeros(3);

julia> for k = 1:1
           @turbo for ij in 1:3
               su[ij] += H[ij + 0]
           end
       end

julia> su
3-element Vector{Float64}:
 1.0
 2.0
 3.0

julia> su = zeros(3);

julia> for k = 1:1
           @turbo for ij in 1:3
               su[ij] += H[ij]
           end
       end

julia> su
3-element Vector{Float64}:
 1.0
 2.0
 3.0
2 Likes