I am obtaining incorrect results when I combine @turbo with linear indexing of my array. I have a simple example of the issue below. Results are the same except when I combine @turbo with linear indexing.
using LoopVectorization
using Random
Random.seed!(1234)
H = rand(10, 12, 3);
su = zeros(10, 12);
for k = 1:3
for j in 1:12
for i in 1:10
su[i, j] += H[i, j, k]
end
end
end
println(sum(su))
su = zeros(10, 12);
for k = 1:3
@turbo for j in 1:12
for i in 1:10
su[i, j] += H[i, j, k]
end
end
end
println(sum(su))
su = zeros(10, 12);
for k = 1:3
ind = LinearIndices((10, 12, 3))[1, 1, k] - 1
for ij in 1:120
su[ij] += H[ij + ind]
end
end
println(sum(su))
su = zeros(10, 12);
for k = 1:3
ind = LinearIndices((10, 12, 3))[1, 1, k] - 1
@turbo for ij in 1:120
su[ij] += H[ij + ind]
end
end
println(sum(su))
The numbers look off by 1 with a near-zero garbage element. The discrepancy in the last sum 184.9400314609572 - 184.12871254811873 is 0.811318912838459, which is very close to H[end] == 0.8113189128384447.
In a more minimal example:
julia> H = collect(reshape(1.0:6.0, 3, 2))
3×2 Matrix{Float64}:
1.0 4.0
2.0 5.0
3.0 6.0
julia> su = zeros(3);
julia> for k = 1:1
ind = 0 # LinearIndices((3, 2))[1, 1] - 1
for ij in 1:3
su[ij] += H[ij + ind]
end
end
julia> su
3-element Vector{Float64}:
1.0
2.0
3.0
julia> su = zeros(3);
julia> for k = 1:1
ind = 0
@turbo for ij in 1:3
su[ij] += H[ij + ind]
end
end
julia> su
3-element Vector{Float64}:
6.94274366677856e-310
1.0
2.0
so H is being indexed at 0, 1, 2 instead of 1, 2, 3. The effect doesn’t seem to happen on the left hand expression e.g. su[ij + ind] and doesn’t seem to shift more with more additions e.g. H[ij + ind + ind]. It does it correctly if the index was added to a literal 0 instead of ind or if not added:
julia> su = zeros(3);
julia> for k = 1:1
@turbo for ij in 1:3
su[ij] += H[ij + 0]
end
end
julia> su
3-element Vector{Float64}:
1.0
2.0
3.0
julia> su = zeros(3);
julia> for k = 1:1
@turbo for ij in 1:3
su[ij] += H[ij]
end
end
julia> su
3-element Vector{Float64}:
1.0
2.0
3.0