I have the following minimal working example, which shows two odd behaviors of @turbo when variables are updated sequentially:
using LoopVectorization

# Incorrect usage of @turbo
function wrong_example(n)
    x = 0.0
    y = zeros(n)
    @turbo for i in 1:n
        x += i * 2 + 1 # the order of updating this matters somehow with both * and +?
        for j in 1:n
            y[j] += 1
        end
    end
    return x, y
end

# Correct usage without @turbo
function correct_example(n)
    x = 0.0
    y = zeros(n)
    for i in 1:n
        x += i * 2 + 1 # Sequential update of x
        for j in 1:n
            y[j] += 1
        end
    end
    return x, y
end

# Test with a small n
n = 5
x_wrong, y_wrong = wrong_example(n)
x_correct, y_correct = correct_example(n)
println("x_wrong = ", x_wrong)
println("y_wrong = ", y_wrong)
println("x_correct = ", x_correct)
println("y_correct = ", y_correct)
The output is:
x_wrong = 0.0
y_wrong = [1.0, 1.0, 1.0, 1.0, 1.0]
x_correct = 35.0
y_correct = [5.0, 5.0, 5.0, 5.0, 5.0]
The reasons why x and y differ from the correct results seem peculiar to me, since neither update should depend on a specific execution order. In particular, removing either *2 or +1 from the update of x makes the discrepancy in x disappear.
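Because both updates are plain sums, the expected values have closed forms that do not depend on iteration order, which gives an independent check on which of the two outputs is right. The sketch below uses only Base Julia; the helper names expected_x and expected_y are hypothetical and were not part of the original example.

```julia
# Closed-form values the loops should produce, regardless of iteration order.
# expected_x and expected_y are hypothetical helper names for illustration.

# sum over i = 1..n of (2i + 1) = n(n + 1) + n = n(n + 2)
expected_x(n) = float(n * (n + 2))

# each y[j] is incremented by 1 once per outer iteration, so it ends at n
expected_y(n) = fill(float(n), n)

n = 5
println("expected x = ", expected_x(n))
println("expected y = ", expected_y(n))
```

For n = 5 this gives 35.0 and a vector of five 5.0 entries, matching the output of correct_example.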
I wonder why this happens and how I can fix it. Since I eventually want to apply @turbo to a much more complicated stochastic optimization problem, I would also appreciate an in-depth explanation of how @turbo parallelizes loops, and some general advice on what to avoid when updating a variable with += or *=.
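One possible restructuring, offered as a sketch rather than an explanation of @turbo's internals, is to separate the two independent updates so that each loop nest does exactly one thing: a scalar reduction into x, and an array update in which each y[j] is written once from a local accumulator. The version below deliberately swaps in Base's @simd and @inbounds instead of @turbo, since whether @turbo handles the restructured loops correctly is not verified here. The function name split_example is hypothetical.

```julia
# Hypothetical restructuring; split_example is not from the original example.
# Uses Base's @simd/@inbounds in place of @turbo (an assumption of this sketch).
function split_example(n)
    # Scalar reduction kept in its own single loop.
    x = 0.0
    @simd for i in 1:n
        x += i * 2 + 1
    end

    # Array update: j is the outer loop, and the i-loop accumulates into a
    # local scalar, so each y[j] is written exactly once.
    y = zeros(n)
    @inbounds for j in 1:n
        acc = 0.0
        @simd for i in 1:n
            acc += 1
        end
        y[j] += acc
    end
    return x, y
end

x_split, y_split = split_example(5)
println("x_split = ", x_split)
println("y_split = ", y_split)
```

For n = 5 this reproduces the values from correct_example. The design choice is that no loop both carries a scalar reduction and repeatedly updates array elements that the loop index does not address, which is the pattern the original nested loop combines.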