I often use the @turbo macro for calculations, but I recently discovered that it sometimes gives incorrect results. Below is a minimal working example that compares three variants of the same loop.
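using LoopVectorization

tbuf = [0.0, 0.0]
tbuf2 = [0.0, 0.0]
tbuf3 = [0.0, 0.0]

# Variant 1: @turbo (the one that gives the unexpected result)
@turbo for i = 1:length(tbuf)
    tbuf[i] = 1.0
    tbuf[i] += 1.0
    tbuf[i] *= -1.0
end

# Variant 2: @inbounds @simd
@inbounds @simd for i = 1:length(tbuf2)
    tbuf2[i] = 1.0
    tbuf2[i] += 1.0
    tbuf2[i] *= -1.0
end

# Variant 3: plain loop for reference; each element ends up as (1.0 + 1.0) * -1.0 = -2.0
@inbounds for i = 1:length(tbuf3)
    tbuf3[i] = 1.0
    tbuf3[i] += 1.0
    tbuf3[i] *= -1.0
end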
I’m not a LoopVectorization user, so others will chime in, but did you look at the package’s README, in particular this part?
We expect that any time you use the @turbo macro with a given block of code that you:
…
3. Are not relying on a specific execution order. @turbo can and will re-order operations and loops inside its scope, so the correctness cannot depend on a particular order. You cannot implement cumsum with @turbo.
I admit I’m still a little confused here. Clearly, this example does not rely on the loop execution order: each iteration operates on only a single element. But it does rely on the order of the statements within a single loop iteration. Almost every program does, for that matter.
I haven’t used it before, but I had assumed LoopVectorization was not usually in the business of transforming dependent instruction orderings. It’s hard to write a program when it will occasionally transform (a + b) * c to a * c, as appears to have happened here.
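To make the arithmetic concrete, here is a small sketch using the constants from the loop body. It does not call @turbo at all; the names a, b, c, expected and dropped are purely illustrative, and the "dropped" case only shows what the result would be if the += step were lost, which is a guess about what happened rather than something confirmed here.

```julia
# Constants taken from the loop body: buf[i] = a; buf[i] += b; buf[i] *= c
a, b, c = 1.0, 1.0, -1.0

# What the three statements compute when executed in order.
expected = (a + b) * c

# What you would get if the `+= b` step were skipped (assumed failure mode).
dropped = a * c

println("expected = ", expected, ", dropped = ", dropped)
```

If the @turbo buffer holds the second value rather than the first, that would be consistent with the middle statement having been lost.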
Why did this fail? Would it have worked if it were written like this instead? This version appears to work correctly:
@turbo for i = 1:length(tbuf)
    z = 1.0
    z += 1.0
    tbuf[i] = z * -1.0
end
So the issue appears related to the repeated accesses to tbuf[i] in the original. Can someone clarify what patterns are and aren’t safe?
I’m not sure, but if I had to guess: it won’t reorder operations within a single expression, but it may reorder statements, so you can’t rely on assignments happening in the order you wrote them.
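One way to probe which patterns are safe is to check each candidate loop against a plain reference loop on fresh buffers. The sketch below uses only Base Julia so it runs without the package; reference!, single_store! and agrees_with_reference are hypothetical helper names, not part of LoopVectorization. To test a @turbo variant, you would add using LoopVectorization and put @turbo in front of the loop in the candidate function.

```julia
# Reference: plain loop, statements execute in the order written.
function reference!(buf)
    for i in eachindex(buf)
        buf[i] = 1.0
        buf[i] += 1.0
        buf[i] *= -1.0
    end
    return buf
end

# Candidate: keep intermediates in a local and store to the array once.
# (Hypothetical rewrite; prefix the loop with @turbo to test that macro.)
function single_store!(buf)
    for i in eachindex(buf)
        z = 1.0
        z += 1.0
        buf[i] = z * -1.0
    end
    return buf
end

# Run the candidate and the reference on separate zeroed buffers and compare.
function agrees_with_reference(candidate!, n::Integer = 8)
    return candidate!(zeros(n)) == reference!(zeros(n))
end

println(agrees_with_reference(single_store!))
```

A check like this only shows agreement for the inputs tried, so it can flag an unsafe pattern but cannot prove that a pattern is safe in general.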