@turbo macro gives incorrect results

I often use the @turbo macro for calculations, but I recently discovered that it sometimes gives incorrect results. Below is a minimal working example that compares three variants of the same loop: @turbo, @inbounds @simd, and plain @inbounds.

using LoopVectorization

tbuf = [0.0, 0.0]
tbuf2 = [0.0, 0.0]
tbuf3 = [0.0, 0.0]

# Variant 1: LoopVectorization's @turbo
@turbo for i = 1:length(tbuf)
    tbuf[i] = 1.0
    tbuf[i] += 1.0
    tbuf[i] *= -1.0
end

# Variant 2: Base Julia @simd
@inbounds @simd for i = 1:length(tbuf2)
    tbuf2[i] = 1.0
    tbuf2[i] += 1.0
    tbuf2[i] *= -1.0
end

# Variant 3: plain loop
@inbounds for i = 1:length(tbuf3)
    tbuf3[i] = 1.0
    tbuf3[i] += 1.0
    tbuf3[i] *= -1.0
end

As a result, we have

julia> tbuf
2-element Vector{Float64}:

julia> tbuf2
2-element Vector{Float64}:

julia> tbuf3
2-element Vector{Float64}:

Can someone explain why that is?

My software versions are

julia> versioninfo()
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, ivybridge)

(@v1.6) pkg> st
  [bdcacae8] LoopVectorization v0.12.136

I’m not a LoopVectorization user, so others will chime in, but did you look at the package’s README, in particular this part?

We expect that any time you use the @turbo macro with a given block of code that you:

3. Are not relying on a specific execution order. @turbo can and will re-order operations and loops inside its scope, so the correctness cannot depend on a particular order. You cannot implement cumsum with @turbo.


Thank you! Apparently this is the reason.

I admit I’m still a little confused here. Clearly, this example does not rely on the loop execution order: each iteration operates on only a single element. But it does rely on the statement order within a single loop iteration. Almost every program does, for that matter.

I haven’t used it before, but I had assumed LoopVectorization was not usually in the business of transforming dependent instruction orderings. It’s hard to write a program when it will occasionally transform (a + b) * c to a * c, as appears to have happened here.
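To make the arithmetic concrete, here is a small plain-Julia sketch (no LoopVectorization involved) of what the loop body should compute versus what it would compute if the `+= 1.0` update were lost. The variable names are illustrative only, and whether this is really what @turbo did here is an assumption on my part.

```julia
# Values taken from the loop body in the original post.
a, b, c = 1.0, 1.0, -1.0

expected = (a + b) * c   # all three statements applied in order
dropped  = a * c         # hypothetical: the `+= 1.0` update is lost

println("expected = ", expected)
println("dropped  = ", dropped)
```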

Why did this fail? Would it have worked if it were written like this instead? This version appears to work correctly:

@turbo for i = 1:length(tbuf)
    z = 1.0
    z += 1.0
    tbuf[i] = z * -1.0
end

So the issue appears related to the repeated accesses to tbuf[i] in the original. Can someone clarify what patterns are and aren’t safe?


I’m not sure, but if I were to guess: it won’t reorder operations within a single expression, but it may reorder statements, so you can’t rely on assignments occurring in the order you specify.
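If that guess is right, one defensive pattern is to keep the intermediate value in a loop-local variable and write each array element exactly once, as in the rewritten loop earlier in the thread. The sketch below shows the pattern in plain Julia so that it runs without LoopVectorization. The function names `three_stores!` and `single_store!` are hypothetical, and whether putting @turbo on the second loop is safe is an assumption to verify on your own versions.

```julia
# Reference: three dependent stores to the same element, as in the original post.
function three_stores!(buf)
    @inbounds for i in eachindex(buf)
        buf[i] = 1.0
        buf[i] += 1.0
        buf[i] *= -1.0
    end
    return buf
end

# Same computation with a local temporary and one store per element.
function single_store!(buf)
    @inbounds @simd for i in eachindex(buf)
        z = 1.0
        z += 1.0
        buf[i] = z * -1.0
    end
    return buf
end

ref = three_stores!(zeros(2))
out = single_store!(zeros(2))
println(ref == out)
```

Comparing a vectorized loop against a plain-loop reference like this is a cheap way to catch the kind of discrepancy reported at the top of the thread.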