@turbo macro gives incorrect results

sidelkin · October 26, 2022, 8:00am

I often use the @turbo macro for calculations. But I recently discovered that using it sometimes gives incorrect results. Below are minimal working examples where different options are compared.

using LoopVectorization

tbuf = [0.0, 0.0]
tbuf2 = [0.0, 0.0]
tbuf3 = [0.0, 0.0]

@turbo for i = 1:length(tbuf)    
    tbuf[i] = 1.0
    tbuf[i] += 1.0
    tbuf[i] *= -1.0
end

@inbounds @simd for i = 1:length(tbuf2)    
    tbuf2[i] = 1.0
    tbuf2[i] += 1.0
    tbuf2[i] *= -1.0
end

@inbounds for i = 1:length(tbuf3)    
    tbuf3[i] = 1.0
    tbuf3[i] += 1.0
    tbuf3[i] *= -1.0
end

As a result, we have

julia> tbuf
2-element Vector{Float64}:
 -1.0
 -1.0

julia> tbuf2
2-element Vector{Float64}:
 -2.0
 -2.0

julia> tbuf3
2-element Vector{Float64}:
 -2.0
 -2.0

Can someone explain why that is?

My software versions are

julia> versioninfo()
Julia Version 1.6.7
Commit 3b76b25b64 (2022-07-19 15:11 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, ivybridge)
(@v1.6) pkg> st
  [bdcacae8] LoopVectorization v0.12.136

nilshg · October 26, 2022, 8:08am

I’m not a LoopVectorization user, so others will chime in, but did you look at the package’s README, in particular this part:

We expect that any time you use the @turbo macro with a given block of code that you:
…
3. Are not relying on a specific execution order. @turbo can and will re-order operations and loops inside its scope, so the correctness cannot depend on a particular order. You cannot implement cumsum with @turbo.
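
To make the README's point concrete, here is a minimal sketch in plain Julia (no LoopVectorization required; both function names are made up for illustration). A running sum carries a value from one iteration to the next, so its result depends on iteration order, whereas the loop in the original post only ever touches element `i`.

```julia
# Hypothetical helper: running sum. `acc` is carried from one iteration
# to the next, so the iterations must execute in order.
function running_sum!(out, x)
    acc = zero(eltype(x))
    for i in eachindex(x, out)
        acc += x[i]
        out[i] = acc
    end
    return out
end

# Hypothetical helper: the same per-element work as in the original post.
# Each iteration reads and writes only buf[i], so iteration order is irrelevant.
function fill_minus_two!(buf)
    for i in eachindex(buf)
        buf[i] = 1.0
        buf[i] += 1.0
        buf[i] *= -1.0
    end
    return buf
end

running_sum!(zeros(3), [1.0, 2.0, 3.0])  # [1.0, 3.0, 6.0]
fill_minus_two!(zeros(2))                # [-2.0, -2.0]
```

By this reading, the original loop has no dependency between iterations; what it does rely on is the order of the three statements within one iteration.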

sidelkin · October 26, 2022, 8:17am

Thank you! Apparently this is the reason.

mikmoore · October 26, 2022, 3:10pm

I admit I’m still a little confused here. Clearly, this example does not rely on the loop execution order: each iteration operates on only a single element. But it does rely on the instruction order within a single loop iteration. Almost every program does, for that matter.

I haven’t used it before, but I had assumed LoopVectorization was not usually in the business of transforming dependent instruction orderings. It’s hard to write a program when it will occasionally transform (a + b) * c to a * c, as appears to have happened here.

Why did this fail? The following version appears to work correctly:

@turbo for i = 1:length(tbuf)    
    z = 1.0
    z += 1.0
    tbuf[i] = z * -1.0
end

So the issue appears related to the repeated accesses to tbuf[i] in the original. Can someone clarify what patterns are and aren’t safe?
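
One way to check a rewrite like this is to compare it against a plain reference loop, as the original post did. The sketch below does the arithmetic in a local variable and stores to the array once. The names `store_once!` and `store_thrice!` are invented here, and the loops are written without `@turbo` so the snippet runs on Base Julia alone. It says nothing about which patterns `@turbo` handles, only how the two forms can be compared.

```julia
# Hypothetical name: compute in a local, then write buf[i] exactly once.
function store_once!(buf)
    @inbounds @simd for i in eachindex(buf)
        z = 1.0
        z += 1.0
        buf[i] = z * -1.0
    end
    return buf
end

# Hypothetical name: the original pattern, with three accesses to buf[i].
function store_thrice!(buf)
    @inbounds for i in eachindex(buf)
        buf[i] = 1.0
        buf[i] += 1.0
        buf[i] *= -1.0
    end
    return buf
end

a = store_once!(zeros(2))
b = store_thrice!(zeros(2))
a == b  # true: both are [-2.0, -2.0]
```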

dlakelan · October 26, 2022, 3:53pm

I’m not sure, but my guess is that it won’t reorder operations within a single expression, though it may reorder statements, so you can’t rely on assignments occurring in the order you wrote them.
