UndefVarError when using @turbo on nested for loop whose inner loop depends on value in outer loop

I am trying to optimize some for-loops in a package I am working on, so I decided to check out LoopVectorization. I ran into some errors, and constructed the following MWE:

using LoopVectorization

function f(jmax,kmax)
    @assert jmax >= 1
    @assert kmax >= 1

    s = 0
    @turbo for j in 1:jmax
        for k in 1:kmax
            s += 1
        end
    end
    return s
end

function g(jmax)
    @assert jmax >= 1

    s = 0
    @turbo for j in 1:jmax
        for k in 1:j
            s += 1
        end
    end
    return s
end

sf = f(5,5)
println("sf=",sf)
sg = g(5)
println("sg=",sg)

When I run this script, I get

ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/LoopVectorization/tIJUA/src/condense_loopset.jl:1179 [inlined]
 [2] g(jmax::Int64)
   @ Main ~/Research/LoopVectorization/loop_vectorization.jl:20
 [3] top-level scope
   @ ~/Research/LoopVectorization/loop_vectorization.jl:30
 [4] include(fname::String)
   @ Main ./sysimg.jl:38
 [5] top-level scope
   @ REPL[8]:1
in expression starting at /home/spencer/Research/LoopVectorization/loop_vectorization.jl:30

Could someone help me understand why this fails? Using @turbo on nested loops should be fine; see the following example from the documentation.

function A_mul_B!(C, A, B)
    @turbo for n ∈ indices((C,B), 2), m ∈ indices((C,A), 1)
        Cmn = zero(eltype(C))
        for k ∈ indices((A,B), (2,1))
            Cmn += A[m,k] * B[k,n]
        end
        C[m,n] = Cmn
    end
end

The README on GitHub says that whenever you use @turbo on a block of code, you must make sure that you:
  1. Are not indexing an array out of bounds. @turbo does not perform any bounds checking.
  2. Are not iterating over an empty collection. Iterating over an empty loop such as for i ∈ eachindex(Float64[]) is undefined behavior, and will likely result in the out of bounds memory accesses. Ensure that loops behave correctly.
  3. Are not relying on a specific execution order. @turbo can and will re-order operations and loops inside its scope, so the correctness cannot depend on a particular order. You cannot implement cumsum with @turbo.
  4. Are not using multiple loops at the same level in nested loops.

Certainly 1. and 2. do not apply.

I don’t think 3. applies. The values k can take depend on which iteration of the outer loop we are on, but the order in which those iterations happen doesn’t matter. I.e. it only matters that we do s += 1 for (j,k) equal to (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), and so on, regardless of the order in which those iterations are performed.
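
To be concrete, here is a plain-Julia sketch (no @turbo; the helper name triangular_pairs is made up for illustration) that lists the (j,k) pairs the loops in g visit. For jmax = 3 there are six of them, i.e. jmax*(jmax+1)÷2, and nothing about the sum depends on the order in which they are visited.

```julia
# Hypothetical helper: collect the (j, k) pairs visited by the loops in g.
function triangular_pairs(jmax)
    visited = Tuple{Int,Int}[]
    for j in 1:jmax
        for k in 1:j              # inner range depends on the outer index j
            push!(visited, (j, k))
        end
    end
    return visited
end

visited3 = triangular_pairs(3)
println(visited3)
# The number of pairs is the triangular number jmax*(jmax+1)÷2.
println(length(visited3) == 3 * (3 + 1) ÷ 2)
```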

I don’t think 4. applies, if I understand correctly. I think it is warning against doing something like

for i in range1
  for j in range2
     [...]
  end
  for k in range3
     [...]
  end
end

In any case, it’s conceivable to me that a function like g is something LoopVectorization wasn’t designed to handle, but the UndefVarError doesn’t tell me that, and it makes it look like I made a syntax mistake more than anything else.

I suppose it’s not mentioned in the README, but see the “Getting Started” page of the LoopVectorization.jl docs:

Currently LoopVectorization only supports rectangular iteration spaces, although I plan on extending it to triangular and ragged iteration spaces in the future. This means that if you nest multiple loops, the number of iterations of the inner loops shouldn’t be a function of the outer loops.
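
One possible workaround, as an untested and unbenchmarked sketch: keep the outer loop as an ordinary Julia loop and apply @turbo only to the inner loop, so that each @turbo block sees a single loop whose range is already fixed when it starts. The name g_inner is made up for illustration, and this assumes @turbo accepts the same s += 1 reduction here that it accepts in f.

```julia
using LoopVectorization

# Hypothetical variant of g: only the inner loop is handed to @turbo.
function g_inner(jmax)
    @assert jmax >= 1          # guarantees 1:j is never empty

    s = 0
    for j in 1:jmax            # plain Julia loop, not seen by @turbo
        @turbo for k in 1:j    # j is just a fixed upper bound from @turbo's point of view
            s += 1
        end
    end
    return s
end

sg = g_inner(5)
println("sg=", sg)
```

Whether this is actually faster than the plain nested loop depends on how long the inner loops are, so it is worth benchmarking before adopting it.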


Thanks! It would be nice if the macro could catch the non-rectangular nested loop and give a more descriptive error. Not that I’m complaining; this package does low-level optimizations on a level that is magic to me.