I am trying to optimize some for-loops in a package I am working on, so I decided to check out LoopVectorization. I ran into some errors, and constructed the following MWE:
using LoopVectorization

function f(jmax,kmax)
    @assert jmax >= 1
    @assert kmax >= 1
    s = 0
    @turbo for j in 1:jmax
        for k in 1:kmax
            s += 1
        end
    end
    return s
end

function g(jmax)
    @assert jmax >= 1
    s = 0
    @turbo for j in 1:jmax
        for k in 1:j
            s += 1
        end
    end
    return s
end

sf = f(5,5)
println("sf=",sf)
sg = g(5)
println("sg=",sg)
When I run this script, I get
ERROR: LoadError: UndefVarError: `j` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/LoopVectorization/tIJUA/src/condense_loopset.jl:1179 [inlined]
[2] g(jmax::Int64)
@ Main ~/Research/LoopVectorization/loop_vectorization.jl:20
[3] top-level scope
@ ~/Research/LoopVectorization/loop_vectorization.jl:30
[4] include(fname::String)
@ Main ./sysimg.jl:38
[5] top-level scope
@ REPL[8]:1
in expression starting at /home/spencer/Research/LoopVectorization/loop_vectorization.jl:30
Could someone help me understand why this fails? Using @turbo on nested loops should be fine; see the following example from the documentation.
function A_mul_B!(C, A, B)
    @turbo for n ∈ indices((C,B), 2), m ∈ indices((C,A), 1)
        Cmn = zero(eltype(C))
        for k ∈ indices((A,B), (2,1))
            Cmn += A[m,k] * B[k,n]
        end
        C[m,n] = Cmn
    end
end
The README on GitHub says that, whenever you use @turbo, it expects that you:
1. Are not indexing an array out of bounds. @turbo does not perform any bounds checking.
2. Are not iterating over an empty collection. Iterating over an empty loop such as for i ∈ eachindex(Float64[]) is undefined behavior, and will likely result in the out of bounds memory accesses. Ensure that loops behave correctly.
3. Are not relying on a specific execution order. @turbo can and will re-order operations and loops inside its scope, so the correctness cannot depend on a particular order. You cannot implement cumsum with @turbo.
4. Are not using multiple loops at the same level in nested loops.
Certainly 1. and 2. do not apply.
I don’t think 3. applies. The values k can take depend on which iteration of the outer loop we are on, but the order in which those iterations happen doesn’t matter. I.e. it only matters that we do s += 1 for (j,k) equal to (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), and so on, regardless of the order in which the iterations are performed.
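The order-independence claim for 3. can be checked with a small sketch in plain Julia, without LoopVectorization. The helper names triangular_pairs and count_pairs are hypothetical and not part of the original script; the sketch only illustrates that counting the (j,k) pairs of the triangular loop gives the same total in any order, and says nothing about how @turbo itself behaves.

```julia
using Random

# Collect every (j, k) pair that the triangular loop in g would visit.
# (Hypothetical helper, introduced only for this illustration.)
triangular_pairs(jmax) = [(j, k) for j in 1:jmax for k in 1:j]

# Do `s += 1` once per pair, in whatever order the pairs arrive.
function count_pairs(pairs)
    s = 0
    for _ in pairs
        s += 1
    end
    return s
end

pairs5 = triangular_pairs(5)

# Natural loop order versus a shuffled order (fixed seed for repeatability).
in_order = count_pairs(pairs5)
shuffled = count_pairs(shuffle(MersenneTwister(0), pairs5))

println("in_order=", in_order, " shuffled=", shuffled)
```

Both counts equal 5*6/2 = 15, so at least mathematically the result does not depend on the execution order.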
I don’t think 4. applies, if I understand correctly. I think it is warning against doing something like
for i in range1
    for j in range2
        [...]
    end
    for k in range3
        [...]
    end
end
In any case, it’s conceivable to me that a function like g might be something LoopVectorization wasn’t designed to handle, but the UndefVarError doesn’t tell me anything useful, and makes it look like I made a syntax mistake more than anything else.
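As a sanity check on what the MWE should print if both functions ran, here is a minimal baseline sketch with the @turbo annotations removed. The names f_ref and g_ref are hypothetical and not part of the original script; this only fixes the expected values and is not a suggested fix for the error.

```julia
# Same loops as f, but without @turbo (hypothetical reference version).
function f_ref(jmax, kmax)
    @assert jmax >= 1
    @assert kmax >= 1
    s = 0
    for j in 1:jmax
        for k in 1:kmax
            s += 1
        end
    end
    return s
end

# Same loops as g, but without @turbo (hypothetical reference version).
# The inner range 1:j depends on the outer loop variable j.
function g_ref(jmax)
    @assert jmax >= 1
    s = 0
    for j in 1:jmax
        for k in 1:j
            s += 1
        end
    end
    return s
end

println("sf=", f_ref(5, 5))  # rectangular loop: 5 * 5 iterations
println("sg=", g_ref(5))     # triangular loop: 1 + 2 + 3 + 4 + 5 iterations
```

The rectangular loop runs 25 times and the triangular loop 15 times, so any working version of the MWE should print sf=25 and sg=15.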