# UndefVarError from LoopVectorization.@turbo

I’m having trouble using `LoopVectorization.@turbo`: it throws `ERROR: UndefVarError: i1 not defined`.

I suspect that the issue stems from the “strange” format of the `weights` and `indices` inputs. (The reason for the strange format is that I don’t know the length of the vectors in `A`.)

Can I rewrite `test1` so that `@turbo` works here?

``````
using LoopVectorization

function test1(A)
    n1, n2 = size(A)
    n3 = 100

    # Generate nonsense inputs
    weights = Matrix{Vector{NTuple{2, Float64}}}(
        undef,
        n1,
        n2,
    )
    indices = Matrix{Vector{Int}}(
        undef,
        n1,
        n2,
    )
    nA = length(A[1])
    for ix in eachindex(weights)
        weights[ix] = Vector{NTuple{2, Float64}}(undef, n3)
        for i3 in eachindex(weights[ix])
            weights[ix][i3] = (rand(2)...,)
            weights[ix][i3] = weights[ix][i3] ./ sum(weights[ix][i3])
        end

        indices[ix] = sort(rand(1:nA - 1, n3))
    end

    # Computation
    B = zeros(n1, n2, n3)
    @turbo for i3 in 1:n3
        for i2 in 1:n2
            for i1 in 1:n1
                for i4 in 0:1
                    B[i1, i2, i3] += weights[i1, i2][i3][1 + i4] * A[i1, i2][indices[i1, i2][i3] + i4]
                end
            end
        end
    end

    return B
end

n1 = 3
n2 = 7
A = Matrix{Vector{Float64}}(
    undef,
    n1,
    n2,
)
for ix in eachindex(A)
    A[ix] = sort(rand(60))
end

test1(A)
``````

I tend to agree. Assuming that you want to optimize the code, I also tried `@tturbo` (same problem) and `@batch` from `Polyester` (no speedup).

A quick look with a profiler revealed that most of the time is spent in the setup, not in the computation. In the setup you can fix at least two weaknesses (let's wait and see what @DNF spots ;)

``````
# Allocates
# weights[ix][i3] = (rand(2)...,)
weights[ix][i3] = ntuple(_ -> rand(), 2)
``````

and

``````
# Dynamic dispatch
# indices[ix] = sort(rand(1:nA - 1, n3))
indices[ix] = rand(1:nA - 1, n3)
sort!(indices[ix])
``````

With these fixes, `@btime` drops from 400 μs to 80 μs, but the setup (specifically the sorting of the indices) still dominates (no speedup with `@batch`).
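To check for yourself where the time goes, one option is to split the function into a setup part and a computation part and time them separately. Below is a minimal sketch of that idea using only Base Julia. The names `make_inputs` and `compute!` are made up for this example, the kernel is a plain loop (no `@turbo`) with the two-element `i4` loop written out by hand, and the exact timings will of course depend on your machine.

```julia
# Hypothetical helper: builds the nested inputs with both setup fixes applied.
function make_inputs(n1, n2, n3, nA)
    weights = Matrix{Vector{NTuple{2, Float64}}}(undef, n1, n2)
    indices = Matrix{Vector{Int}}(undef, n1, n2)
    for ix in eachindex(weights)
        w = Vector{NTuple{2, Float64}}(undef, n3)
        for i3 in eachindex(w)
            t = ntuple(_ -> rand(), 2)  # tuple built directly, no temporary Vector
            w[i3] = t ./ sum(t)         # normalize so the pair sums to 1
        end
        weights[ix] = w

        idx = rand(1:nA - 1, n3)
        sort!(idx)                      # sort in place
        indices[ix] = idx
    end
    return weights, indices
end

# Hypothetical helper: the computation alone, as a plain loop.
# The inner vectors are looked up once per (i1, i2) instead of once per term.
function compute!(B, A, weights, indices)
    n1, n2, n3 = size(B)
    for i2 in 1:n2, i1 in 1:n1
        a = A[i1, i2]
        w = weights[i1, i2]
        idx = indices[i1, i2]
        for i3 in 1:n3
            k = idx[i3]
            # Same two terms as the i4 in 0:1 loop in test1
            B[i1, i2, i3] = w[i3][1] * a[k] + w[i3][2] * a[k + 1]
        end
    end
    return B
end

n1, n2, n3 = 3, 7, 100
A = [sort(rand(60)) for _ in 1:n1, _ in 1:n2]
nA = length(A[1])
B = zeros(n1, n2, n3)

# Warm-up calls so that compilation is not included in the timings
compute!(B, A, make_inputs(n1, n2, n3, nA)...)

setup = @timed make_inputs(n1, n2, n3, nA)
weights, indices = setup.value
kernel = @timed compute!(B, A, weights, indices)

println("setup:  ", setup.time, " s")
println("kernel: ", kernel.time, " s")
```

A single `@timed` call is noisy at these sizes, so for real measurements you would probably wrap the two calls in `@btime` instead. Whether the flat kernel can then be handed to `@turbo` is a separate question that this sketch does not answer.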