UndefVarError from LoopVectorization.@turbo

I’m having trouble using LoopVectorization.@turbo: it gives ERROR: UndefVarError: i1 not defined.

I suspect that the issue stems from the “strange” format of the weights and indices inputs. (The reason for this format is that I don’t know the lengths of the vectors in A in advance.)

Can I rewrite test1 so that @turbo works here?

using LoopVectorization

function test1(A)
    n1, n2 = size(A)
    n3 = 100

    # Generate nonsense inputs
    weights = Matrix{Vector{NTuple{2, Float64}}}(
        undef,
        n1,
        n2,
    )
    indices = Matrix{Vector{Int}}(
        undef,
        n1,
        n2,
    )
    nA = length(A[1])
    for ix in eachindex(weights)
        weights[ix] = Vector{NTuple{2, Float64}}(undef, n3)
        for i3 in eachindex(weights[ix])
            weights[ix][i3] = (rand(2)...,)
            weights[ix][i3] = weights[ix][i3] ./ sum(weights[ix][i3])
        end

        indices[ix] = sort(rand(1:nA - 1, n3))
    end

    # Computation
    B = zeros(n1, n2, n3)
    # This is the loop that throws ERROR: UndefVarError: i1 not defined
    @turbo for i3 in 1:n3
        for i2 in 1:n2
            for i1 in 1:n1
                for i4 in 0:1
                    B[i1, i2, i3] += weights[i1, i2][i3][1 + i4] * A[i1, i2][indices[i1, i2][i3] + i4]
                end
            end
        end
    end

    return B
end

n1 = 3
n2 = 7
A = Matrix{Vector{Float64}}(
    undef,
    n1,
    n2,
)
for ix in eachindex(A)
    A[ix] = sort(rand(60));
end

test1(A)
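
For reference, here is what I imagine a “flattened” version of the computation could look like, in case @turbo needs plain dense arrays rather than nested Vectors. This is only a sketch replacing the # Computation part of test1; the names Wflat, Iflat and Aflat are made up, and I have not checked whether @turbo accepts the indirect index into Aflat.

    # Sketch only: copy the nested containers into dense arrays so the hot loop
    # uses nothing but plain array indexing.
    Wflat = Array{Float64}(undef, 2, n3, n1, n2)
    Iflat = Array{Int}(undef, n3, n1, n2)
    Aflat = Array{Float64}(undef, nA, n1, n2)   # assumes all vectors in A have length nA
    for i2 in 1:n2, i1 in 1:n1
        Aflat[:, i1, i2] .= A[i1, i2]
        for i3 in 1:n3
            Iflat[i3, i1, i2] = indices[i1, i2][i3]
            Wflat[1, i3, i1, i2] = weights[i1, i2][i3][1]
            Wflat[2, i3, i1, i2] = weights[i1, i2][i3][2]
        end
    end

    B = zeros(n1, n2, n3)
    for i3 in 1:n3, i2 in 1:n2, i1 in 1:n1   # candidate loop nest for @turbo
        for i4 in 0:1
            B[i1, i2, i3] += Wflat[1 + i4, i3, i1, i2] * Aflat[Iflat[i3, i1, i2] + i4, i1, i2]
        end
    end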

I tend to agree. Assuming that you want to optimize the code, I also checked @tturbo (same problem) and @batch from Polyester (no speedup).
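
For concreteness, the @batch variant I have in mind is along these lines (a sketch only: thread the outermost loop with Polyester and leave the body as ordinary Julia, since @batch does not rewrite the indexing the way @turbo does):

    # Replace the @turbo loop in test1 with Polyester's @batch (needs `using Polyester`).
    # Each i3 iteration writes to its own slice of B, so threading over i3 is race-free.
    B = zeros(n1, n2, n3)
    @batch for i3 in 1:n3
        for i2 in 1:n2
            for i1 in 1:n1
                for i4 in 0:1
                    B[i1, i2, i3] += weights[i1, i2][i3][1 + i4] * A[i1, i2][indices[i1, i2][i3] + i4]
                end
            end
        end
    end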

A quick glance with a profiler revealed that most of the time is spent in the setup, not in the computation. In the setup you can fix at least two weaknesses (let’s wait and see what @DNF finds ;)

            # Allocates
            # weights[ix][i3] = (rand(2)...,)
            weights[ix][i3] = ntuple(_ -> rand(), 2)

and

        # Dynamic dispatch
        # indices[ix] = sort(rand(1:nA - 1, n3))
        indices[ix] = rand(1:nA - 1, n3)
        sort!(indices[ix])
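
Putting both changes into the setup loop of test1, it becomes:

    for ix in eachindex(weights)
        weights[ix] = Vector{NTuple{2, Float64}}(undef, n3)
        for i3 in eachindex(weights[ix])
            weights[ix][i3] = ntuple(_ -> rand(), 2)   # no temporary rand(2) vector
            weights[ix][i3] = weights[ix][i3] ./ sum(weights[ix][i3])
        end

        indices[ix] = rand(1:nA - 1, n3)               # concrete Vector{Int}
        sort!(indices[ix])                             # in-place, avoids the dispatch noted above
    end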

With these fixes, @btime drops from 400 μs to 80 μs, but the setup (specifically the sorting of indices) still dominates (hence no speedup from @batch).
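
For reference, the timings above were taken along these lines (a sketch with BenchmarkTools; absolute numbers will of course vary by machine):

    using BenchmarkTools

    n1, n2 = 3, 7
    A = Matrix{Vector{Float64}}(undef, n1, n2)
    for ix in eachindex(A)
        A[ix] = sort(rand(60))
    end

    @btime test1($A);   # ~400 μs with the original setup, ~80 μs with the two fixes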