Memory allocations when performing operation on Tuple of structs

That looks like a valid function to me, but you should test whether the compiler eventually gives up on unrolling the loop for longer tuples and ends up allocating. Alternatively, if you want to avoid explicitly passing Val(length(a.ops)) at the call site, you could do the following:

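using LinearAlgebra

# Entry point: lift the tuple length into the type domain so the generated function can see it.
LinearAlgebra.mul!(y::AbstractVector, a::MyStructSum, x::AbstractVector) = LinearAlgebra.mul!(y, a, x, Val(length(a.ops)))

@generated function LinearAlgebra.mul!(y::AbstractVector, a::MyStructSum, x::AbstractVector, ::Val{N}) where N
    quote
        # Expands to N consecutive calls, one per element of a.ops.
        Base.@nexprs $N i -> begin
            @inline mul!(y, a.ops[i], x)  # call-site @inline needs Julia 1.8 or newer
        end
        return y
    end
end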

Use whichever version you prefer. Base.@nexprs is in charge of the unrolling. For example, Base.@nexprs 3 i -> mul!(y, a.ops[i], x) will literally generate the following code:

mul!(y, a.ops[1], x)
mul!(y, a.ops[2], x)
mul!(y, a.ops[3], x)
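
If you want to check the expansion yourself, here is a minimal sketch using @macroexpand. Nothing is evaluated, so y, a and x do not need to exist; the variable name ex is just for illustration.

```julia
using LinearAlgebra

# Expand the macro without running the resulting code.
ex = @macroexpand Base.@nexprs 3 i -> mul!(y, a.ops[i], x)

# Strip line-number nodes so the printed expression is easier to read.
println(Base.remove_linenums!(ex))
```

The printed block should contain one mul! call per index, with i replaced by the literal values 1 to 3.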

So I don’t expect this version to allocate memory. However, how long do you expect the tuple to be? I would guess that compilation time grows with the tuple length and becomes noticeable if it’s really long (I don’t have an intuition for where that happens, so you should test it).
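
A rough way to test both points (allocations and first-call compile time) could look like the sketch below. Note that Scale, its accumulating mul! method, the MyStructSum definition and the helper measure are all hypothetical stand-ins I made up for illustration, since I don't know what your actual types look like.

```julia
using LinearAlgebra

# Hypothetical operator type (an assumption, not from the original question).
struct Scale{T}
    c::T
end

# Assumed semantics for this sketch: each operator adds its contribution into y.
function LinearAlgebra.mul!(y::AbstractVector, s::Scale, x::AbstractVector)
    y .+= s.c .* x
    return y
end

# Hypothetical container; the tuple type is a type parameter so its length is known to the compiler.
struct MyStructSum{T<:Tuple}
    ops::T
end

LinearAlgebra.mul!(y::AbstractVector, a::MyStructSum, x::AbstractVector) =
    LinearAlgebra.mul!(y, a, x, Val(length(a.ops)))

@generated function LinearAlgebra.mul!(y::AbstractVector, a::MyStructSum, x::AbstractVector, ::Val{N}) where N
    quote
        Base.@nexprs $N i -> mul!(y, a.ops[i], x)
        return y
    end
end

# Measure inside a function so that global-variable overhead is not counted.
measure(y, a, x) = @allocated mul!(y, a, x)

a = MyStructSum((Scale(1.0), Scale(2.0), Scale(3.0)))
x = ones(4)
y = zeros(4)

t_first = @elapsed mul!(y, a, x)  # first call, so this includes compilation time
measure(y, a, x)                  # warm-up call to compile measure itself
fill!(y, 0.0)
bytes = measure(y, a, x)          # bytes allocated by an already-compiled call

println("first call: ", t_first, " s, allocated: ", bytes, " bytes")
```

To see how things scale, you could build a longer tuple, for example with ntuple(i -> Scale(Float64(i)), 64), and compare the first-call time and the allocation count.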

As for foreach, it looks like it has a hard-coded manual unrolling up to N=31, and then it loops over the remaining elements of the tuple. You can see it here.
