Passing arguments separately versus packing them into a struct

I regularly have a large number of arguments to carry around, where each of the arguments has a concrete type, but where the concrete types can be different (multiple dispatch). The arguments are typically more complicated than the ones in the code below. Other than the overhead of packing and unpacking elements into and out of S, are there any gotchas with using S over using separate arguments?

```julia
using UnPack, BenchmarkTools

# Separate arguments, each with its own element type.
function f(a::Matrix{T1}, b::Matrix{T2}, c::Matrix{T3}, d::Matrix{T4}) where {T1<:Real,T2<:Real,T3<:Real,T4<:Real}
    return a * b * c * d
end

struct S{A,B,C,D}
    a::A
    b::B
    c::C
    d::D
end

# All four field types constrained.
function g(s::S{Matrix{T1},Matrix{T2},Matrix{T3},Matrix{T4}}) where {T1<:Real,T2<:Real,T3<:Real,T4<:Real}
    @unpack a, b, c, d = s
    return a * b * c * d
end

# Only the first type parameter constrained; B, C, and D are left free.
function h(s::S{Matrix{T1}}) where {T1<:Real}
    @unpack a, b, c, d = s
    return a * b * c * d
end

# No constraints on the type parameters.
function i(s::S)
    @unpack a, b, c, d = s
    return a * b * c * d
end

@btime f(a, b, c, d) setup=(a = randn(20, 3); b = rand(Int, 3, 50); c = randn(50, 30); d = randn(30, 100))
@btime g(s) setup=(s = S(randn(20, 3), rand(Int, 3, 50), randn(50, 30), randn(30, 100)))
@btime h(s) setup=(s = S(randn(20, 3), rand(Int, 3, 50), randn(50, 30), randn(30, 100)))
@btime i(s) setup=(s = S(randn(20, 3), rand(Int, 3, 50), randn(50, 30), randn(30, 100)))
```

This is more of an API question than a performance one. Packing and unpacking shouldn’t add overhead, because you’re doing the same loads and stores, just at different locations. `f(a, b, c, d)` takes 4 pointers to matrices, and so does `g(s)`; for `g` you just happen to have written those 4 pointers as 1 structure `s`.


This is redundant. The types `A, B, C, D` are the types of the input args and must by necessity be concrete.
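As a quick check of the concreteness point, here is a minimal sketch. The small matrices are illustrative and not taken from the original post; only the struct definition mirrors the question.

```julia
struct S{A,B,C,D}
    a::A
    b::B
    c::C
    d::D
end

# Illustrative inputs (assumed, not from the original benchmark).
s = S(randn(2, 2), rand(1:5, 2, 2), randn(2, 2), randn(2, 2))

# The type of an actual instance has every parameter filled in.
println(isconcretetype(typeof(s)))

# The bare name `S` is a UnionAll, so it is not concrete by itself.
println(isconcretetype(S))

# The parameters are simply the types of the values passed to the constructor.
println(fieldtypes(typeof(s)))
```

The practical upshot is that a method taking `s::S` is still compiled for the concrete `typeof(s)` it is called with, just as a method taking untyped arguments would be.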


You’re right of course; I’ll remove the brainfart.

Thanks @Benny. The overhead is small, but there should be some, no? The `@unpack` macro generates code, and creating the struct in the first place should carry a small cost. The lowered code is different.

What I intended to ask is whether there are any gotchas with going the `S` route if I use a fully parametric type.

Indeed, but expressions and lowered code aren’t perfectly correlated with runtime, because the compiler tries pretty hard to optimize. Check `@code_llvm` and `@code_native` if you want to see the code after most or all of the optimizations. In all cases here, it’s just 4 loads and 4 stores to a `_quad_matmul` call. Accordingly, the `@btime` lines all give me 6.825-6.875 μs, the variance being noise, not overhead.
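A minimal sketch of that inspection, for anyone following along. It assumes plain field access in place of `@unpack` (to avoid the UnPack dependency) and drops the type annotations from the question; the integer matrix uses a small range purely for illustration.

```julia
using InteractiveUtils  # stdlib; provides @code_llvm outside the REPL

struct S{A,B,C,D}
    a::A
    b::B
    c::C
    d::D
end

# Separate arguments.
f(a, b, c, d) = a * b * c * d

# Packed arguments; field access stands in for `@unpack` here (an assumption
# made to keep the sketch dependency-free).
i(s::S) = s.a * s.b * s.c * s.d

a, b, c, d = randn(20, 3), rand(1:10, 3, 50), randn(50, 30), randn(30, 100)
s = S(a, b, c, d)

# Print the optimized LLVM IR for each method so the two can be compared by eye.
@code_llvm debuginfo=:none f(a, b, c, d)
@code_llvm debuginfo=:none i(s)
```

The exact IR depends on the Julia version, so treat the printout as something to compare side by side rather than as a fixed reference.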

The default instantiation is just putting 4 pointers in one spot. Putting 4 pointers in 4 different, likely adjacent, spots for variables isn’t a time saver.
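A small sketch that makes this visible, assuming the same `S` as in the question and illustrative inputs: the struct stores references to the existing arrays, so constructing it copies no matrix data.

```julia
struct S{A,B,C,D}
    a::A
    b::B
    c::C
    d::D
end

a, b, c, d = randn(20, 3), rand(1:10, 3, 50), randn(50, 30), randn(30, 100)
s = S(a, b, c, d)

# Each field is the very same array object that was passed in.
println(s.a === a)

# The struct itself is four references wide, regardless of matrix sizes.
println(sizeof(s) == 4 * sizeof(Ptr{Cvoid}))
```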

Not in an example this simple. If you have a particular set of inputs that goes together into most calls, refactoring it as one structure is entirely appropriate.
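One possible shape for such a refactor, sketched under the assumption of Julia 1.7 or later, where property destructuring is built in and can stand in for `@unpack`. The function name `i` mirrors the question; the inputs are illustrative.

```julia
struct S{A,B,C,D}
    a::A
    b::B
    c::C
    d::D
end

function i(s::S)
    # Property destructuring (Julia 1.7+): binds locals a, b, c, d from the
    # fields of the same names.
    (; a, b, c, d) = s
    return a * b * c * d
end

s = S(randn(20, 3), rand(1:10, 3, 50), randn(50, 30), randn(30, 100))
result = i(s)
println(size(result))
```

Whether to use this or `@unpack` is a matter of taste and of which Julia versions need to be supported.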