Why does wrapping an array in a composite type degrade performance?

The code below performs the same broadcast multiplication between two 3D arrays twice; in the second case, the result is wrapped in a composite type. Does anybody know why the first is faster and allocates less memory? Is it a benchmarking artifact, or does the composite type abstraction incur a performance cost?

using BenchmarkTools

struct MyFactor{vars, card, T}
  vals::T
end

a_vals = rand(2,3,1);
b_vals = rand(2,1,2);

c_vars = (2,3,4);
c_card = (2,3,2);

@btime $a_vals .* $b_vals;
@btime MyFactor{$c_vars, $c_card, Array{Float64, length($c_vars)}}($a_vals .* $b_vals);

Output:

  72.113 ns (1 allocation: 176 bytes)
  282.406 ns (4 allocations: 256 bytes)

It seems that having those tuples as type parameters is causing the extra allocations.

If I define:

struct MyFactor{ET,N}
  vars::NTuple{N,Int64}
  card::NTuple{N,Int64}
  vals::Array{ET,N}
end

I get

@btime MyFactor{Float64,length($c_vars)}($c_vars, $c_card, $a_vals .* $b_vals)

80.668 ns (1 allocation: 176 bytes)

(On my machine, the bare a_vals .* b_vals takes 78 ns.)
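A cheap way to check this without benchmarking is to ask type inference what the constructor returns. Below is a minimal sketch using only Base; FieldFactor and build_field are hypothetical names (a renamed copy of the definition above, so it doesn't clash with the first MyFactor), and Base.return_types is a reflection utility whose printed output may vary between Julia versions.

```julia
# Hypothetical renamed copy of the field-based MyFactor definition above.
struct FieldFactor{ET,N}
    vars::NTuple{N,Int64}
    card::NTuple{N,Int64}
    vals::Array{ET,N}
end

# Hypothetical wrapper: every type parameter can be derived from the
# argument *types*, so inference can determine the exact result type.
build_field(vars::NTuple{N,Int64}, card::NTuple{N,Int64}, vals::Array{ET,N}) where {ET,N} =
    FieldFactor{ET,N}(vars, card, vals)

argtypes = (NTuple{3,Int64}, NTuple{3,Int64}, Array{Float64,3})
rt = only(Base.return_types(build_field, argtypes))
println(rt)                  # prints the inferred return type
println(isconcretetype(rt))  # true: the result type is fully known at compile time
```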

I don’t know your actual use case, so I can’t tell whether this alternative definition helps you.


Thanks @orialb for your suggestion. The reason I wanted to have vars (variables) and card (cardinality or dimension size) as type parameters is to avoid type instability. Here is one example of a function that takes the MyFactor type as argument:

function marginalize(A::MyFactor{T,N}, V::Vector{Int64}) where {T,N}
  dims = indexin(V, collect(A.vars)) # map vars to dims
  r_size = ntuple(d -> d in dims ? 1 : size(A.vals, d), N) # assign 1 to summed-out dims
  ret_vars = filter(v -> v ∉ V, A.vars) # vars that remain after marginalization
  r_vals = similar(A.vals, r_size)
  ret_vals = dropdims(sum!(r_vals, A.vals), dims=Tuple(dims)) # sum out, then drop the singleton dims
  ret_card = size(ret_vals) # cardinalities of the remaining vars
  MyFactor{T,length(ret_vars)}(ret_vars, ret_card, ret_vals)
end

Using your MyFactor definition, the compiled code is not type stable.
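The instability can be isolated in the ret_vars line: the length of the filtered tuple, and therefore the N of the returned MyFactor{ET,N}, depends on the values in V, which inference cannot see. A minimal sketch, assuming the hypothetical helper remaining_vars stands in for that line:

```julia
# Hypothetical helper mirroring the `ret_vars` line of marginalize.
remaining_vars(vars::NTuple{N,Int64}, V::Vector{Int64}) where {N} = filter(v -> v ∉ V, vars)

argtypes = (NTuple{3,Int64}, Vector{Int64})
rt = only(Base.return_types(remaining_vars, argtypes))
println(rt)                  # prints the inferred (non-concrete) tuple type
println(isconcretetype(rt))  # false: the result length depends on V's contents

# Same argument types, different result types:
println(remaining_vars((2, 3, 4), [3]))     # (2, 4)
println(remaining_vars((2, 3, 4), [2, 4]))  # (3,)
```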

So ideally, I would like to have vars and card in the type (which gives me type stability) without allocating data on the heap (which, judging from the benchmarks in my first message, is what happens).

Breaking up the second test into a function to make it clearer:

test2(a_vals, b_vals, c_vars, c_card) = MyFactor{c_vars, c_card, Array{Float64, length(c_vars)}}(a_vals .* b_vals)
@btime test2($a_vals, $b_vals, $c_vars, $c_card)
@code_warntype test2(a_vals, b_vals, c_vars, c_card)

The last line shows the problem: it’s not type stable, as you noted (hence why it’s slower). It’s not type stable because the type you create depends on the values of c_vars and c_card: you’re explicitly sticking them into the type, so the compiler can’t know at compile time what type to return.
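The same diagnosis can be made programmatically. This sketch assumes ParamFactor, a hypothetical renamed copy of the first MyFactor definition, and make_factor_runtime, a hypothetical function with the same body as test2; Base.return_types reports what inference can prove about the result when only the argument types are known.

```julia
# Hypothetical renamed copy of the first MyFactor definition.
struct ParamFactor{vars, card, T}
    vals::T
end

# Same body as test2: the type parameters are built from argument *values*.
make_factor_runtime(a_vals, b_vals, c_vars, c_card) =
    ParamFactor{c_vars, c_card, Array{Float64, length(c_vars)}}(a_vals .* b_vals)

# Only the argument *types* are supplied, which is all the compiler sees.
argtypes = (Array{Float64,3}, Array{Float64,3}, NTuple{3,Int}, NTuple{3,Int})
rt = only(Base.return_types(make_factor_runtime, argtypes))
println(rt)                  # prints a non-concrete type
println(isconcretetype(rt))  # false: the values of c_vars and c_card are unknown
```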

If you really want to stick those values into the type, the canonical way to do it type-stably is to use Val types. For example, this has no overhead compared to your first test:

test3(a_vals, b_vals, ::Val{c_vars}, ::Val{c_card}) where {c_vars, c_card} = MyFactor{c_vars, c_card, Array{Float64, length(c_vars)}}(a_vals .* b_vals)
@btime test3($a_vals, $b_vals, $(Val(c_vars)), $(Val(c_card)))
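Running the same inference check on the Val version should report a concrete type, because c_vars and c_card are now static parameters taken from the argument types. A minimal sketch with the hypothetical names ParamFactor and make_factor_val (same body as test3):

```julia
# Hypothetical renamed copy of the first MyFactor definition.
struct ParamFactor{vars, card, T}
    vals::T
end

# Same body as test3: vars and card arrive as type parameters of Val.
make_factor_val(a_vals, b_vals, ::Val{c_vars}, ::Val{c_card}) where {c_vars, c_card} =
    ParamFactor{c_vars, c_card, Array{Float64, length(c_vars)}}(a_vals .* b_vals)

argtypes = (Array{Float64,3}, Array{Float64,3}, Val{(2, 3, 4)}, Val{(2, 3, 2)})
rt = only(Base.return_types(make_factor_val, argtypes))
println(isconcretetype(rt))  # true: the result type is known at compile time

# The runtime result has exactly the inferred type.
f = make_factor_val(rand(2, 3, 1), rand(2, 1, 2), Val((2, 3, 4)), Val((2, 3, 2)))
println(typeof(f) == rt)     # true
```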

But note that this function, and any function that takes a MyFactor argument, will be recompiled for every distinct value of c_vars and c_card you use. I would consider whether you truly need those in the type. Perhaps you do (although it’s not clear to me from your last example), but if you don’t, you’ll get faster compilation and clearer code by leaving them out of the type.
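To see why each new value means new compiled code: two factors whose parameters differ only in value are distinct concrete types, and by default Julia compiles a separate specialization of a method for each concrete argument type. A minimal sketch; ParamFactor and total are hypothetical names.

```julia
# Hypothetical renamed copy of the first MyFactor definition.
struct ParamFactor{vars, card, T}
    vals::T
end

# Hypothetical function that takes a factor argument.
total(f::ParamFactor) = sum(f.vals)

# Two factors that differ only in the *value* of the vars parameter.
f1 = ParamFactor{(2, 3, 4), (2, 3, 2), Array{Float64,3}}(rand(2, 3, 2))
f2 = ParamFactor{(2, 3, 5), (2, 3, 2), Array{Float64,3}}(rand(2, 3, 2))

println(typeof(f1) == typeof(f2))  # false: two different concrete types
total(f1)  # first call compiles `total` for typeof(f1)
total(f2)  # first call with typeof(f2) triggers a separate compilation
```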
