I came across this article about loop fusion and saw it as a great opportunity to optimize my code, since my program performs a lot of intermediate allocations. After adapting my code, the performance for large problems improved, which is great! On the other hand, the performance for small problems became considerably worse. The reason is type instability. I find it very difficult to write type-stable code when tuples are involved. I would appreciate it if some of you Julia experts could give me advice on how to write performant Julia code without resorting to a trial-and-error approach, which is costing me a lot of time.
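To make clear what I mean by loop fusion, here is a minimal sketch of my understanding of it. The arrays and the function names (`unfused`, `fused`, `fused!`) are made up for illustration and are not part of my program.

```julia
# Illustrative inputs; the names and sizes are arbitrary.
a = rand(1000); b = rand(1000); c = rand(1000)

# Unfused: each statement runs its own loop and allocates its own array.
function unfused(a, b, c)
    tmp = a .* b        # first loop, allocates the temporary `tmp`
    return tmp .* c     # second loop, allocates the result
end

# Fused: dotted calls nested in one expression become a single loop
# with a single output allocation and no temporary.
fused(a, b, c) = a .* b .* c

# In-place fused variant: writes into a preallocated `out`,
# so no new result array is allocated.
fused!(out, a, b, c) = (out .= a .* b .* c; out)
```

All three variants should produce the same values; they differ only in how many loops and temporaries are involved.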

Here is a short description of the code: the `myproduct` function takes in a `Vararg` of factors. A factor consists of an N-dimensional array and a tuple of integers that can be thought of as the "names" of each dimension. The objective of the function is to take the pointwise product of the given factors.
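To illustrate the pointwise product, here is a minimal sketch with two made-up factors (not the ones from my program): `X` is defined over variable 1 and `Y` over variable 2. Each array is reshaped so that it has a singleton dimension wherever its factor lacks a variable, and broadcasting then does the rest.

```julia
# Hypothetical factors, used only for this illustration.
X_vals = [1.0, 2.0]          # factor over variable 1 (size 2)
Y_vals = [10.0, 20.0, 30.0]  # factor over variable 2 (size 3)

# The output factor is over variables (1, 2), so its array is 2x3.
# Give each input a size-1 dimension for the variable it does not have.
X_new = reshape(X_vals, (2, 1))  # varies along variable 1 only
Y_new = reshape(Y_vals, (1, 3))  # varies along variable 2 only

# Broadcasting expands the singleton dimensions:
# out[i, j] == X_vals[i] * Y_vals[j]
out = X_new .* Y_new
# out == [10.0 20.0 30.0; 20.0 40.0 60.0]
```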

```
using BenchmarkTools, InteractiveUtils

mutable struct FactorI{T, N}
    vars::NTuple{N,Int64}  # "names" of the dimensions
    vals::Array{T}         # N-dimensional array of values
end

function myproduct(in_factors::FactorI{T}...) where T
    # Size of each input array, as a vector so it can be consumed with popfirst!
    in_factors_card = map(x -> collect(size(x.vals)), in_factors)
    # Sorted union of all variable names
    out_factor_vars = map(x -> x.vars, in_factors) |> x -> union(x...) |> sort |> Tuple
    # Per input: its size along each output dimension (1 where the variable is absent)
    in_factors_card_new = map(_ -> ones(Int64, length(out_factor_vars)), in_factors)
    for (i, out_factor_var) in enumerate(out_factor_vars)
        for (j, in_factor_vars) in enumerate(map(x -> x.vars, in_factors))
            # Relies on each factor's vars being in ascending order
            out_factor_var in in_factor_vars && (in_factors_card_new[j][i] = popfirst!(in_factors_card[j]))
        end
    end
    in_factors_vals = map(in_factor -> in_factor.vals, in_factors)
    # Reshape every input so that all inputs have the output's number of dimensions
    in_factors_vals_new = map((x, y) -> reshape(x, Tuple(y)), in_factors_vals, in_factors_card_new)
    out_factor_card = hcat(in_factors_card_new...) |> x -> maximum(x, dims=2)
    out_factor_new = FactorI{T, length(out_factor_vars)}(out_factor_vars, zeros(out_factor_card...))
    _product!(out_factor_new, in_factors_vals_new)
end

function _product!(out_factor, in_factors_vals)
    out_factor.vals = .*(in_factors_vals...) # FUNCTION BARRIER AND LOOP FUSION!!!
end

A_vars = (2,)
A = FactorI{Float64,1}(A_vars, [0.11; 0.89])
B_vars = (2,4)
B = FactorI{Float64,length(B_vars)}(B_vars, [0.5 0.1; 0.7 0.2])
C_vars = (1,2,3)
C = FactorI{Float64,length(C_vars)}(C_vars, cat([0.25 0.08; 0.05 0.0; 0.15 0.09],
                                                [0.35 0.16; 0.07 0.0; 0.21 0.18], dims=3))

@code_warntype myproduct(A, B, C)
@btime myproduct(A, B, C)
```
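My guess is that part of the trouble comes from building tuples out of vectors, as in the `|> sort |> Tuple` step above. Here is a minimal sketch of that suspicion. The helpers `vars_from_vector` and `shift_vars` are hypothetical and exist only for this illustration, and `Base.return_types` is used here simply as a reflection utility to look at what the compiler infers.

```julia
# Hypothetical helper mirroring the `|> sort |> Tuple` step:
# the tuple's length depends on a runtime value (the vector's length).
vars_from_vector(v::Vector{Int}) = Tuple(sort(v))

# Hypothetical helper that works on a tuple whose length N is part of its type.
shift_vars(t::NTuple{N,Int}) where {N} = map(x -> x + 1, t)

# Ask the compiler which return types it infers for each helper.
unstable = first(Base.return_types(vars_from_vector, (Vector{Int},)))
stable   = first(Base.return_types(shift_vars, (NTuple{3,Int},)))

println(isconcretetype(unstable))  # false: the tuple length is not known to the compiler
println(isconcretetype(stable))    # true: NTuple{3,Int64}
```

If this is right, everything downstream of `out_factor_vars` (including the `N` in the output `FactorI{T, N}`) would only be known at run time, which might explain why small problems are hit hardest.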

Any other advice as to how I could improve this code would be very much appreciated.