Type stability with tuples is hard

I came across this article that talks about loop fusion. I saw this as a great opportunity to optimize my code since a lot of intermediate allocations are happening in my program. After having adapted my code, the performance for large problems improved, which is great! On the other hand, the performance for small problems became considerably worse. The reason is type instability. I find it very difficult to write type stable code when tuples are involved. I would appreciate it if some of you Julia experts can give me advice as to how to write Julia performant code without having to opt for a trial-and-error approach which is costing me a lot of time.

Here is a short description of the code: myproduct function takes in a VarArg of Factors. A factor consists of an N-dimensional array and a tuple of integers that can be thought of as the β€œnames” of each dimension. The objective of the function is to take a pointwise product of the N Factors.

using BenchmarkTools, InteractiveUtils

mutable struct FactorI{T, N}
  vars::NTuple{N,Int64}
  vals::Array{T}
end

function myproduct(in_factors::FactorI{T}...) where T
  in_factors_card = map(x -> collect(size(x.vals)), in_factors)
  out_factor_vars = map(x -> x.vars, in_factors) |> x -> union(x...) |> sort |> Tuple
  in_factors_card_new = map(_ -> ones(Int64, length(out_factor_vars)), in_factors)
  for (i, out_factor_var) in enumerate(out_factor_vars)
    for (j, in_factor_vars) in enumerate(map(x -> x.vars, in_factors))
      out_factor_var in in_factor_vars && (in_factors_card_new[j][i] = popfirst!(in_factors_card[j]))
    end
  end
  in_factors_vals = map(in_factor -> in_factor.vals, in_factors)
  in_factors_vals_new = map((x, y) -> reshape(x, Tuple(y)), in_factors_vals, in_factors_card_new)
  out_factor_card = hcat(in_factors_card_new...) |> x -> maximum(x, dims=2)
  out_factor_new = FactorI{T, length(out_factor_vars)}(out_factor_vars, zeros(out_factor_card...))
  _product!(out_factor_new, in_factors_vals_new)
end

function _product!(out_factor, in_factors_vals)
  out_factor.vals = .*(in_factors_vals...) # FUNCTION BARRIER AND LOOP FUSION!!!
end

A_vars = (2,)
A = FactorI{Float64,1}(A_vars, [0.11; 0.89])

B_vars = (2,4)
B = FactorI{Float64,length(B_vars)}(B_vars, [0.5 0.1; 0.7 0.2])

C_vars = (1,2,3)
C = FactorI{Float64,length(C_vars)}(C_vars, cat([0.25 0.08; 0.05 0.0; 0.15 0.09],
                                                [0.35 0.16; 0.07 0.0; 0.21 0.18], dims=3))

@code_warntype myproduct(A, B, C)
@btime myproduct(A, B, C)

Any other advice as to how I could improve this code would be very much appreciated.

1 Like

Hi,

I’ll confess, I did not study your code that is quite long. But if you are working with type stability, note that Array{T} where{T} is an abstract type: Arrays have two type parameters, element type and number of dimensions. Any code doing X.vals on a variable X of type FactorI{T, N}, is type-unstable.

I wrote a blog about type stability which you might find useful - I welcome any feedback.

7 Likes

Looks very usefull. I will study it slowly. Thanks.

I am not sure how helpful this will be. But note the following:

This is already very type-unstable:

julia> @code_warntype map(x -> collect(size(x.vals)), (A,B,C))
Variables
  #self#::Core.Const(map)
  f::Core.Const(var"#81#82"())
  t::Tuple{FactorI{Float64, 1}, FactorI{Float64, 2}, FactorI{Float64, 3}}

Body::Tuple{Union{Vector{Union{}}, Vector{Int64}}, Union{Vector{Union{}}, Vector{Int64}}, Union{Vector{Union{}}, Vector{Int64}}}
1 ─      nothing
β”‚   %2 = Base.getindex(t, 1)::FactorI{Float64, 1}
β”‚   %3 = (f)(%2)::Union{Vector{Union{}}, Vector{Int64}}
β”‚   %4 = Base.getindex(t, 2)::FactorI{Float64, 2}
β”‚   %5 = (f)(%4)::Union{Vector{Union{}}, Vector{Int64}}
β”‚   %6 = Base.getindex(t, 3)::FactorI{Float64, 3}
β”‚   %7 = (f)(%6)::Union{Vector{Union{}}, Vector{Int64}}
β”‚   %8 = Core.tuple(%3, %5, %7)::Tuple{Union{Vector{Union{}}, Vector{Int64}}, Union{Vector{Union{}}, Vector{Int64}}, Union{Vector{Union{}}, Vector{Int64}}}
└──      return %8


But if you define your structures differently, it becomes stable:

julia> mutable struct FactorI2{R,S}
         vars::R
         vals::S
       end

julia> A2 = FactorI2(A_vars, [0.11; 0.89])
FactorI2{Tuple{Int64}, Vector{Float64}}((2,), [0.11, 0.89])

julia> B2 = FactorI2(B_vars, [0.5 0.1; 0.7 0.2])
FactorI2{Tuple{Int64, Int64}, Matrix{Float64}}((2, 4), [0.5 0.1; 0.7 0.2])

julia> C2 = FactorI2(C_vars, cat([0.25 0.08; 0.05 0.0; 0.15 0.09],
                                [0.35 0.16; 0.07 0.0; 0.21 0.18], dims=3))
FactorI2{Tuple{Int64, Int64, Int64}, Array{Float64, 3}}((1, 2, 3), [0.25 0.08; 0.05 0.0; 0.15 0.09]

[0.35 0.16; 0.07 0.0; 0.21 0.18])

julia> 

julia> @code_warntype map(x -> collect(size(x.vals)), (A2,B2,C2))
Variables
  #self#::Core.Const(map)
  f::Core.Const(var"#85#86"())
  t::Tuple{FactorI2{Tuple{Int64}, Vector{Float64}}, FactorI2{Tuple{Int64, Int64}, Matrix{Float64}}, FactorI2{Tuple{Int64, Int64, Int64}, Array{Float64, 3}}}

Body::Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}
1 ─      nothing
β”‚   %2 = Base.getindex(t, 1)::FactorI2{Tuple{Int64}, Vector{Float64}}
β”‚   %3 = (f)(%2)::Vector{Int64}
β”‚   %4 = Base.getindex(t, 2)::FactorI2{Tuple{Int64, Int64}, Matrix{Float64}}
β”‚   %5 = (f)(%4)::Vector{Int64}
β”‚   %6 = Base.getindex(t, 3)::FactorI2{Tuple{Int64, Int64, Int64}, Array{Float64, 3}}
β”‚   %7 = (f)(%6)::Vector{Int64}
β”‚   %8 = Core.tuple(%3, %5, %7)::Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}
└──      return %8


But not all code β€œjust runs” with that modification.

The code is quite complex to me, but it feels that many of those maps, hcats could be replaced by loops operating over the objects themselves, without relying that much on building tuples all the time. I think that for each of these operations which may return a tuple a of different size and content, the chances of specializing becomes smaller. It is possible that if you provide a small subset of the code, what you have as input and what you want as output, you will get other advice on how to structure all that differently.

1 Like

This looks problematic:

map(f, tuple) |> x -> union(x...) |> sort |> Tuple

The union/unique step is going to be type unstable since Julia’s not going to know how many elements there will be. That’s really the key to using tuples β€” they’ll only be faster if you know what type they’ll be. You’ll end up with faster code just using the array that sort spits out than trying to keep things as tuples.

5 Likes

That’s really the key to using tuples β€” they’ll only be faster if you know what type they’ll be.

Hmm, which isn’t the case in my application: the scope of the resulting factor is the union of the scope of the factor operands, and hence not known upfront. This is a very important design guideline that I ignored while writing this. I’m going to rewrite my code with this in mind and see if I obtain better results. Thanks for all the tips. I will post an update as soon as I write a new version.

I’ve made some progress making the code type stable by following the advise above. There is one section towards the end that I haven’t been able to fix. Here is a minimal example of this part:

using Random; Random.seed!(12)

function myfunc()
  a = (rand(2), rand(2,2), rand(3,2,2))
  b =  [[1, 2, 1, 1], [1, 2, 1, 2], [3, 2, 2, 1]]
  map((x, y) -> reshape(x, Tuple(y)), a, b)
end

@code_warntype myfunc()

Results in:

image

I guess that the way to fix this would be by finding a way to let the compiler know that the type of c should be Vector{Array{Float64, N} where N}. I haven’t managed to do this. N is 4 in this example, which would correspond to the length of the union in the original example I posted above (I’m assuming that this value cannot be notified to the compiler since it depends on the values).

I tried other implementations to see if they somehow fixed the problem (to later try to understand why), but failed in the attempt. These implementations are also type-unstable:

function myfunc()
  a = (rand(2), rand(2,2), rand(3,2,2))
  b =  [[1, 2, 1, 1], [1, 2, 1, 2], [3, 2, 2, 1]]
  [reshape(t[1], Tuple(t[2])) for t in zip(a,b)]
end
function myfunc()
  a = (rand(2), rand(2,2), rand(3,2,2))
  b =  [[1, 2, 1, 1], [1, 2, 1, 2], [3, 2, 2, 1]]
  c = Vector{Array{Float64, 4}}()
  for (x,y) in zip(a,b)
    push!(c, reshape(x, Tuple(y)))
  end
  return c
end

I wish there was a complete university course on type inference: 1/3 of the course to understand the underlying principles, 1/3 to study the corner cases, and 1/3 to practice, practice and practice.

Here you’re taking each element of b β€” Vectors β€” and converting them to Tuples. Vectors don’t store their lengths in the type system. You cannot tell what length(b[1]) is based purely by the typeof(b[1]). That’s the important part.

Change b to be a vector of tuples instead of a vector of vectors and you’ll be in better shape. This is a part that can be figured out from first principles.

Unfortunately, it still doesn’t solve your problem, however, because you’re simultaneously mapping over a, which is a heterogeneous collection of various types of array β€” various dimensionalities β€” and map doesn’t have any tuple smarts in it. Even if it did, though, I’m not sure Julia would be able to figure this sort of pattern out.

1 Like

I still think that we may be facing a XY problem. Maybe if describe you in more global terms what is the type of thing you are trying to achieve someone can point you to a better data/code structure that will be simpler for you and the compiler.

3 Likes

Leandro may be right here, the problem looks ill defined to me. However, if you already know what you get out of the function, why don’t you just figure out the type yourself and annotate it:

julia> function myfunc_annotated()
         a = (rand(2), rand(2,2), rand(3,2,2))
         elt = eltype(a[1])
         b =  [[1, 2, 1, 1], [1, 2, 1, 2], [3, 2, 2, 1]]
         return map((x, y) -> reshape(x, Tuple(y)), a, b)::Vector{Array{elt,N} where N}
       end
myfunc_annotated (generic function with 1 method)

julia> @code_warntype myfunc_annotated()
Variables
  #self#::Core.Const(myfunc_annotated)
  #21::var"#21#22"
  b::Vector{Vector{Int64}}
  elt::Type{Float64}
  a::Tuple{Vector{Float64}, Matrix{Float64}, Array{Float64, 3}}
  N::TypeVar

Body::Vector{Array{Float64, N} where N}
1 ─ %1  = Main.rand(2)::Vector{Float64}
β”‚   %2  = Main.rand(2, 2)::Matrix{Float64}
β”‚   %3  = Main.rand(3, 2, 2)::Array{Float64, 3}
β”‚         (a = Core.tuple(%1, %2, %3))
β”‚   %5  = Base.getindex(a, 1)::Vector{Float64}
β”‚         (elt = Main.eltype(%5))
β”‚   %7  = Base.vect(1, 2, 1, 1)::Vector{Int64}
β”‚   %8  = Base.vect(1, 2, 1, 2)::Vector{Int64}
β”‚   %9  = Base.vect(3, 2, 2, 1)::Vector{Int64}
β”‚         (b = Base.vect(%7, %8, %9))
β”‚         (#21 = %new(Main.:(var"#21#22")))
β”‚   %12 = #21::Core.Const(var"#21#22"())
β”‚   %13 = a::Tuple{Vector{Float64}, Matrix{Float64}, Array{Float64, 3}}
β”‚   %14 = Main.map(%12, %13, b)::Vector{_A} where _A
β”‚   %15 = Core.TypeVar(:N)::Core.Compiler.PartialTypeVar(N, true, true)
β”‚         (N = %15)
β”‚   %17 = N::Core.Compiler.PartialTypeVar(N, true, true)::Core.Compiler.PartialTypeVar(N, true, true)
β”‚   %18 = Core.apply_type(Main.Array, elt::Core.Const(Float64), N::Core.Compiler.PartialTypeVar(N, true, true))::Type{Array{Float64, N}}
β”‚   %19 = Core.UnionAll(%17, %18)::Type{Array{Float64, N} where N}
β”‚   %20 = Core.apply_type(Main.Vector, %19)::Core.Const(Vector{Array{Float64, N} where N})
β”‚   %21 = Core.typeassert(%14, %20)::Vector{Array{Float64, N} where N}
└──       return %21

Or is this not really what you intended for?

I’ll follow your suggestion and take a step back.

I’m currently involved in a field called probabilistic graphical models. Within this field, I’m focusing on problems that involve discrete random variables (variables that can take a finite number of values). A probabilistic graphical model is defined by a set of factors that represent the relationship of the different random variables with each other. Such models can be used to answer interesting questions such as what is the probability that something occurs given that a subset of the variables is observed.

A factor consists of a table of numbers for each instantiation of the random variables involved in that factor, e.g. this is a factor that is a function of 2 binary random variables A and B:
image

In other words, each factor is a function of a given set of variables in the model, and each of these variables has a given cardinality (number of states). The table size is the multiplication of the cardinality of each of the random variables in the factor.

At this moment, I’m trying to implement an algorithm called Variable Elimination which mainly consists of performing products and marginalizations between the factors. Here is an example of a factor product between 2 factors:

Such factor product generalizes to N factors, where the resulting scope is the union of the scope of the factor operands.

and here is an example of a factor marginalization:

which corresponds to adding the values that correspond to all possible instantiations of the variables that are being marginalized (summed out), which is one (B) in this case.

The function that I shared above and which I’m trying to optimize corresponds to the factor product of N factors. It reshapes the operand factor tables to a common size by adding 1-sized dimensions for those variables that are not present in a given operand but that are present in the resulting scope (the union of variables of all operand factors). This allows me to perform a broadcast multiplication of the reshaped tables to obtain the result.

The problems that I’m focusing on have variables with low cardinalities (2 or 3) but the factors can grow quite large, e.g. a factor with 25 binary variables (33554432 states). The factor products and marginalizations described above are computed many times, hence the importance to optimize these two operations.

My overall goal is to see if I can compete (performance-wise) with the state of the art libraries for this type of problems which are mostly available in C++. My thought is that a Julia implementation could open very interesting possibilities for optimization by making use of the metaprogramming capabilities.

4 Likes

This sounds a lot like Einstein summation, and I know that there are Julia packages for that (although I haven’t used them myself).

I had a discussion with @mcabbott about the available Julia packages to perform these operations efficiently. He helped me converge to this approach (of performing factor products) after trying out Tullio.jl and analyzing other related packages.

I pretty sure I will not be very helpful here. But here is a function that does the first computation with a very different approach. It is also type-unstable, so it does not solve your original problem. The only nice thing is that it produces a pretty table :slight_smile: . Anyway, maybe this can inspire you or someone else in some other direction.

From what you describe, maybe studying how those C++ alternatives are structured is the way to go.

Run example with:

using DataFrames

d1 = DataFrame( A = [ 1, 1, 2, 2, 3, 3 ],
                B = [ 1, 2, 1, 2, 1, 2 ],
                Ο• = [ 0.5, 0.8, 0.1, 0, 0.3, 0.9 ] )

d2 = DataFrame( B = [ 1, 1, 2, 2 ],
                C = [ 1, 2, 1, 2 ],
                Ο• = [ 0.5, 0.7, 0.1, 0.2 ] )

julia> factor_product(d1,d2,:B)
12Γ—4 DataFrame
 Row β”‚ A      B      C      Ο•       
     β”‚ Int64  Int64  Int64  Float64 
─────┼──────────────────────────────
   1 β”‚     1      1      1     0.25
   2 β”‚     1      1      2     0.35
   3 β”‚     1      2      1     0.08
   4 β”‚     1      2      2     0.16
   5 β”‚     2      1      1     0.05
   6 β”‚     2      1      2     0.07
   7 β”‚     2      2      1     0.0
   8 β”‚     2      2      2     0.0
   9 β”‚     3      1      1     0.15
  10 β”‚     3      1      2     0.21
  11 β”‚     3      2      1     0.09
  12 β”‚     3      2      2     0.18


The function is:

function factor_product(d1,d2,factor_name)
  # find factor
  d1_col = findfirst(isequal(factor_name),propertynames(d1))
  d2_col = findfirst(isequal(factor_name),propertynames(d2))
  # build output df
  factor_prods = DataFrame()
  for name in propertynames(d1)[1:end-1]
    factor_prods[!,name] = Int[]
  end
  for name in propertynames(d2)[1:end-1]
    if name != factor_name
      factor_prods[!,name] = Int[]
    end
  end
  factor_prods[!,:Ο•] = Float64[]
  # Fill up factor products
  for (i1,f1) in pairs(d1[!,d1_col]),
      (i2,f2) in pairs(d2[!,d2_col])
    if f1 == f2
      rowdata = Union{Int,Float64}[]
      for factor in d1[i1,1:end-1]
        push!(rowdata,factor)
      end
      for (name,factor) in pairs(d2[i2,1:end-1])
        if name != factor_name
          push!(rowdata,factor)
        end
      end
      fp = d1[i1,end]*d2[i2,end]
      push!(rowdata,fp)
      push!(factor_prods,rowdata)
    end
  end
  factor_prods
end

Now this is a version that does the same thing as the code above (the factor product from your example), and is completely type-stable. It is 20x faster than the version above (I would be curious how this compares with your implementation, but really I don’t want to take your time):

struct FactorTable
  names::Vector{Symbol} 
  f::Matrix{Int}
  Ο•::Vector{Float64}
end

nrow(d::FactorTable) = size(d.f,1)
ncol(d::FactorTable) = size(d.f,2)

function factor_product(d1::FactorTable,d2::FactorTable,common_factor::Symbol)
   # remaning factors
   ncols = ncol(d1) + ncol(d2) - 2
   # size of output matrix
   d1_col = findfirst(isequal(common_factor),d1.names)
   d2_col = findfirst(isequal(common_factor),d2.names)
   nrows = 0
   for i1 in d1.f[:,d1_col], i2 in d2.f[:,d2_col]
     (i1 == i2) && (nrows += 1)
   end
   # Build output factor table
   out = FactorTable(Vector{Symbol}(undef,ncols), 
                     Matrix{Int}(undef,nrows,ncols),
                     Vector{Float64}(undef,nrows))
   # Fill names vector 
   ii = 0
   for f in d1.names
     if f != common_factor
       ii+=1
       out.names[ii] = f
     end
   end
   for f in d2.names
     if f != common_factor
       ii+=1
       out.names[ii] = f
     end
   end
   # Fill the output table
   irow = 0
   for (i1,f1) in pairs(d1.f[:,d1_col]), 
       (i2,f2) in pairs(d2.f[:,d2_col])
     if f1 == f2
       irow += 1
       icol = 0
       for (i,f) in pairs(d1.f[i1,:])
         if i != d1_col 
           icol += 1
           out.f[irow,icol] = f
         end
       end
       for (i,f) in pairs(d2.f[i2,:])
         if i != d2_col
           icol += 1
           out.f[irow,icol] = f
         end
       end
       out.Ο•[irow] = d1.Ο•[i1]*d2.Ο•[i2]
     end
   end
   out
end

d1 = FactorTable( [ :a, :b ],
                  [  1   1 
                     1   2
                     2   1 
                     2   2 
                     3   1
                     3   2 ],
                  [ 0.5, 0.8, 0.1, 0.0, 0.3, 0.9 ] 
                )


d2 = FactorTable( [ :b, :c ],
                  [  1   1 
                     1   2
                     2   1
                     2   2 ],
                  [ 0.5, 0.7, 0.1, 0.2 ] 
                )

using PrettyTables
p = factor_product(d1,d2,:b)
pretty_table(hcat(p.f,p.Ο•))

using BenchmarkTools
@btime factor_product($d1,$d2,:b)
 
                       

output:

julia> include("./test.jl")
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Col. 1 β”‚ Col. 2 β”‚ Col. 3 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚    1.0 β”‚    1.0 β”‚   0.25 β”‚
β”‚    1.0 β”‚    2.0 β”‚   0.35 β”‚
β”‚    1.0 β”‚    1.0 β”‚   0.08 β”‚
β”‚    1.0 β”‚    2.0 β”‚   0.16 β”‚
β”‚    2.0 β”‚    1.0 β”‚   0.05 β”‚
β”‚    2.0 β”‚    2.0 β”‚   0.07 β”‚
β”‚    2.0 β”‚    1.0 β”‚    0.0 β”‚
β”‚    2.0 β”‚    2.0 β”‚    0.0 β”‚
β”‚    3.0 β”‚    1.0 β”‚   0.15 β”‚
β”‚    3.0 β”‚    2.0 β”‚   0.21 β”‚
β”‚    3.0 β”‚    1.0 β”‚   0.09 β”‚
β”‚    3.0 β”‚    2.0 β”‚   0.18 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  1.263 ΞΌs (41 allocations: 4.34 KiB)
FactorTable([:a, :c], [1 1; 1 2; … ; 3 1; 3 2], [0.25, 0.35, 0.08000000000000002, 0.16000000000000003, 0.05, 0.06999999999999999, 0.0, 0.0, 0.15, 0.21, 0.09000000000000001, 0.18000000000000002])


The fundamental issue in your code is that you cannot use Julia’s multidimensional arrays machinery for this problem because you do not know the number of dimensions at compile time. Instead, you have to reinvent the wheel to some extent and write a multidimensional arrays code which can deal with a number of dimensions specified only at runtime. Here’s my stab at this. It definitely lose out to @lmiq in terms of output prettiness, but the implementation of the factor product should be decent. There’s still some room for improvement, but I’d say we can get into that at a later point.

using LinearAlgebra
using SparseArrays

# Like CartesianIndices, but with number of dimensions specified at runtime
struct RuntimeCartesianIndices{T <: AbstractVector{<:Integer}}
    dims::T
end

Base.IteratorSize(::Type{RuntimeCartesianIndices}) = Base.HasLength()
Base.IteratorEltype(::Type{RuntimeCartesianIndices}) = Base.HasEltype()
Base.eltype(::Type{RuntimeCartesianIndices}) = Vector{Int}
Base.length(I::RuntimeCartesianIndices) = prod(I.dims)

function Base.iterate(I::RuntimeCartesianIndices)
    if mapreduce(iszero, |, I.dims)
        return nothing
    end
    i = ones(Int,length(I.dims))
    return i,i
end

function Base.iterate(I::RuntimeCartesianIndices, i)
    for k = 1:length(I.dims)
        if i[k] < I.dims[k]
            i[k] += 1
            return i,i
        end
        i[k] = 1
    end
    return nothing
end




struct Factor{T}
    vars::Vector{Any}
    dims::Vector{Int}
    vals::Vector{T}
end

Base.eltype(::Type{Factor{T}}) where {T} = T

function factor_product(a::Factor, b::Factor)
    ##################################
    # Merge `vars` and `dims` vectors

    cvars = copy(a.vars)
    cdims = copy(a.dims)

    # Current stride in `b`
    sb = 1
    # Strides of the `a` indices in `b`
    sba = spzeros(Int, length(a.vars))
    # Strides of the tail indices in `b`
    sbt = Int[]
    # ^ These variables are needed for multiplying the correct `vals` later

    for i in 1:length(b.dims)
        j = findfirst(isequal(b.vars[i]), a.vars)
        if isnothing(j)
            push!(cvars, b.vars[i])
            push!(cdims, b.dims[i])
            push!(sbt,sb)
        else
            @assert a.dims[j] == b.dims[i]
            sba[j] = sb
        end
        sb *= b.dims[i]
    end

    ######################
    # Multiply the `vals`

    T = typeof(zero(eltype(a)) * zero(eltype(b)))
    cvals = Vector{T}(undef, prod(cdims))
    for (lt,it) = enumerate(RuntimeCartesianIndices(@view(cdims[length(a.dims)+1:end])))
        lt = prod(a.dims) * (lt-1)
        lbt = dot(sbt,it) - sum(sbt)
        for (la,ia) in enumerate(RuntimeCartesianIndices(a.dims))
            lba = dot(sba,ia) - sum(sba) + 1
            cvals[la+lt] = a.vals[la] * b.vals[lbt + lba]
        end
    end
    return Factor(cvars,cdims,cvals)
end

function test()
    a = Factor(
        Any[:A,:B],
        [3,2],
        vec([
            0.5 0.8
            0.1 0.0
            0.3 0.9
        ])
    )
    b = Factor(
        Any[:B,:C],
        [2,2],
        vec([
            0.5 0.7
            0.1 0.2
        ])
    )
    c = factor_product(a,b)

    # Produces exactly the number in your example
    println(vec(permutedims(reshape(c.vals,c.dims...), (3,2,1))))
end
3 Likes

That is ~60% faster than my last alternative, but I am not completely sure we are doing exactly the same thing (I more or less deduced what was the algorithm from the drawing above, I think you know better what you are doing and I may be storing and testing unnecessary stuff there associated to the labels of the variables in each column).

I am also a bit reluctant in using all those low-level (?) iterator features, it makes (for me) the code too hard to understand.

Thank you very much for the responses guys. Finally, I have time to look into this again and study the code you shared with me. Both of the approaches you present are quite new to me. I’ll take some time to study them.

3 Likes

Your implementation is 10 times (or more) faster than the initial one I posted, which is awesome. On the other hand, it does make me realize that there is a whole world of Julia, specifically in the arrays domain, that I still need to explore and understand to become good at writing Julia performant code. I have several specific questions about your implementation but I will postpone them for later. First, I want to see if I can make the entire algorithm work with your implementation and then I’ll dive into the specifics.

There is another idea I was thinking about to improve the performance. I would like to hear your thoughts about it. I noticed that Static Arrays can improve performance quite a lot if the array sizes are small. Would it be a good idea to implement some logic that uses static arrays for arrays of length < 100 and othewrise normal arrays? Is this something that is done commonly in Julia packages seeking for the last bit of performance?

1 Like

Would it be a good idea to implement some logic that uses static arrays for arrays of length < 100 and otherwise normal arrays?

No, that’s probably a bad idea. The main application for static arrays is if the size is small and known at compile time. A prime example would be a particle simulation where you know a-priori that all your position and velocity vectors will be of length 3. If the size of your arrays can change over time, then plain-old Vector will be best.

2 Likes