Accumulating error in for loop generated by tensor macro vs sum


Could anyone give me a pointer on a numerical inaccuracy that’s causing my simulation to explode? Here’s a toy example of the operation I’m running, though in practice it accumulates as b += S(C), where S is a nonlinear function.

using TensorOperations

A = rand(9,8,7,6)
b = rand(8,6)

# Calculate C as a tensor operation
@tensor C_tensor[to,i] := A[to,from,i,j] * b[from,j]

# Calculate C as a sum-of-matrix-multiplications operation
C_loop = Array{Float64}(undef, 9,7)
for i in 1:7
  C_loop[:,i] .= sum(A[:,:,i,j] * b[:,j] for j in 1:6)

all(C_tensor .== C_loop) # false
all(C_tensor .≈ C_loop) # true

For just this one step, the difference is very small. However, while the loop code converges to a correct solution in my diffeq system, the tensor/loop code accumulates errors so that the “solution” explodes. Does anyone have any idea what might be the problem with the tensor approach?

TensorOperations doesn’t expand super decipherably, but another tensor contraction library (Einsum) gives the following, which also explodes in exactly the same way:

for i = 1:min(size(C_tensor, 2), size(A, 3))
  for to = 1:min(size(C_tensor, 1), size(A, 1))
    accum = 0
    for j = 1:size(A, 4)
      for from = 1:size(A, 2)
        accum += A[to, from, i, j] * b[from, j]
    C_tensor[to, i] = accum


The sum function uses pairwise summation, which is more accurate than the simple loop used in your tensor contraction.

However, if what you are doing is this sensitive to small roundoff errors, you might think about how to reformulate your overall approach (i.e. what are you using these sums for?) in a more numerically stable way.

1 Like