# Matrix performance

I have two loops and one is significantly faster, although they use similar constructs. I can’t figure out how to speed up the second one:

```julia
## Forward recursion
alpha[1, :] = p .* B[1, :]
c[1] = 1.0 / sum(alpha[1, :])
alpha[1, :] *= c[1]

@inbounds for t in 2:T
    alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
    c[t] = 1.0 / sum(alpha[t, :])
    alpha[t, :] *= c[t]
end
log_likelihood = -sum(log.(c))

## Backward recursion
beta[T, :] = ones(N)
@inbounds for t in (T-1):-1:1
    beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
end
```

The first loop takes ~1.2 s and the second one takes ~32 s when run at global scope in the REPL. Why is the second one so much slower?

Instead of using `.*` I changed to `diagm` and got a significant improvement:

```julia
## Backward recursion
beta[T, :] = ones(N)
@inbounds for t in (T-1):-1:1
    beta[t, :] = diagm(beta[t+1, :]) * (A * B[t+1, :]) * c[t]  # c[t] is a scalar
    # beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
end
```

So you’re not wrapping it in a function? That’s the first thing the performance tips advise against.
After putting it in a function, some cheap performance gains could be had by sprinkling `@view` in front of some of the `alpha[t, :] = ...` calls.
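As a sketch (function name and arguments are hypothetical, and the row-vector/`diagm` product is rewritten as the equivalent `A' * alpha[t-1, :]`), wrapping the forward recursion might look like:

```julia
# Hypothetical sketch: the forward recursion wrapped in a function, so the
# compiler can infer concrete types for every variable instead of working
# with untyped globals.
function forward!(alpha, c, A, B, p, T)
    alpha[1, :] = p .* B[1, :]
    c[1] = 1.0 / sum(alpha[1, :])
    alpha[1, :] *= c[1]
    @inbounds for t in 2:T
        # (alpha[t-1, :]' * A) * diagm(B[t, :]) computes the same values
        alpha[t, :] = (A' * alpha[t-1, :]) .* B[t, :]
        c[t] = 1.0 / sum(alpha[t, :])
        alpha[t, :] *= c[t]
    end
    return -sum(log.(c))  # log-likelihood
end
```

Everything the loop touches arrives as an argument, so nothing in the hot loop is a global.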

It is in a function; that’s not the point. I’m just trying to optimize the second loop, and I was only testing the timing of each in the REPL. What does `@view` do?

```
help?> @view
  @view A[inds...]

  Creates a SubArray from an indexing expression. This can only be applied
  directly to a reference expression (e.g. @view A[1,2:end]), and should not
  be used as the target of an assignment (e.g. @view(A[1,2:end]) = ...). See
  also @views to switch an entire block of code to use views for slicing.

  julia> A = [1 2; 3 4]
  2×2 Array{Int64,2}:
   1  2
   3  4

  julia> b = @view A[:, 1]
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   1
   3

  julia> fill!(b, 0)
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   0
   0

  julia> A
  2×2 Array{Int64,2}:
   0  2
   0  4
```

Learned something here as well: it should have been `@views`, not `@view`.
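For instance (a sketch with toy data, and with the `diagm` product rewritten as the equivalent `A' * alpha[t-1, :]`), annotating the forward loop:

```julia
# Sketch: with @views, the slices on the right-hand side of the annotated
# expression become views (SubArrays) instead of freshly allocated copies.
T, N = 4, 2
A = [0.5 0.5; 0.5 0.5]
B = fill(0.5, T, N)
alpha = zeros(T, N); c = zeros(T)
alpha[1, :] = [0.5, 0.5]; c[1] = 1.0

@inbounds for t in 2:T
    @views alpha[t, :] .= (A' * alpha[t-1, :]) .* B[t, :]
    c[t] = 1.0 / sum(@view alpha[t, :])
    alpha[t, :] *= c[t]
end
```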

If you give code that people can copy-paste and experiment with, you are likely to get more answers. Having to figure out all the arguments and dimensions is too painful, and running code in your head can be misleading.


Setting up an MWE:

```julia
using Distributions

n = 1000000
qidx = rand(1:100, n)
x = rand([0, 1], n)
theta = randn()
gamma = randn(2)
d = randn(100)
A = [.3 .7; .2 .8]
p = [.8, .2]

N = length(p)
M = length(gamma)
T = length(x)
alpha = zeros(T, N)
beta = zeros(T, N)
c = zeros(T)
B = zeros(T, N)

@inbounds for t in 1:T
    g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
    s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
    B[t, :] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
end
```

Okay, the loops I am interested in:

```julia
## Forward recursion
alpha[1, :] = p .* B[1, :]
c[1] = 1.0 / sum(alpha[1, :])
alpha[1, :] *= c[1]

@inbounds for t in 2:T
    alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
    c[t] = 1.0 / sum(alpha[t, :])
    alpha[t, :] *= c[t]
end
log_likelihood = -sum(log.(c))

## Backward recursion
beta[T, :] .= c[T]
@inbounds for t in (T-1):-1:1
    beta[t, :] = diagm(beta[t+1, :]) * (A * B[t+1, :]) * c[t]  # c[t] is a scalar
    # beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
end
```

I now have similar performance in both loops of interest by using `diagm` rather than `.*`. Can someone explain to me why?

Can you try removing everything that is not relevant to the loop you are asking about, put that in a function, and give a call to that function?


If the dimensions used here are representative of your problem, you could consider StaticArrays with something like:

```julia
T = 1000000
A = @SMatrix [.3 .7; .2 .8]
p = @SVector [.8, .2]
beta = fill(zeros(SVector{2, Float64}), T)
c = zeros(T)
B = copy(beta)

using StaticArrays

function f(beta, c, A, B, T)
    beta[T] = c[T] * ones(eltype(beta))
    @inbounds for t in (T-1):-1:1
        beta[t] = beta[t+1] .* (A * B[t+1]) * c[t]
    end
end
```

which runs in 0.007 seconds. Not sure the translation is 100% correct, but it should do the same amount of work.

Use them when you have low-dimensional arrays (like coordinates in space). They are awesome and will give a tremendous speed increase for dimensions like those in your problem (make sure the code is type-stable, though).


So static meaning the size of the array is assumed not to change?

Yes!
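To illustrate (a sketch, assuming the StaticArrays package is installed), the length is baked into the type itself, which is what lets the compiler unroll and stack-allocate everything:

```julia
using StaticArrays

# Sketch: the size of a static array is part of its type. SVector{2, Float64}
# can never grow or shrink, so operations on it compile to fixed-size,
# allocation-free code.
v = @SVector [0.8, 0.2]
M = @SMatrix [0.3 0.7; 0.2 0.8]
w = M * v   # the result is again an SVector{2, Float64}
```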

@kristoffer.carlsson thanks for the help. It seemed like I was getting the wrong results using my loop, so I wrote it out with plain for loops (rather than using matrix multiplication). Can you explain why the matrix method results in increasing errors?

```julia
## Backward recursion
beta = zeros(T, N)
beta[T, :] .= c[T]
@inbounds for t in (T-1):-1:1
    beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
end
```

The following code should be equivalent but doesn’t accumulate errors:

```julia
beta = zeros(T, N)
beta[T, :] .= c[T]
for t in (T-1):-1:1
    for i in 1:N
        for j in 1:N
            beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
        end
    end
    beta[t, :] *= c[t]
end
```

Please provide a working example that can be copy/pasted into Julia. The post from @kristoffer.carlsson is a good example. With Kristoffer’s example, I can copy and paste the section of code, and it runs (after moving the `using StaticArrays` line to the top).

MWE setup:

```julia
using Distributions

n = 1000000
qidx = rand(1:100, n)
x = rand([0, 1], n)
theta = randn()
gamma = randn(2)
d = randn(100)
A = [.3 .7; .2 .8]
p = [.8, .2]

N = length(p)
M = length(gamma)
T = length(x)
alpha = zeros(T, N)
beta = zeros(T, N)
c = zeros(T)
B = zeros(T, N)

@inbounds for t in 1:T
    g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
    s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
    B[t, :] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
end

## Forward recursion
alpha[1, :] = p .* B[1, :]
c[1] = 1.0 / sum(alpha[1, :])
alpha[1, :] *= c[1]

@inbounds for t in 2:T
    alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
    c[t] = 1.0 / sum(alpha[t, :])
    alpha[t, :] *= c[t]
end
log_likelihood = -sum(log.(c))
```

Now the following two loops should be equivalent from a mathematical perspective, but I am getting different results:

```julia
## Backward recursion
beta = zeros(T, N)
beta[T, :] .= c[T]
@inbounds for t in (T-1):-1:1
    beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
end
```

vs just using plain for loops:

```julia
beta = zeros(T, N)
beta[T, :] .= c[T]
for t in (T-1):-1:1
    for i in 1:N
        for j in 1:N
            beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
        end
    end
    beta[t, :] *= c[t]
end
```

You seem to misunderstand *minimal* here. Minimal means strip away everything that is not relevant to the problem; every single line should have a purpose for the question you are asking.

This is a minimum example:

```julia
using LinearAlgebra  # for Diagonal

T = 3
N = 2
c = rand(T)
B = rand(T, 2)
A = rand(2, 2)

function f1(c, T, A, B)
    beta = zeros(T, N)
    beta[T, :] .= c[T]
    for t in (T-1):-1:1
        for i in 1:N
            for j in 1:N
                beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
            end
        end
        beta[t, :] *= c[t]
    end
    return beta
end

function f2(c, T, A, B)
    beta = zeros(T, N)
    beta[T, :] .= c[T]
    for t in (T-1):-1:1
        beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
    end
    return beta
end

r1 = f1(c, T, A, B)
r2 = f2(c, T, A, B)
```

It seems you believe the following are equivalent:

```julia
A = rand(2, 2); b = rand(2); β = rand(2); N = 2

# 1
beta1 = β .* A * b

# 2
beta2 = zeros(2)
for i in 1:N
    for j in 1:N
        beta2[i] += β[j] * A[i, j] * b[j]
    end
end
beta2
```

They are not. The `β[j]` is wrong and should be `β[i]`.
The corresponding change in your code is `beta[t+1, j]` → `beta[t+1, i]`.

Ah, thanks @kristoffer.carlsson! What would the corresponding change be to the matrix notation if I actually wanted the result as written with the for loops?

`A * (β .* b)`
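A quick numerical check (a sketch) that this matches the triple-loop version:

```julia
# Sketch: the loop computes out[i] = Σ_j A[i,j] * b[j] * β[j],
# which is exactly the matrix expression A * (β .* b).
A = rand(2, 2); b = rand(2); β = rand(2)

out = zeros(2)
for i in 1:2, j in 1:2
    out[i] += A[i, j] * b[j] * β[j]
end

out ≈ A * (β .* b)  # true
```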