Matrix performance

I have two loops and one is significantly faster, although they use similar constructs. I can’t figure out how to speed up the second one:

    ## Forward recursion
    alpha[1, :] = p .* B[1, :]
    c[1] = 1.0 / sum(alpha[1, :])
    alpha[1, :] *= c[1]

    @inbounds for t in 2:T
        alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
        c[t] = 1.0 / sum(alpha[t, :])
        alpha[t, :] *= c[t]
    end
    log_likelihood = -sum(log.(c))

    ## Backward recursion
    beta[T, :] = ones(N)
    @inbounds for t in (T-1):-1:1
        beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
    end

The first loop takes ~1.2 s and the second one takes ~32 s when run at global scope in the REPL. Why is the second one so much slower?

Instead of using .* I changed to diagm and got a significant improvement:

    ## Backward recursion
    beta[T, :] = ones(N)
    @inbounds for t in (T-1):-1:1
        beta[t, :] = diagm(beta[t+1, :]) * (A * B[t+1, :]) * diagm(c[t])
        # beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
    end

So you’re not wrapping it in a function? That’s the first thing the performance tips advise against :wink:
After putting it in a function, some cheap performance gains could be achieved by sprinkling @view in front of some of the alpha[t, :] = ... calls.

It is in a function; that’s not the point. I’m just trying to optimize the second loop, and I was only testing the timing of each in the REPL. What does @view do?

help?> @view
  @view A[inds...]

  Creates a SubArray from an indexing expression. This can only be applied
  directly to a reference expression (e.g. @view A[1,2:end]), and should not
  be used as the target of an assignment (e.g. @view(A[1,2:end]) = ...). See
  also @views to switch an entire block of code to use views for slicing.

  julia> A = [1 2; 3 4]
  2×2 Array{Int64,2}:
   1  2
   3  4
  
  julia> b = @view A[:, 1]
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   1
   3
  
  julia> fill!(b, 0)
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   0
   0
  
  julia> A
  2×2 Array{Int64,2}:
   0  2
   0  4

Learned something here as well: it should have been @views, not @view :slight_smile:
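
For illustration, here is roughly what that suggestion looks like applied to the backward recursion from the question. This is my own sketch, not code from the thread, and it assumes A, B, beta, c, N, and T are set up as above:

    ## Backward recursion, avoiding copies of the row slices
    beta[T, :] = ones(N)
    @inbounds for t in (T-1):-1:1
        # @views turns B[t+1, :] and beta[t+1, :] into SubArrays, so the
        # right-hand side reads them without allocating copies, and the .=
        # writes the result into the beta[t, :] view in place
        @views beta[t, :] .= (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
    end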

If you give code that people can copy-paste and experiment with, you are likely to get more answers. Having to figure out all the arguments and dimensions is too painful, and running code in your head can be misleading.


Setting up the MWE:

    using Distributions
    n = 1000000
    qidx = rand(1:100, n)
    x = rand([0, 1], n)
    theta = randn()
    gamma = randn(2)
    d = randn(100)
    A = [.3 .7; .2 .8]
    p = [.8, .2]

    N = length(p)
    M = length(gamma)
    T = length(x)
    alpha = zeros(T, N)
    beta = zeros(T, N)
    c = zeros(T)
    B = zeros(T, N)

    @inbounds for t in 1:T
        g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
        s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
        B[t, :] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
    end

Okay, here are the loops I am interested in:

    ## Forward recursion
    alpha[1, :] = p .* B[1, :]
    c[1] = 1.0 / sum(alpha[1, :])
    alpha[1, :] *= c[1]

    @inbounds for t in 2:T
        alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
        c[t] = 1.0 / sum(alpha[t, :])
        alpha[t, :] *= c[t]
    end
    log_likelihood = -sum(log.(c))

    ## Backward recursion
    beta[T, :] = c[T]
    @inbounds for t in (T-1):-1:1
        beta[t, :] = diagm(beta[t+1, :]) * (A * B[t+1, :]) * diagm(c[t])
        # beta[t, :] = (A * B[t+1, :]) .* beta[t+1, :] .* c[t]
    end

I now have similar performance in both loops of interest by using diagm rather than .*. Can someone explain to me why?

Can you try to remove everything that is not relevant to the loop you are asking about, put that in a function, and give a call to that function?


If the dimensions used here are representative of your problem, you could consider StaticArrays with something like:

    T = 1000000
    A = @SMatrix [.3 .7; .2 .8]
    p = @SVector [.8, .2]
    beta = fill(zeros(SVector{2, Float64}), T)
    c = zeros(T)
    B = copy(beta)

    using StaticArrays

    function f(beta, c, A, B, T)
        beta[T] = c[T] * ones(eltype(beta))
        @inbounds for t in (T-1):-1:1
            beta[t] = beta[t+1] .* (A * B[t+1]) * c[t]
        end
    end

which runs in 0.007 seconds. Not sure the translation is 100% correct, but it should do the same amount of work.
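
If you want to reproduce a timing like that, one way (a minimal sketch, assuming the setup above with the using StaticArrays line moved to the top) is to call the function once so it compiles and then time a second call:

    f(beta, c, A, B, T)          # first call includes compilation
    @time f(beta, c, A, B, T)    # second call measures only the run time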

What are StaticArrays, when should they be used, and where can I learn more about them?

Use them when you have low-dimensional arrays (like coordinates in space). They are awesome and will give a tremendous speed increase for dimensions like in your problem (make sure the code is type stable, though).
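
As a concrete illustration (my own sketch, not from the reply above), compare an ordinary Array with its static counterpart for the 2-dimensional quantities in this problem:

    using StaticArrays

    A  = [.3 .7; .2 .8]             # ordinary Matrix, size only known at run time
    As = @SMatrix [.3 .7; .2 .8]    # SMatrix, the size (2, 2) is part of the type

    p  = [.8, .2]
    ps = @SVector [.8, .2]

    A * p     # allocates a new Vector on every multiplication
    As * ps   # returns an SVector; the compiler knows the sizes, so no heap allocation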


So static, meaning the size of the array is assumed not to change?

Yes!
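
In other words, the size is encoded in the type, and a plain SVector is also immutable; MVector gives a fixed-size container whose elements can still be updated. A small sketch of my own, not from the thread:

    using StaticArrays

    v = @SVector [1.0, 2.0]    # typeof(v) is SVector{2, Float64}: the length 2 lives in the type

    # push!(v, 3.0)   # errors: the size of a static array cannot change
    # v[1] = 5.0      # errors: an SVector is immutable

    mv = @MVector [1.0, 2.0]   # fixed size, but elements can be mutated
    mv[1] = 5.0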

@kristoffer.carlsson thanks for the help. It seemed like I was getting the wrong results using my loop, so I wrote it out in straight for loops (rather than using matrix multiplication). Can you explain why the matrix method is resulting in increasing errors?

    ## Backward recursion
    beta = zeros(T, N)
    beta[T, :] = c[T]
    @inbounds for t in (T-1):-1:1
        beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
    end

The following code should be equivalent but doesn’t accumulate errors:

    beta = zeros(T, N)
    beta[T, :] = c[T]
    for t in (T-1):-1:1
        for i in 1:N
            for j in 1:N
                beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
            end
        end
        beta[t, :] *= c[t]
    end

Please provide a working example that can be copy/pasted into Julia. The post from @kristoffer.carlsson is a good example. With Kristoffer’s example, I can copy and paste the section of code, and it runs (after moving the using StaticArrays line to the top).

MWE setup:

    using Distributions
    n = 1000000
    qidx = rand(1:100, n)
    x = rand([0, 1], n)
    theta = randn()
    gamma = randn(2)
    d = randn(100)
    A = [.3 .7; .2 .8]
    p = [.8, .2]

    N = length(p)
    M = length(gamma)
    T = length(x)
    alpha = zeros(T, N)
    beta = zeros(T, N)
    c = zeros(T)
    B = zeros(T, N)

    @inbounds for t in 1:T
        g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
        s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
        B[t, :] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
    end

    ## Forward recursion
    alpha[1, :] = p .* B[1, :]
    c[1] = 1.0 / sum(alpha[1, :])
    alpha[1, :] *= c[1]

    @inbounds for t in 2:T
        alpha[t, :] = (alpha[t-1, :]' * A) * diagm(B[t, :])
        c[t] = 1.0 / sum(alpha[t, :])
        alpha[t, :] *= c[t]
    end
    log_likelihood = -sum(log.(c))

Now, the following two loops should be equivalent from a mathematical perspective, but I am getting different results.

    ## Backward recursion
    beta = zeros(T, N)
    beta[T, :] = c[T]
    @inbounds for t in (T-1):-1:1
        beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
    end

vs. just using for loops:

    beta = zeros(T, N)
    beta[T, :] = c[T]
    for t in (T-1):-1:1
        for i in 1:N
            for j in 1:N
                beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
            end
        end
        beta[t, :] *= c[t]
    end

You seem to misunderstand minimum here. Minimum means strip away everything that is not relevant to the problem; every single line should have a purpose for the question you are asking.

This is a minimal example:

    T = 3
    N = 2
    c = rand(T)
    B = rand(T, 2)
    A = rand(2, 2)

    function f1(c, T, A, B)
        beta = zeros(T, N)
        beta[T, :] = c[T]
        for t in (T-1):-1:1
            for i in 1:N
                for j in 1:N
                    beta[t, i] += A[i, j] * B[t+1, j] * beta[t+1, j]
                end
            end
            beta[t, :] *= c[t]
        end
        return beta
    end

    function f2(c, T, A, B)
        beta = zeros(T, N)
        beta[T, :] = c[T]
        for t in (T-1):-1:1
            beta[t, :] .= (Diagonal(beta[t+1, :]) * (A * B[t+1, :])) .* c[t]
        end
        return beta
    end

    r1 = f1(c, T, A, B)
    r2 = f2(c, T, A, B)

It seems you believe the following two are equivalent:

    A = rand(2, 2); b = rand(2); β = rand(2); N = 2;

    # 1
    beta1 = β .* A * b

    # 2
    beta2 = zeros(2);
    for i in 1:N
        for j in 1:N
            beta2[i] += β[j] * A[i, j] * b[j]
        end
    end
    beta2

They are not. The β[j] is wrong and should be β[i].
The corresponding change in your code is beta[t+1, j] → beta[t+1, i].

Ah, thanks @kristoffer.carlsson! What would the corresponding change be to the matrix notation if I actually wanted the result as written with the for loops?

A * (β .* b)
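
A quick check (a small sketch of my own, not from the thread) that makes the distinction concrete: the inner loop written with β[i] matches β .* (A * b), while the inner loop written with β[j], as in the original loop code, matches A * (β .* b):

    A = rand(2, 2); b = rand(2); β = rand(2)

    # inner loop with β[i]: row i of A*b is scaled by β[i]
    v1 = [sum(β[i] * A[i, j] * b[j] for j in 1:2) for i in 1:2]
    v1 ≈ β .* (A * b)    # true

    # inner loop with β[j]: b is scaled elementwise by β before multiplying by A
    v2 = [sum(β[j] * A[i, j] * b[j] for j in 1:2) for i in 1:2]
    v2 ≈ A * (β .* b)    # true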