Matrix performance

bdeonovic · August 23, 2017, 2:45pm

I have two loops and one is significantly faster, although they use similar constructs. I can’t figure out how to speed up the second one:

 ## Forward recursion
    alpha[1,: ] = p .* B[1,: ]
    c[1] = 1.0 / sum(alpha[1,: ])
    alpha[1,: ] *= c[1]

    @inbounds for t in 2:T
      alpha[t,: ] = (alpha[t-1,: ]' * A) * diagm(B[t,: ])
      c[t] = 1.0 / sum(alpha[t,: ])
      alpha[t,: ] *= c[t]
    end
    log_likelihood = -sum(log.(c))

    ## Backward recursion
    beta[T,: ] = ones(N)
    @inbounds for t in (T-1):-1:1
      beta[t,: ] = (A * B[t+1,: ]) .* beta[t+1,: ] .* c[t]
    end

The first loop takes ~1.2s and the second one takes ~32s when run in global scope REPL. Why is the second one so much slower?

bdeonovic · August 23, 2017, 3:01pm

Instead of using .* i changed to diagm and got significant improvement

 ## Backward recursion
    beta[T,: ] = ones(N)
    @inbounds for t in (T-1):-1:1
      beta[t,: ] = diagm(beta[t+1,: ]) * (A * B[t+1,: ]) * diagm(c[t])
      #beta[t,: ] = (A * B[t+1,: ]) .* beta[t+1,: ] .* c[t]
    end

sdanisch · August 23, 2017, 3:04pm

So you’re not wrapping it in a function? That’s the first thing the performance tips advice against
After putting it in a function, some cheap performance gains could be achieved by sprinkling @view in front of some of the alpah[t, :] = ... calls

bdeonovic · August 23, 2017, 3:04pm

It is in a function thats not the point I’m just trying to optimize the second loop. I was just testing the timing of each in REPL. What does @view do?

sdanisch · August 23, 2017, 3:08pm

help?> @view
  @view A[inds...]

  Creates a SubArray from an indexing expression. This can only be applied
  directly to a reference expression (e.g. @view A[1,2:end]), and should not
  be used as the target of an assignment (e.g. @view(A[1,2:end]) = ...). See
  also @views to switch an entire block of code to use views for slicing.

  julia> A = [1 2; 3 4]
  2×2 Array{Int64,2}:
   1  2
   3  4
  
  julia> b = @view A[:, 1]
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   1
   3
  
  julia> fill!(b, 0)
  2-element SubArray{Int64,1,Array{Int64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}:
   0
   0
  
  julia> A
  2×2 Array{Int64,2}:
   0  2
   0  4

Learned something here as well: it should have been @views not @view

kristoffer.carlsson · August 23, 2017, 3:10pm

If you give code that people can copy paste and experiment with, you are likely to get more answers. Having to figure out all the arguments, and dimensions is too painful and running code in the head can be missleading.

bdeonovic · August 23, 2017, 3:15pm

setting up MWE

using Distributions
n=1000000
qidx = rand(1:100, n)
x = rand([0,1], n)
theta = randn()
gamma=randn(2)
d = randn(100)
A = [.3 .7; .2 .8]
p=[.8, .2]

N = length(p)
M = length(gamma)
T = length(x)
alpha = zeros(T, N)
beta = zeros(T, N)
c = zeros(T)
B = zeros(T, N)

@inbounds for t in 1:T
      g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
      s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
      B[t,: ] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
    end

Okay the loops I am interested in:

## Forward recursion
    alpha[1,: ] = p .* B[1,: ]
    c[1] = 1.0 / sum(alpha[1,: ])
    alpha[1,: ] *= c[1]

    @inbounds for t in 2:T
      alpha[t,: ] = (alpha[t-1,: ]' * A) * diagm(B[t,: ])
      c[t] = 1.0 / sum(alpha[t,: ])
      alpha[t,: ] *= c[t]
    end
    log_likelihood = -sum(log.(c))

    ## Backward recursion
    beta[T,: ] = c[T]
    @inbounds for t in (T-1):-1:1
      beta[t,: ] = diagm(beta[t+1,: ]) * (A * B[t+1,: ]) * diagm(c[t])
      #beta[t,: ] = (A * B[t+1,: ]) .* beta[t+1,: ] .* c[t]
    end

I now have similar performance in both loops of interest by using diagm rather than .* can someone explain to me why?

kristoffer.carlsson · August 23, 2017, 3:21pm

Can you try remove everything that is not relevant to the loop you are asking about, put that in a function and give a call to that function?

kristoffer.carlsson · August 23, 2017, 3:35pm

If the dimensions used here is representative for your problem you could consider StaticArrays with something like:

T = 1000000
A = @SMatrix [.3 .7; .2 .8]
p = @SVector [.8, .2]
beta = fill(zeros(SVector{2, Float64}), T)
c = zeros(T)
B = copy(beta)

using StaticArrays

function f(beta, c, A, B, T)
    beta[T] = c[T] * ones(eltype(beta))
    @inbounds for t in (T-1):-1:1
        beta[t] = beta[t+1] .* (A * B[t+1]) * c[t]
    end
end

which runs in 0.007 seconds. Not sure the translation is 100% correct but it should do the same amount of work.

bdeonovic · August 23, 2017, 3:37pm

What are StaticArrays,when should they be used and where can i learn more about them?

kristoffer.carlsson · August 23, 2017, 3:39pm

Use them when you have low dimensional arrays (like coordinates in space). They are awesome and will give tremendous speed increase for dimensions like in your problem (make sure the code is type stable though).

bdeonovic · August 23, 2017, 3:43pm

so static meaning the size of array is assumed to not change?

sdanisch · August 23, 2017, 3:44pm

Yes!

bdeonovic · August 23, 2017, 4:23pm

@kristoffer.carlsson thanks for the help. It seemed like I was getting the wrong results using my loop and so I wrote it out in straight for loops (rather than using matrix multiplication) can you explain why the matrix method is resulting in increasing errors?

## Backward recursion                                           
 beta = zeros(T, N)                                                   
 beta[T,: ] = c[T]                                                    
 @inbounds for t in (T-1):-1:1                                        
   beta[t,: ] .= (Diagonal(beta[t+1,: ]) * (A * B[t+1,: ])) .* c[t]   
 end

the following code should be equivalent but doesn’t accumulate errors

    beta = zeros(T, N)
    beta[T,: ] = c[T]
    for t in (T-1):-1:1
      for i in 1:N
        for j in 1:N
          beta[t,i] += A[i,j] * B[t+1, j] * beta[t+1, j]
        end
      end
      beta[t,: ] *= c[t]
    end

tshort · August 23, 2017, 4:34pm

Please provide a working example that can be copy/pasted into Julia. The post from @kristoffer.carlsson is a good example. With Kristoffer’s example, I can copy and paste the section of code, and it runs (after moving the using StaticArrays line to the top).

bdeonovic · August 23, 2017, 4:36pm

MWE set

using Distributions
n=1000000
qidx = rand(1:100, n)
x = rand([0,1], n)
theta = randn()
gamma=randn(2)
d = randn(100)
A = [.3 .7; .2 .8]
p=[.8, .2]

N = length(p)
M = length(gamma)
T = length(x)
alpha = zeros(T, N)
beta = zeros(T, N)
c = zeros(T)
B = zeros(T, N)

@inbounds for t in 1:T
      g = cdf(Logistic(d[qidx[t]] - gamma[1]), theta)
      s = cdf(Logistic(theta - gamma[2]), d[qidx[t]])
      B[t,: ] = x[t] == 1 ? [1.0 - s, g] : [s, 1.0 - g]
    end

## Forward recursion
    alpha[1,: ] = p .* B[1,: ]
    c[1] = 1.0 / sum(alpha[1,: ])
    alpha[1,: ] *= c[1]

    @inbounds for t in 2:T
      alpha[t,: ] = (alpha[t-1,: ]' * A) * diagm(B[t,: ])
      c[t] = 1.0 / sum(alpha[t,: ])
      alpha[t,: ] *= c[t]
    end
    log_likelihood = -sum(log.(c))

Now the following two loops should be equivalent from a mathematical perspective but I am getting different results

## Backward recursion                                           
 beta = zeros(T, N)                                                   
 beta[T,: ] = c[T]                                                    
 @inbounds for t in (T-1):-1:1                                        
   beta[t,: ] .= (Diagonal(beta[t+1,: ]) * (A * B[t+1,: ])) .* c[t]   
 end

vs just using for loops

    beta = zeros(T, N)
    beta[T,: ] = c[T]
    for t in (T-1):-1:1
      for i in 1:N
        for j in 1:N
          beta[t,i] += A[i,j] * B[t+1, j] * beta[t+1, j]
        end
      end
      beta[t,: ] *= c[t]
    end

kristoffer.carlsson · August 23, 2017, 5:21pm

You seem to misunderstand minimum here. Minimum means strip away everything that is not relevant to the problem, every single line should have a purpose for the question you are asking.

This is a minimum example:

T = 3
c = rand(T)
B = rand(T, 2)
A = rand(2, 2)

function f1(c, T, A, B)
   beta = zeros(T, N)
   beta[T,: ] = c[T]
   for t in (T-1):-1:1
       for i in 1:N
           for j in 1:N
               beta[t,i] += A[i,j] * B[t+1, j] * beta[t+1, j]
           end
       end
       beta[t,: ] *= c[t]
   end
   return beta
end

function f2(c, T, A, B)
   beta = zeros(T, N)
   beta[T,: ] = c[T]
   for t in (T-1):-1:1
       beta[t,: ] .= (Diagonal(beta[t+1,: ]) * (A * B[t+1,: ])) .* c[t]   
   end
   return beta
end

r1 = f1(c, T, A, B)
r2 = f2(c, T, A, B)

kristoffer.carlsson · August 23, 2017, 5:40pm

It seems you believe the following is equivalent:

A = rand(2,2); b = rand(2); β = rand(2);

# 1
beta1 = β .* A * b

# 2
beta2 = zeros(2);
for i in 1:N
    for j in 1:N
        beta2[i] += β[j] * A[i,j] * b[j]
    end
end
beta2

They are not. The β[j] is wrong and should be β[i].
The corresponding change in your code is beta[t+1, j] → beta[t+1, i]

bdeonovic · August 23, 2017, 5:42pm

Ah thanks @kristoffer.carlsson ! What would the corresponding change be to the matrix notation if I actually wanted the result as is written with teh for loops.

kristoffer.carlsson · August 23, 2017, 5:45pm

A * (β .* b)