Improving Speed for a Novice


#1

Hi,
I am mostly an R user, but am currently developing a package whose primary functions are written in Julia. The package estimates very large (inverse) covariance matrices. I already see considerable speed gains compared to R, but as a novice I was looking for some input to make the Julia code even more efficient.

function col_means(x)
    mean(x, 1)
end

function center_data(x)
    x .- col_means(x)
end

function post_prob(x)
    ls_zero = mean(x .< 0)    # proportion of draws below zero
    grt_zero = 1 - ls_zero
    1 - min(ls_zero, grt_zero)
end

function BB_est(x, sims, prob)
    X = center_data(x)
    n = size(X, 1)
    p = size(X, 2)
    prob = convert(Float64, prob)
    offdiag = div(p * (p - 1), 2)   # number of strictly lower-triangular entries

    sims = convert(Int64, sims)
    exp_drw = rand(Exponential(1), sims, n)

    # normalize rows of exponential draws to get Bayesian-bootstrap weights
    dir_weights = exp_drw ./ sum(exp_drw, 2)

    A = zeros(sims, offdiag)
    B = zeros(1, offdiag)
    C = zeros(sims, p)
    mat_point = zeros(p, p)
    mat_sig = zeros(p, p)
    idx = tril(trues(size(mat_point)), -1)

    for i = 1:sims
        t = dir_weights[i, 1:n]
        d = X .* sqrt.(t)
        bb_cov = d' * d
        inv_cov = inv(bb_cov)
        inv_diag = diag(inv_cov)
        inv_cov[diagind(inv_cov)] = 0

        # idx already excludes the diagonal, so no zero-filtering is needed
        A[i, :] = inv_cov[idx]
        C[i, :] = inv_diag
    end

    for i in 1:offdiag
        temp1 = post_prob(A[:, i])
        B[1, i] = ifelse(temp1 > prob, 1, 0)
    end

    mat_sig[idx] = B
    mat_point[idx] = mean(A, 1)
    mat_point[diagind(mat_point)] = mean(C, 1)
    # force_sym is assumed to be defined elsewhere in the package
    force_sym(mat_sig), force_sym(mat_point), A, B
end


##########################
###### Run model ##########
##########################
using Distributions

p = 500
n = 20

my_mvn = MvNormal(eye(p))
X = rand(my_mvn, n)'

@time test = BB_est(X, 500, .95)

#2

Hi,

You’re much more likely to get good answers if you:

  • quote all your code using triple backticks

  • indent it properly

  • profile it first to get an idea of the bottlenecks, so that it is clearer which steps are limiting
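
For the third point, a minimal profiling sketch looks like this (with a hypothetical stand-in workload, not the code above):

```julia
using Profile, LinearAlgebra

# Hypothetical stand-in workload: repeated matrix inversions
work() = sum(tr(inv(rand(100, 100) + 100I)) for _ in 1:50)

work()                        # run once first so compilation doesn't pollute the profile
@profile work()
Profile.print(maxdepth = 8)   # text report; ProfileView.view() gives a flame graph instead
```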

From what I can see by looking at the code, you have explicit inv calls, which are usually better avoided, and MATLAB-style code with masks and index arrays, which is not very idiomatic in Julia and might be faster with explicit loops or views/dots. It all depends on where your bottleneck is.
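
For instance (a hypothetical sketch, not the original code), extracting the strictly lower triangle with an explicit loop instead of a boolean mask skips allocating both the mask and the temporary copy:

```julia
using LinearAlgebra

# Mask style (as in the original post) would be:
#     idx = tril(trues(p, p), -1);  vals = M[idx]

# Loop style: fills a preallocated vector in the same column-major order
function lower_tri!(out, M)
    p = size(M, 1)
    k = 0
    for j in 1:p-1, i in j+1:p
        k += 1
        out[k] = M[i, j]
    end
    out
end

M = rand(5, 5)
vals = lower_tri!(zeros(10), M)   # 10 = 5*4/2 strictly-lower entries
```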


#3

So here, profiling (@profile and the ProfileView package) shows two main bottlenecks: matrix inversion and type inference near temp1 = post_prob(A[:,i]). So first you need to figure out why post_prob is not type-stable (see the performance tips in the docs, and use the @code_warntype macro). Then, if you know d is full rank, so that d'd is SPD, you can use a more efficient Cholesky factorization to compute the inverse. If you can use the Cholesky factorization directly without computing the inverse explicitly, that’s even better.
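
A rough sketch of that last suggestion (hypothetical names; on recent Julia the function is cholesky, which plays the role of this thread's cholfact): whenever the inverse is only ever applied to a vector, factorize and solve instead of forming inv:

```julia
using LinearAlgebra

d = randn(20, 5)
S = d' * d                    # SPD when d has full column rank
b = randn(5)

x_slow = inv(S) * b           # forms the explicit inverse
F = cholesky(Symmetric(S))    # factorize once...
x_fast = F \ b                # ...then solve S * x = b without inv(S)
```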

edit: the fancy indexing also takes time. This kind of programming is really the only viable one in R/MATLAB, but that’s not the case in Julia; rewriting it with loops will speed things up.


#4

On the explicit inv: one simple trick for a positive definite matrix PD is to just use inv(cholfact(PD)) (there is also inv(Symmetric(PD)), but that doesn’t take advantage of positive definiteness and was slower in a basic test). This is equivalent to chol2inv(chol(PD)) in R.

julia> PD = randn(3000,2000) |> x -> x' * x;

julia> using BenchmarkTools

julia> @benchmark inv($PD)
BenchmarkTools.Trial: 
  memory estimate:  64.48 MiB
  allocs estimate:  13
  --------------
  minimum time:     596.901 ms (1.87% GC)
  median time:      702.844 ms (1.32% GC)
  mean time:        704.770 ms (5.65% GC)
  maximum time:     837.746 ms (17.26% GC)
  --------------
  samples:          8
  evals/sample:     1

julia> @benchmark inv(cholfact($PD))
BenchmarkTools.Trial: 
  memory estimate:  61.04 MiB
  allocs estimate:  11
  --------------
  minimum time:     365.522 ms (0.31% GC)
  median time:      458.306 ms (0.47% GC)
  mean time:        464.446 ms (5.83% GC)
  maximum time:     604.741 ms (22.65% GC)
  --------------
  samples:          11
  evals/sample:     1

Edit: if you want to go lower level in your LAPACK interface:

using Compat, Compat.LinearAlgebra
pdinv(PD) = pdinv!(copy(PD))
function pdinv!(PD)
    LAPACK.potrf!('U', PD)
    LAPACK.potri!('U', PD)
    Symmetric(PD)
end

and then:

julia> @benchmark pdinv($PD)
BenchmarkTools.Trial: 
  memory estimate:  30.52 MiB
  allocs estimate:  6
  --------------
  minimum time:     246.885 ms (0.13% GC)
  median time:      249.740 ms (0.25% GC)
  mean time:        255.489 ms (2.29% GC)
  maximum time:     353.000 ms (25.33% GC)
  --------------
  samples:          20
  evals/sample:     1

#5

Just some helpful links to follow up on what @antoine-levitt suggested:


#6

It’s interesting that your pdinv is faster than inv(cholfact()), although they both use the same potrf and potri routines. Profiling it, it looks like the copies do take some time, even on big matrices like this where the Cholesky factorization should dominate. There’s also an ishermitian check that takes a non-trivial amount of time. It’s pretty amazing, and a good demonstration of both the FLOPs/memory-bandwidth gap and the skill of BLAS programmers, that a full Cholesky factorization, which costs orders of magnitude more flops than a copy, is only about 3x slower.


#7

I am in the process of exploring these options.

I was curious if using @parallel would also be a viable option, since each bootstrap draw is independent.
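
For instance, would a threaded loop along these lines help? (A rough sketch only; per_draw is a hypothetical stand-in for the per-simulation work, and Julia needs to be started with multiple threads.)

```julia
using Base.Threads

# Hypothetical stand-in for the work done per bootstrap draw
per_draw(i) = sum(abs2, sin.(i .* (1:100)))

sims = 16
results = zeros(sims)
@threads for i in 1:sims     # iterations are independent, so no shared mutable state
    results[i] = per_draw(i)
end
```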


#8

I have updated the code. I have followed the performance tips but have not seen much improvement, which I am sure is due to my having zero Julia experience.