Clustering using matrix decomp code performance tips

spinkney · September 27, 2021, 9:58pm

Hi Julians,

I’ve ported this matlab code from Clustering by low-rank doubly stochastic matrix decomposition to Julia. But it’s much too slow for my application.

I’m still green to Julia so posting here to see if there’s any performance tips (or glaring aesthetic or other improvements) to make it faster. Especially on the sparse multiplication front.

I failed to get LoopVectorization to work or threading to speed it up.

test with

A = sprand(1000, 1000, 0.25)
# 20 clusters, dirichlet ~ 1, 1000 iters
Wout, ce = dcd(A, 20, nothing, 1, 1000)

Looking to do this on matrices in the 100k to millions.

using LinearAlgebra, Random, SparseArrays

function dcd(A::SparseMatrixCSC{Tv, Ti}, 
            r::Int, 
            W0 = nothing, 
            alpha::Real = 1.,
            max_iter::Int = 10000) where {Tv, Ti}
    n = size(A, 1);

    if W0 === nothing 
        W0 = rand(n, r);
    end

    W0 ./= sum(W0, dims = 2) .+ eps()
    W = W0;

    I_index, J_index, Anz = findnz(A);

    ZW = zeros(Tv, n, r)
    for iter ∈ 1:max_iter
       inv_s = 1 ./ (sum(W, dims = 1) .+ eps())
       inv_W = 1 ./ (W .+ eps());

       # created ZW up top to make this in-place, does that make much of a difference?
       # ZW = sp_factor_ratio(I_index, J_index, A, W .* inv_s, W') * W

      # sp_factor_ratio is defined below it does
      # Z = A ./ ( (W .* inv_s) * W' ) but faster
       mul!(ZW, sp_factor_ratio(I_index, J_index, A, W .* inv_s, W'), W)
       gradn = 2 * ZW .* inv_s .+ alpha * inv_W
       gradp = ( diag(W' * ZW)' .* inv_s .^ 2 ) .+ inv_W
  
       a = sum(W ./ (gradp .+ eps()), dims = 2);
       b = sum(W .* gradn ./ (gradp .+ eps()), dims = 2);

      # is it better to break these up like below or keep it as one calculation?
      # W = W .* (gradn .* a .+ 1) ./ ( (gradp .* a) .+ b .+ eps())
      # @. W *= (gradn * a + 1) / ( (gradp * a) + b + eps())
       W .*= gradn .* a .+ 1
       W ./= ( (gradp .* a) .+ b .+ eps())
   end

    Ahat = sum( sp_factor(I_index, J_index, W, W') ./ (sum(W, dims = 1) .+ eps()), dims = 2)
    cross_entropy = -Anz' * log.(vec(Ahat) .+ eps());
    return W , cross_entropy
end

# Helps calculate the cross-entropy in a sparse manner
function sp_factor(I::AbstractVector, J::AbstractVector, 
                    W::AbstractArray{T,N}, H::AbstractArray{T,N}) where {T, N}
    nz = size(I, 1)
    m = size(W, 1)
    r = size(W, 2)
    WH = zeros(nz, r)

   for t = 1:nz
        i = I[t]
        j = J[t]
        for k = 1:r
            WH[t, k] = W[i, k] * H[k, j]
        end
    end

    return WH;
end

# calculates
# Z = X ./ ( (W .* inv_s) * H )
function sp_factor_ratio(I::AbstractVector, J::AbstractVector,
    X::SparseMatrixCSC{Tv, Ti}, W::Matrix{T},
    H::AbstractArray{T, N}) where {Tv, Ti, T, N}
   
    n = size(X, 2)
    r = size(W, 2)
    irs = X.rowval
    jcs = X.colptr
    Z = sparse(I, J, zeros(size(I, 1)))

    ind = 1

    for row = 1:n # row
        col_start = jcs[row]
        col_stop = jcs[row + 1]
        if (col_start != col_stop)
         for current_col_index = col_start:col_stop - 1 # col
                col = irs[ind];
                xhat = 0;
                for k = 1:r
                    xhat += W[row, k] * H[k, col];
                end
                xhat = max(xhat, eps());
                Z.nzval[ind] = X.nzval[ind] / xhat;
                ind += 1
            end
        end
    end
    return Z
end

goerch · October 2, 2021, 2:56pm

Most of the time is spent in dcd’s

for iter ∈ 1:max_iter

loop. I wonder if there is some kind of convergence check missing?

Oscar_Smith · October 2, 2021, 3:20pm

When you say too slow for your application, do you mean that it’s slower than Matlab, or slower than you need it to be? If the former, a benchmark against Matlab would be very useful.

spinkney · October 2, 2021, 3:30pm

I don’t have Matlab so I’m not sure about the speed vs Matlab.

I’ve begun to look at approximating this decomposition with SGD instead of the full data on every iteration.

A convergence check is interesting, do you have any ideas how to check that? I’ll think about this too.

goerch · October 2, 2021, 4:28pm

The referenced paper simply states ‘repeat … until W is unchanged’

lmiq · October 2, 2021, 7:05pm

If you profile it you can see that most of the time is spent on the sp_factor_ratio and mul! functions. You can’t probably do much about mul!, so probably you want to invest in improving the first one:

That was built with the VSCode extension: Julia VS Code extension version v0.17 released

Topic		Replies	Views
Sparse matrix-vector product: much more slow than Matlab Performance matlab , optimization	24	4545	December 20, 2017
Scaling a sparse matrix row-wise and column-wise too slow Performance broadcast , sparse	20	451	June 23, 2024
Making efficient some small subroutines Performance	22	1133	May 8, 2020
How to efficiently construct a large SparseArray? Packages for this? Performance package , performance , parallel , sparse	20	1761	May 15, 2022
Speed comparison: Sparse matrix multiplication vs usual matrix multiplication General Usage question	7	3900	February 13, 2017

Clustering using matrix decomp code performance tips

Related topics