Possible ways to accelerate LinearMapping computation

Mr.Robot · May 16, 2019, 5:16am

Problem

I am trying to solve a large scale (about 1 million webpages) PageRank problem with iterative solvers. The problem could be cast into solving the linear system

\left[\frac{1}{n}(1-p)\mathbf{11}^T+p\mathbf{A}^T\text{diag}(\mathbf{A1})^{-1}\right]\mathbf{x}=\mathbf{x}

where \mathbf{A} is the connectivity matrix that is specified as

\mathbf{A}_{ij}= \begin{cases} 1, i \text{ links to } j\\ 0, \text{otherwise} \end{cases}

The correctness of this formulation has been validated in some toy examples.

However, when trying to solve this with eigen solver provided in Arpack.jl, the process

Extremely slow when number of Krylov vectors is large
Could never converge when number of Krylov vectors is small or maxiter is small

The major bottleneck should arise from linear mapping function I wrote, especially the fact that I have to use a lot of global variables. However, I do not know how to resolve the issues in my code.

using LinearMaps, LinearAlgebra, SparseArrays, MatrixDepot, Arpack

# loading A
md = mdopen("SNAP/web-Google")
A = md.A

# global variables
p = 0.85
row_idx, col_idx, _ = findnz(A)
N = size(A, 1)
# compute out-degree of each page (row)
R = Dict(1:N .=> 0)
for i in row_idx
    R[i] += 1
end
prob = zeros(N)
# prob 
# when R[i] == 0, prob[i] = 1/N
# when R[i] != 0, prob[i] = 1/R[i]
for (key, value) in R
    if value == 0
        prob[key] = 1 / N
    else
        prob[key] = 1 / value
    end
end
# non_zero_idx_list
non_zero_idx_list = []
for i in 1:N
    push!(non_zero_idx_list, findall(!iszero, A[:, i]))
end

# iterative eigen-solver
function EigenMapping(x::AbstractVector)
    y = zeros(N)
    # first part
    @inbounds for i in 1:N
        non_zero_idx = non_zero_idx_list[i]
        y[i] = p * dot(x[non_zero_idx], prob[non_zero_idx])
    end   
    # second part
    y += (1 - p) / N * ones(N)
end

# solve for eigenvector
EigenLinearMap = LinearMap{Float64}(EigenMapping, N)
@time _, eig_sol = eigs(EigenLinearMap, nev=1, which=:LM, maxiter=300)

Could anyone help me, thank you in advance.

jlperla · May 16, 2019, 5:24am

Why do you have to? There is no way to set things up with a closure?

baggepinnen · May 16, 2019, 5:33am

How about this?

I can’t run your code since you didn’t provide a runnable example, but I believe this should work well

using LinearMaps, LinearAlgebra, SparseArrays, MatrixDepot, Arpack

# loading A
md = mdopen("SNAP/web-Google")
A = md.A

# function that closes over your previously global vaiables
function getmapping(A)
    p = 0.85
    row_idx, col_idx, _ = findnz(A)
    N = size(A, 1)
    # compute out-degree of each page (row)
    R = Dict(1:N .=> 0)
    for i in row_idx
        R[i] += 1
    end
    prob = zeros(N)
    # prob
    # when R[i] == 0, prob[i] = 1/N
    # when R[i] != 0, prob[i] = 1/R[i]
    for (key, value) in R
        if value == 0
            prob[key] = 1 / N
        else
            prob[key] = 1 / value
        end
    end
    # non_zero_idx_list
    non_zero_idx_list = []
    for i in 1:N
        push!(non_zero_idx_list, findall(!iszero, A[:, i]))
    end
    
    # iterative eigen-solver, this creates an anonymous function that closes over all variables above 
    function (x::AbstractVector)
        y = zeros(N)
        # first part
        @inbounds for i in 1:N
            non_zero_idx = non_zero_idx_list[i]
            y[i] = p * dot(x[non_zero_idx], prob[non_zero_idx])
        end
        # second part
        y += (1 - p) / N * ones(N)
    end
end

EigenMapping = getmapping(A)
# solve for eigenvector
EigenLinearMap = LinearMap{Float64}(EigenMapping, N)
@time _, eig_sol = eigs(EigenLinearMap, nev=1, which=:LM, maxiter=300)

Mr.Robot · May 16, 2019, 6:18am

Since at the end of day, EigenMapping is passed to LinearMap constructor, which does not accept additional parameters that are required in the computation of EigenMapping.

Mr.Robot · May 16, 2019, 6:21am

Thank you, by benchmarking the code, this does help reduce the run time by a few seconds from 60+ seconds.

baggepinnen · May 16, 2019, 7:57am

Try the following to reduce the number of allocations

function getmapping(A)
    p = 0.85
    row_idx, col_idx, _ = findnz(A)
    N = size(A, 1)
    # compute out-degree of each page (row)
    R = Dict(1:N .=> 0)
    for i in row_idx
        R[i] += 1
    end
    prob = zeros(N)
    # prob
    # when R[i] == 0, prob[i] = 1/N
    # when R[i] != 0, prob[i] = 1/R[i]
    for (key, value) in R
        if value == 0
            prob[key] = 1 / N
        else
            prob[key] = 1 / value
        end
    end
    # non_zero_idx_list
    non_zero_idx_list = []
    for i in 1:N
        push!(non_zero_idx_list, findall(!iszero, A[:, i]))
    end

    y = zeros(N)
    # iterative eigen-solver
    function EigenMapping(x::AbstractVector)
        # first part
        y .= 0
        @inbounds for i in 1:N
            non_zero_idx = non_zero_idx_list[i]
            y[i] = p * dot(x[non_zero_idx], prob[non_zero_idx])
        end
        # second part
        @. y += (1 - p) / N
    end
end