Help on optimization

Hello everyone,

I need some help on optimization of a function, which looks like this

# re-edit
SpMatrixListUnion = Union{Vector{SparseMatrixCSC{Float64,Int64}},Vector{SparseMatrixCSC{Complex128,Int64}}}

function avg_V_on_ψ!(
    res::Array{Float64,2},                                # result to write
    vec_list::Vector{Matrix{Float64}},              # a list of V's, 
    OP_list::SpMatrixListUnion                       # a list of O's
    )                                    # the computation is V' * O * V, 
                                         #result is written into Array "res" in a consistent manner
    vl = length(vec_list)
    opl = length(OP_list)
    for i=1:vl
        I,J,V = findnz(OP_list[i])   # helpful when OP_list[i] is a very sparse matrix
        res[:,i] = real.(sum((vec_list[i][I,:]).*broadcast(*,V,vec_list[i][J,:]),1)) # V' * O * V for sparse matrix O
    end
    return nothing
end

For each matrix O=OP_list[i] in OP_list and each vector set V=vec_list[i] in vec_list, compute the average V'*O*V and write result to res. I am working with sparse matrices for O in OP_list, so I have done a first step optimization by using findnz. For really sparse matrices, the result is good but not satisfactory.

It seems to me that there cannot be further reduction on the number of arithmatic operations. So I hope to ask some experienced users for help, to make such function even faster, because it is the only bottleneck in my program …

Thanks in advance in any suggestions!

BTW you can verify the code by running the following test:

# avg_V_on_ψ0 is a slow but guaranteed version for the task

avg_V_on_ψ0(vecs,OP) = [real(dot(vecs[:,k],OP*vecs[:,k])) for k=1:size(vecs)[2]]

# the test

vlist = [rand(400,400) for k=1:300];
oplist = [1000.0sprand(400,400,0.0001) for k=1:300];
res2 = zeros(Float64,400,300);
gc()
@time res1 = [avg_V_on_ψ0(vlist[i],oplist[i]) for i=1:300];
@time avg_V_on_ψ!(res2,vlist,oplist)

# test :  norm should be equal ~
norm(hcat(res1...)-res2)
1 Like

Would you mind quoting your code to make it more readable?

2 Likes

Your code doesn’t run here. I’m assuming that SpMatrixUnion should be SpMatrixListUnion, but then I still get

julia> avg_V_on_ψ!(res2,vlist,oplist)
ERROR: MethodError: no method matching avg_V_on_ψ!(::Array{Float64,2}, ::Array{Array{Float64,2},1}, ::Array{SparseMatrixCSC{Float64,Int64},1})
Closest candidates are:
  avg_V_on_ψ!(::Array{Float64,2}, ::Array{Array{Float64,2},1}, ::Union{SparseMatrixCSC{Complex{Float64},Int64}, SparseMatrixCSC{Float64,Int64}}) at REPL[4]:6

on 0.6.2.

Using views give a meagre 20% improvement:

function avg_V_on_ψ!2(
    res::Array{Float64,2},
    vec_list::Vector{Matrix{Float64}},
    OP_list
    )
    for i=1:length(vec_list)
        I,J,V = findnz(OP_list[i])
        v = vec_list[i]
        vI = view(v, I, :)
        vJ = view(v, J, :)
        res[:,i] = real.(sum(vI .* broadcast(*, V, vJ), 1))
    end
    return nothing
end
2 Likes

Sorry I’ve got a paranoid way of managing types. I have replaced the SpMatrixUnion by SpMatrixListUnion. This time it should work …

Excellent ! Thanks a lot.
It seems that the broadcast function prefers views of matrices, instead of matrices themselves.

I think the performance gain is just by avoiding having to copy the sub-matrices.

BTW kristofer.carlsson just removed the type after OP_list, that makes the code more transparent …

I just realized that real.(sum(vI .* broadcast(*, V, vJ), 1)) is equivalent to res[:,i] = real.(RowVector(V)*(vI.*vJ)). The latter leads to 20% gain

1 Like

Or V' * (vI .* vJ) (a bit more compact)

yes, only if the O in op_list has all real entries

optimized code

function avg_V_on_ψ!(
    res::Array{Float64,2},
    vec_list::Vector{Matrix{Float64}},
    OP_list::SpMatrixListUnion
    )
    """ for each operator in OP_list and each eigenvector set in vec_list, compute the average and write result to res."""
    for i=1:length(vec_list)
        I,J,V = findnz(OP_list[i])
        v = vec_list[i]
        vI = view(v, I, :)
        vJ = view(v, J, :)
        res[:,i] = real.(RowVector(V)*(vI.*vJ))
    end
    return nothing
end

before optimzation
0.480163 seconds (59.48 k allocations: 607.347 MiB, 10.69% gc time)
after
0.331370 seconds (50.49 k allocations: 177.589 MiB, 4.70% gc time)