# Help on optimization

#1

Hello everyone,

I need some help on optimization of a function, which looks like this

``````# re-edit
SpMatrixListUnion = Union{Vector{SparseMatrixCSC{Float64,Int64}},Vector{SparseMatrixCSC{Complex128,Int64}}}

function avg_V_on_ψ!(
res::Array{Float64,2},                                # result to write
vec_list::Vector{Matrix{Float64}},              # a list of V's,
OP_list::SpMatrixListUnion                       # a list of O's
)                                    # the computation is V' * O * V,
#result is written into Array "res" in a consistent manner
vl = length(vec_list)
opl = length(OP_list)
for i=1:vl
I,J,V = findnz(OP_list[i])   # helpful when OP_list[i] is a very sparse matrix
res[:,i] = real.(sum((vec_list[i][I,:]).*broadcast(*,V,vec_list[i][J,:]),1)) # V' * O * V for sparse matrix O
end
return nothing
end
``````

For each matrix `O=OP_list[i]` in OP_list and each vector set `V=vec_list[i] in vec_list`, compute the average `V'*O*V` and write result to res. I am working with sparse matrices for O in OP_list, so I have done a first step optimization by using findnz. For really sparse matrices, the result is good but not satisfactory.

It seems to me that there cannot be further reduction on the number of arithmatic operations. So I hope to ask some experienced users for help, to make such function even faster, because it is the only bottleneck in my program …

Thanks in advance in any suggestions!

BTW you can verify the code by running the following test:

``````# avg_V_on_ψ0 is a slow but guaranteed version for the task

avg_V_on_ψ0(vecs,OP) = [real(dot(vecs[:,k],OP*vecs[:,k])) for k=1:size(vecs)[2]]

# the test

vlist = [rand(400,400) for k=1:300];
oplist = [1000.0sprand(400,400,0.0001) for k=1:300];
res2 = zeros(Float64,400,300);
gc()
@time res1 = [avg_V_on_ψ0(vlist[i],oplist[i]) for i=1:300];
@time avg_V_on_ψ!(res2,vlist,oplist)

# test :  norm should be equal ~
norm(hcat(res1...)-res2)
``````

#2

#3

Your code doesn’t run here. I’m assuming that `SpMatrixUnion` should be `SpMatrixListUnion`, but then I still get

``````julia> avg_V_on_ψ!(res2,vlist,oplist)
ERROR: MethodError: no method matching avg_V_on_ψ!(::Array{Float64,2}, ::Array{Array{Float64,2},1}, ::Array{SparseMatrixCSC{Float64,Int64},1})
Closest candidates are:
avg_V_on_ψ!(::Array{Float64,2}, ::Array{Array{Float64,2},1}, ::Union{SparseMatrixCSC{Complex{Float64},Int64}, SparseMatrixCSC{Float64,Int64}}) at REPL[4]:6
``````

on 0.6.2.

#4

Using views give a meagre 20% improvement:

``````function avg_V_on_ψ!2(
res::Array{Float64,2},
vec_list::Vector{Matrix{Float64}},
OP_list
)
for i=1:length(vec_list)
I,J,V = findnz(OP_list[i])
v = vec_list[i]
vI = view(v, I, :)
vJ = view(v, J, :)
res[:,i] = real.(sum(vI .* broadcast(*, V, vJ), 1))
end
return nothing
end
``````

#5

Sorry I’ve got a paranoid way of managing types. I have replaced the `SpMatrixUnion` by `SpMatrixListUnion`. This time it should work …

#6

Excellent ! Thanks a lot.
It seems that the `broadcast` function prefers views of matrices, instead of matrices themselves.

#7

I think the performance gain is just by avoiding having to copy the sub-matrices.

#8

BTW kristofer.carlsson just removed the type after `OP_list`, that makes the code more transparent …

#9

I just realized that `real.(sum(vI .* broadcast(*, V, vJ), 1))` is equivalent to `res[:,i] = real.(RowVector(V)*(vI.*vJ))`. The latter leads to 20% gain

#10

Or `V' * (vI .* vJ)` (a bit more compact)

#11

yes, only if the O in op_list has all real entries

#12

optimized code

``````function avg_V_on_ψ!(
res::Array{Float64,2},
vec_list::Vector{Matrix{Float64}},
OP_list::SpMatrixListUnion
)
""" for each operator in OP_list and each eigenvector set in vec_list, compute the average and write result to res."""
for i=1:length(vec_list)
I,J,V = findnz(OP_list[i])
v = vec_list[i]
vI = view(v, I, :)
vJ = view(v, J, :)
res[:,i] = real.(RowVector(V)*(vI.*vJ))
end
return nothing
end
``````

before optimzation
0.480163 seconds (59.48 k allocations: 607.347 MiB, 10.69% gc time)
after
0.331370 seconds (50.49 k allocations: 177.589 MiB, 4.70% gc time)