Faster matrix product with a specifically structured matrix

lrnv · August 30, 2021, 1:55pm

Hi,

I have a sparse matrix P that turns out to have only equal values (or zeros) on each line:

julia> all([sum(unique(P[i,:]).!=0)<=1 for i in 1:size(P,1)])
true

julia>

Where I used <=1 since some lines are completely zeros.

With this matrix, I’m repetitively doing matrix-vector products P*x on different vectors. But since every line has only one value, I guess that the matrix-vector product could be done more efficiently than what SparseArrays does.

Any Idea if there is a ready-to-use matrix class like this one, or if I could use a simple one-hot-encoding structure to solve this multiplication issue more efficiently ?

Sukera · August 30, 2021, 1:59pm

Did you notice any particular slowdowns? Do you have an example for your matrix-vector product?

(Though I don’t think this is necessarily needed here.)

lrnv · August 30, 2021, 2:02pm

I do have an example but its a little hidden behind other things.

You could construct easily one of these matrices :

P = (rand(1000,1000).>0.97) .* rand(1000)

Where the 0.97 and size 1000 is what I have empirically on a true example.

Sukera · August 30, 2021, 2:08pm

That’s a dense matrix though. All zero elements are stored, you can convert that to a true sparse matrix like this:

julia> using SparseArrays                                             
                                                                      
julia> P = sparse((rand(1000,1000).>0.97) .* rand(1000));             
                                                                      
julia> summary(P)                                                     
"1000×1000 SparseMatrixCSC{Float64, Int64} with 29741 stored entries"

Multiplication of which should be pretty fast.

That construction doesn’t necessarily obey the one-hot criterion by the way. It’s just unlikely to not be one-hot. Do you have other guarantees that make sure this still holds?

lrnv · August 30, 2021, 2:11pm

Yes this is what i’m doing for the moment. But since each row has only one possible non-zero value, I was hoping for a further improvement.

I’m pretty sure that the way I’m constructing the matrix garanties that my criterion (in my first post) will always hold.

Sukera · August 30, 2021, 2:17pm

Do you mean in your real example or in the way you’ve shown here? What you’ve done here is basically just sprand, except for a dense matrix (and allocating a whole bunch of intermediaries):

julia> P = sprand(1000,1000,0.03)                                  
1000×1000 SparseMatrixCSC{Float64, Int64} with 29667 stored entries

~~and for that, the property absolutely does not hold:~~ scratch that, wrong comparison

lrnv · August 30, 2021, 2:19pm

Well, In my real examples it holds, but also :


julia> P = sparse((rand(1000,1000).>0.97) .* rand(1000));

julia> all([sum(unique(P[i,:]).!=0)<=1 for i in 1:size(P,1)])
true

julia>

However, as you noted,

julia> P = sprand(1000,1000,0.03);

julia> all([sum(unique(P[i,:]).!=0)<=1 for i in 1:size(P,1)])
false

julia>

This is not the same thing.

lrnv · August 30, 2021, 2:24pm

Furthermore, the rate of non-zero values is also around 3% on each line.

Which means that SparseArrays is doing, for every line i of the matrix-vector product,

\sum_{j \in \text{Non-zero}} P[i,j]*x[j]

while we could do :

P[i,j]*\sum_{j \in \text{Non-zero}} x[j]

Which reduces the number of multiplications from ~ (3% of the number of columns) to 1 per line, inducing a 30x theoretical speedup for 1000*1000 matrices (neglecting the addition cost). I was asking about a neat way of implementing this.

Sukera · August 30, 2021, 2:33pm

You’re right - my mistake. It should have been:

julia> P = sprand(Bool, 1000, 1000, 0.03) .* rand(1000)

This keeps sparsity and your property that each column only has one unique value though it may occur more than once:

julia> findnz(P[1,:])[2] |> length
36                                
                                  
julia> findnz(P[1,:])[2][1]       
0.8866977610918272                
                                  
julia> findnz(P[1,:])[2][1:4]     
4-element Vector{Float64}:        
 0.8866977610918272               
 0.8866977610918272               
 0.8866977610918272               
 0.8866977610918272

How would you move the iterating index outside of the loop…?

I think this is a red herring. You still have to store the indices of each column where there is a nonzero, even if you only store “one” nonzero per column. A general matrix product still has to look at all of those.

lrnv · August 30, 2021, 2:35pm

As you noted every values of P[i,j] is the same for a fixed line.

Here is a dummy implementation:

n,m = 1000,1000
x = rand(m)
P = sparse((rand(n,m).>0.97) .* rand(m));

# check predicate:
all([sum(unique(P[i,:]).!=0)<=1 for i in 1:size(P,1)])

# extract the b vector: 
b = [sum(unique(P[i,:])) for i in 1:size(P,1)]

# compute the matrix-vector product: 
rez = similar(b)
for i in 1:n
    rez[i] = b[i]*sum(x[P[i,:].nzind])
end

rez ≈ P*x

Sukera · August 30, 2021, 2:38pm

It’s not though:

julia> findnz(P[1,:])[1] == findnz(P[2,:])[1] # column 1 has different nonzeros to column 2
false                                                   
                                                        
julia> findnz(P[:,1])[1] == findnz(P[:,2])[1] # row 1 has different nonzeros to row 2
false

~~I think you’re confusing columns and rows. Julia is column major, i.e. the first index iterates over columns.~~

In any case, the sparsity per row and per column is different. There’s no shortcut here (well, or I’m missing something major here, but I don’t think so).

lrnv · August 30, 2021, 2:42pm

Hum…; This is not concordant with what my machine says:

julia> P = [1 2
       3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> for i in 1:2
       display(P[i,:])
       end
2-element Vector{Int64}:
 1
 2
2-element Vector{Int64}:
 3
 4
julia>

Sukera · August 30, 2021, 2:43pm

Right, mixup on my side - my apologies!

Still, the sparsity pattern per row is not the always the same. You have to check each row seperately. Same goes for columns.

lrnv · August 30, 2021, 2:45pm

Yes, I have to check the sparsity pattern of each row separately, which is what i’ve done in my dummy implmementation. But still, the number of multiplications is way reduced, and hence I can expect better performance. For the moment, my implementation is allocating like crazy so I do not have a better perf than SparseArrays, but this should be possible

Sukera · August 30, 2021, 2:52pm

You’re allocating like crazy because x[P[i,:].nzind] creates a copy (as do a bunch of other operations you’re doing, like indexing with a bitmatrix…).

This code seems very confusing, am I reading this right that you’re using the sparsity pattern of one matrix to index into another…? It reads very Matlab-y, where forcing everything into matrix/vector operations is the way to go, but this pattern will not lead to good performance in julia, I’m afraid.

May I ask, what problem/code are you trying to accelerate exactly? I don’t mean the matmul, I mean what you’re doing with the matmul. Loops in julia are fast, no need to shy away from them.

lrnv · August 30, 2021, 2:56pm

I’m sorry I do not know about Matlab.

With the matmul i’m doing a Dykstra projection in an optimization routine. This matmul is evaluated billions of time, and represent 50% of my runtime (hence my interest for it ^^ ).

Yes, for the moment it is allocating, but dont you think a non-allocative version could be produced, and will provide a better perf than SparseArrays ? This is why I was asking about hot-encoding at first (using a sparsity pattern of a matrix is like using a bitmatrix).

Ho maybe the bitmatrix is what I should use ?

Sukera · August 30, 2021, 2:58pm

I don’t think so. This should have the same number of multiplications as a regular sparse matrix multiplication, since you’re taking the indices per row from the sparse matrix anyway.

No, I don’t think so, which is why I’m confused about you trying to up the ante on known industry practices.

Are you certain that that runtime is due to slow matmul of real sparse vectors and not due to e.g. a lot of allocations from indexing with vectors etc? Can you share the code that’s running in a hot loop that’s too slow?

lrnv · August 30, 2021, 3:12pm

I do not think you understood the predicate I use and the structure of the matrix: For each line of the matrix, every non-zero-value is the same value. Hence the matrix-vector product fatorizes. The number of multiplications is reduced, but, you are right, the indexing is still the same.

Here is a (quickly-extracted so not that clean) MWE:

using SparseArrays
struct SparseLinearConstraint{T}
    A::Matrix{T}
    b::Vector{T}
    P::SparseMatrixCSC{T, Int64}
    q::SparseVector{T, Int64}
    rez::SparseVector{T, Int64}
end
function SparseLinearConstraint(A::Matrix{T},b::Vector{T}) where T
    AiAA = A'inv(A*A')
    P = sparse(-AiAA*A)
    q = sparse(+AiAA*b)
    return SparseLinearConstraint(A,b,P,q,spzeros(eltype(q),length(q)))
end
@inline function proj!(x,X::SparseLinearConstraint)
    X.rez .= x + X.P*x + X.q  ##### This lines takes 70% of runtime, due to the sparse matrix-vector product.
end
struct SparsePositivityConstraint{T} 
    rez::SparseVector{T, Int64}
    tol::T
end
function SparsePositivityConstraint(x)
    return SparsePositivityConstraint(spzeros(eltype(x),length(x)),sqrt(eps(eltype(x))))
end
@inline function proj!(x,X::SparsePositivityConstraint)
    X.rez .= x
    SparseArrays.fkeep!(X.rez, (i,x) -> x>X.tol)
end

function Dykstra!(x,ctrs)
    old_x = zero(x)
    err = sqrt(eps(eltype(x)))
    sn = [zero(x) for _ in ctrs]
    while !all(abs.(old_x .-x) .< err)
        old_x .= x
        for (i,ctr) in enumerate(ctrs)
            proj!(x-sn[i],ctr)
            sn[i] += ctr.rez - x
            x .= ctr.rez
        end
    end
    return nothing
end

# A simple test case:
using Random
function linearize(a::Vector{T},b::Vector{T}) where T
    n1 = length(a)
    n2 = length(b)
    p = (n1+1)*(n2+1)
    eq_A = zeros(T,(n1+n2,p-1))
    eq_b = [a...,b...]
    for i in 0:n2
        for j in 1:n1
            eq_A[j,j+i*(1+n1)] = 1
        end
    end
    for i in 1:n2
        eq_A[n1+i,(1:(n1+1)).+n1.+(i-1)*(n1+1)] .= 1
    end
    return eq_A,eq_b
end
Solve_Type = Float64
Random.seed!(123)
n,m = 30,30;
a1 = rand(n);
a2 = rand(m);
x = Solve_Type.(randn((n+1)*(m+1)-1));
spx = deepcopy(x);
spx = sparse(spx);
A,b= linearize(a1,a2);

constraints = [SparseLinearConstraint(A[1:n,:],a1),
               SparseLinearConstraint(A[n+1:end,:],a2),
               SparsePositivityConstraint(spx)];

# Some tests: 
spx = deepcopy(x);
@time Dykstra!(spx,constraints)
spx = deepcopy(x);
@time Dykstra!(spx,constraints)
spx = deepcopy(x);
spx = sparse(spx)
@profview Dykstra!(spx,constraints)

The main algorithm is in the Dykstra! function, with a majority of the runtime in the proj!(x,X::SparseLinearConstraint) function.
In this function, profview reports SparseArrays.mul! to be the most time-consuming function.

EDIT: The upper code has been edited once or twice. I have now a working version of what I want, which writes as :


using SparseArrays
n,m = 1000,1000
x = rand(m)
P = sparse((rand(n,m).>0.97) .* rand(m));

# check predicate:
all([sum(unique(P[i,:]).!=0)<=1 for i in 1:size(P,1)])

# extract the b vector and idx vector:
b = [sum(unique(P[i,:])) for i in 1:size(P,1)]
M = sparse(BitMatrix(P .!= 0))
idx = [P[i,:].nzind for i in 1:n]
rez = similar(b)
@inline my_prod!(b,M,x) = b.* (M*x)

rez = my_prod!(b,M,x)
rez ≈ P*x

# test : 
using BenchmarkTools
@btime P*$(x);
@btime my_prod!(b,M,$x);

Which now has the same runtime as the full sparse product, but still allocates a little too much. I’m sure we can do better.

rocco_sprmnt21 · August 7, 2022, 10:39am

In this way you do not count the time necessary for the construction of the matrix M and of the vector b.
I suppose this only has to be done once while multiplication by x has to be done a very large number of times.
In this case I would like to submit the following proposal to you:

function my_mul(Qt,y)
rv=rowvals(Qt)
nzv= nonzeros(Qt)
c=zeros(Float64, size(Qt,2))
# c=Vector{Float64}(undef, size(Qt,2))
# fill!(c,0.)
@inbounds for col in 1:size(Qt, 2)
    rj=nzrange(Qt,col)
     for j in rj
      c[col] += y[rv[j]]
     end
     c[col] *= nzv[rj.stop]
end
c
end


Pt=sparse(P')

julia> n,m = 1000,1000
(1000, 1000)

julia> x = rand(m);

julia> P = sparse((rand(n,m).>0.97) .* rand(m));

julia> Pt=sparse(P');

julia> P*x ≈ my_mul(Pt,x)
true

julia> @btime P*x;
  25.900 μs (1 allocation: 7.94 KiB)

julia> 

julia> @btime my_mul(Pt,x);
  21.400 μs (1 allocation: 7.94 KiB)

The construction of the Pt matrix is expensive and the improvement obtained is not very big, but if the multiplication has to be done many times, it may be worth it.

stevengj · August 7, 2022, 1:07pm

Realize that the SparseArrays matrix-vector product is implemented in pure-Julia here. So, you could pretty easily make your own implementation.

To fully take advantage of your structure, the ideal thing would be to define your own subtype of AbstractSparseMatrix. In particular, you’d probably want to store it in compressed-sparse row format (instead of CSC as in SparseArrays.jl) and then only store one coefficient per row. You don’t have to implement all of the SparseArrays interface if you are only using this in your own code for certain operations.

Topic		Replies	Views
Taking advantage of rowwise-constant sparse matrix Performance linearalgebra , sparse	7	523	August 30, 2021
Specialized matrix-matrix multiplication algorithm New to Julia question , performance , linearalgebra	5	420	July 9, 2024
Sparse matrix-vector product: much more slow than Matlab Performance matlab , optimization	24	4545	December 20, 2017
Help with this code (deallocation/pointers on sparse matrices) Performance array , sparse , matrices , sparsearrays	31	695	February 3, 2024
Scaling a sparse matrix row-wise and column-wise too slow Performance broadcast , sparse	20	451	June 23, 2024

Faster matrix product with a specifically structured matrix

Related topics