Faster matrix product with a specifically structured matrix

I think there is a recent package for threaded multiplication of sparse matrices.

By isolating the part of the code that should exploit the particular structure of the matrix, we see that the improvement that can be obtained is not large:

using BenchmarkTools
  v=collect(1.:10^6)
  x=similar(v)
  fill!(x,1.999)

julia> @btime begin
         c=0.
         for j in 1:10^6
             c += $v[j]*$x[j]
         end
       c
       end
  942.800 μs (0 allocations: 0 bytes)
9.995009995e11

julia> @btime begin
         c=0.
         for j in 1:10^6
           c += $v[j]
         end
         c *= $x[1]
       end
  866.600 μs (0 allocations: 0 bytes)
9.995009995e11

Not necessarily:

julia> @btime let c=0.
                @simd for j in 1:10^6
                    @inbounds c += $v[j]*$x[j]
                end
              c
              end
  639.450 μs (0 allocations: 0 bytes)
9.995009995e11

julia> @btime let c=0.
                @simd for j in 1:10^6
                    @inbounds c += $v[j]
                end
              c *= $x[1]
              end
  211.213 μs (0 allocations: 0 bytes)
9.995009995e11
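For comparison, the same factorization (summing v once and multiplying by the constant) can also be written with the built-in reductions; this is just a sketch of the same trick, not from the original post:

```julia
using LinearAlgebra

v = collect(1.0:10^6)
x = fill(1.999, 10^6)

full     = dot(v, x)       # sum of elementwise products
factored = sum(v) * x[1]   # legitimate only because every x[j] is the same

full ≈ factored            # true: the factored form reads half as much memory
```

Both `dot` and `sum` vectorize internally, so this gives the same speedup as the hand-written `@simd` loops.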

Below are the timings for the library multiplication, for the qmul(P, x) function (a copy of the library algorithm with @simd added, even though it doesn’t seem to have any effect in this case), and for my_mul(Pt, x), which works on the transposed matrix in order to use the rowvals and nzrange functions, exploiting the structure of P (namely, that each row holds equal values).
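As a sanity check (not in the original post), the row-constant structure of P can be verified directly: the broadcast `.* rand(m)` multiplies each row of the 0/1 mask by a single value, so all stored nonzeros within a row are equal:

```julia
using SparseArrays

n, m = 1000, 1000
P  = sparse((rand(n, m) .> 0.97) .* rand(m))
Pt = sparse(P')          # CSC of the transpose: column i of Pt holds row i of P

rows_constant = all(1:n) do i
    vals = nonzeros(Pt)[nzrange(Pt, i)]
    isempty(vals) || all(==(first(vals)), vals)
end
rows_constant            # true: every row of P carries one repeated value
```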

PS
I don’t know how the @simd macro works, so I don’t know why it doesn’t have the same effect in qmul() as it does in my_mul().
Maybe I didn’t use it properly.

julia> using SparseArrays, BenchmarkTools
       
       n,m = 1000,1000
       x = rand(m);
       P = sparse((rand(n,m).>0.97) .* rand(m));
       Pt=sparse(P');

julia> function qmul(Q,y)
         # rv = Q.rowval
         # nzv = Q.nzval
         rv = rowvals(Q)
         nzv = nonzeros(Q)
         c = Vector{Float64}(undef, size(Q,1))
         fill!(c, 0.)
         @simd for col in 1:size(Q, 2)
           for j in nzrange(Q, col)
             @inbounds c[rv[j]] += nzv[j]*y[col]
           end
         end
         c
       end
qmul (generic function with 1 method)

julia> function my_mul(Qt,y)
         rv = rowvals(Qt)
         nzv = nonzeros(Qt)
         c = Vector{Float64}(undef, size(Qt,2))
         fill!(c, 0.)
         @simd for col in 1:size(Qt, 2)
           rj = nzrange(Qt, col)
           for j in rj
             @inbounds c[col] += y[rv[j]]
           end
           c[col] *= nzv[rj.stop]
         end
         c
       end
my_mul (generic function with 1 method)

julia> @btime P*x;
  18.800 μs (1 allocation: 7.94 KiB)

julia> @btime my_mul(Pt,x);
  15.800 μs (1 allocation: 7.94 KiB)

julia> @btime qmul(P,x);
  18.900 μs (1 allocation: 7.94 KiB)

julia> P*x ≈ my_mul(Pt,x)
true

julia> P*x ≈ qmul(P,x)
true

If I define x locally … there is still a difference of about 400 μs:

julia> @btime let
         c=0.
         x=similar($v);
         fill!(x,1.999);
         @simd for j in 1:10^6
           @inbounds c += $v[j]*x[j]
         end
         c
       end
  1.949 ms (2 allocations: 7.63 MiB)
9.995009995e11

julia> @btime begin
         c=0.
         x=similar($v);
         fill!(x,1.999);
         @simd for j in 1:10^6
           @inbounds c += $v[j]
         end
         c *= $x[1]
       end
  1.404 ms (2 allocations: 7.63 MiB)
9.995009995e11

Sure, but then you’re timing the allocation.
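One way to keep the allocation out of the measurement is BenchmarkTools’ `setup` keyword, which runs before each sample and is excluded from the timing; a sketch:

```julia
using BenchmarkTools

v = collect(1.0:10^6)

# x is created in `setup`, so its allocation is not part of the measurement;
# only the reduction loop itself is timed.
@btime let c = 0.
    @simd for j in 1:10^6
        @inbounds c += $v[j] * x[j]
    end
    c
end setup = (x = fill(1.999, 10^6))
```

Note that variables coming from `setup` are referenced without `$`-interpolation inside the benchmarked expression.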

Instead of using the whole matrix, you can just save two vectors: one, “A”, with the non-zero positions, and another, “B”, with the values.
Then, instead of multiplying a matrix by your vector, you can multiply “B” elementwise by the elements of “x” given by the positions in “A”.
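A minimal sketch of that idea for this particular P (the names `RowConstSparse` and `rc_mul` are mine, not from any package): since every row holds one repeated value, it suffices to store the column positions per row (“A”) and one value per row (“B”):

```julia
using SparseArrays

# Hypothetical compact format: cols[i] = nonzero column positions of row i ("A"),
# w[i] = the single value repeated across row i ("B").
struct RowConstSparse
    cols::Vector{Vector{Int}}
    w::Vector{Float64}
end

function RowConstSparse(P::SparseMatrixCSC)
    Pt = sparse(P')                 # CSC of the transpose: rows of P become contiguous
    rv, nzv = rowvals(Pt), nonzeros(Pt)
    n = size(P, 1)
    cols = [rv[nzrange(Pt, i)] for i in 1:n]
    w = [isempty(nzrange(Pt, i)) ? 0.0 : nzv[first(nzrange(Pt, i))] for i in 1:n]
    RowConstSparse(cols, w)
end

function rc_mul(P::RowConstSparse, x::AbstractVector)
    c = zeros(length(P.w))
    @inbounds for i in eachindex(P.w)
        s = 0.0
        for j in P.cols[i]          # gather the entries of x selected by "A"
            s += x[j]
        end
        c[i] = s * P.w[i]           # one multiplication per row instead of one per nonzero
    end
    c
end
```

On a random P with this structure, `rc_mul(RowConstSparse(P), x) ≈ P * x`, while doing only additions in the inner loop.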