CSC kills the prospect of multithreading. Shouldn't Julia use CSR?

I find it strange that a community who develops a language with a reputation based on performance finds no “significant advantage” of having built-in multithreaded SpMV. Why should dense BLAS be multithreaded and not sparse BLAS? I doubt everyone thinks like you, especially the users whose work is focused on scientific computing.

No, you seem to be missing @mkitti’s point. What is the performance advantage of a built-in library vs a package?

While I think it would be nice for SparseMatricesCSR.jl to be upstreamed into SparseArrays.jl, being in a separate package doesn’t magically change its performance.

3 Likes

Yes, I do agree that making it built-in does not mean performance gain. But it would be nice to have a seamless access to multithreaded SpMV, just like we do with dense BLAS functions.

What does built-in mean to you?

To me it means slow development of a package that is difficult to change and update.

Note that SparseArrays is actually not a built-in anymore anyways, so it’s kind of funny that this is the library being discussed. Here is its repo:

It was made separate because it pulls in SuiteSparse, so people wanted the option to remove it from the standard build in order to have a GPL-free distribution. SparseArrays couldn’t be fully removed without breaking, so now it’s a standard library that lives in a separate Github repo in a separate organization but is built into the standard image, set to be fully removed (and then just become a normal Julia library) if a breaking change ever happens.

And this is the library that we’re discussing about whether a feature needs to be moved into in order to be “built-in” vs “a package”. I think that’s really splitting hairs. As long as there’s a good easily available library for SparseMatrixCSR I think it’s fine. I think organizationally and documentation wise, having it in SparseArrays would make more sense, and the discoverability of it in GridAp’s org instead of JuliaSparse is a bit odd and IMO should be changed, but there’s nothing about the package itself that is an issue from what I can tell.

3 Likes

If CSR support and multithreaded SpMV were part of SparseArrays.jl it would be easier to not have to use several packages, and it would improve the visibility of those functionalities as SparseArrays.jl is way more popular and used than ThreadedSparseArrays.jl and SparseMatricesCSR.jl. I’m not really conscious of the additional burden it would be, but I would surely appreciate it.

Oh… I didn’t know this.

The minimum change to fix this is probably just to add a reference to SparseMatrixCSR.jl in the SparseArrays documentation, until people get around to the reorganization (though I’m sure that just by putting that reference into the docs it’ll spark the conversation of reorganization and get it done).

5 Likes

If you have a matrix stored in CSC, I agree that you cannot re-order it for free.

What I said was that you can construct the transpose matrix as CSC initially, and then re-interpret it as CSR via the transpose wrapper. In this way you can use the existing SparseArrays library to construct CSR matrices with no overhead.A thin wrapper could make this API nicer.

Okay. Say all you have is a CSR representation of your matrix. That is, you are equipped with m::Int, n::Int, rowptr::Vector{Ti}, colval::Vector{Ti} and nzval::Vector{Tv}. If you want to make use of Julia’s built-in CSC functionalities to carry the equivalent of a CSR SpMV with CSC SpMV_T, you can indeed proceed as follows: B=SparseMatrixCSC(m, n, rowptr, colval, nzval); transpose(B)*x. If that’s what you meant, I agree. Then it wouldn’t be a bad idea to parallelize SpMV_T in SparseArrays.jl.

The point is that any sparse matrix is constructed at some point so if you want a CSR rather than a CSC you can always construct the transpose of the matrix you want.

1 Like

The point is that the type Transpose{SparseMatrixCSC} has exactly the same memory layout as CSR, i.e., knowing that you can directly construct a matrix accordingly. Here is a simple example:

julia> I = [1,2,2,3]; J = [4,1,3,1]; V = rand(4);

julia> A = sparse(I, J, V)  # construct CSC from COO
3×4 SparseMatrixCSC{Float64, Int64} with 4 stored entries:
  ⋅         ⋅    ⋅        0.0265797
 0.502663   ⋅   0.836581   ⋅ 
 0.588429   ⋅    ⋅         ⋅ 

julia> B = transpose(sparse(J, I, V))  # construct CSR from COO
3×4 LinearAlgebra.Transpose{Float64, SparseMatrixCSC{Float64, Int64}} with 4 stored entries:
  ⋅         ⋅    ⋅        0.0265797
 0.502663   ⋅   0.836581   ⋅ 
 0.588429   ⋅    ⋅         ⋅ 

# Now, get the details of the CSR representation
julia> m, n = size(A);

julia> rowptr = B.parent.colptr;  # Rows are stored as columns in transpose

julia> colval = B.parent.rowval; # Same for cols

julia> nzval = B.parent.nzval

# Reconstruct directly from CSR information
julia> C = transpose(SparseMatrixCSC(n, m, rowptr, colval, nzval))
3×4 LinearAlgebra.Transpose{Float64, SparseMatrixCSC{Float64, Int64}} with 4 stored entries:
  ⋅         ⋅    ⋅        0.0265797
 0.502663   ⋅   0.836581   ⋅ 
 0.588429   ⋅    ⋅         ⋅ 

Hope that helps.

I’m not sure to follow the point you’re making. The argument was about whether there is a cost associated with the workaround to do CSR-like SpMVs. There seem to be three cases.

First, we have A::SparseMatrixCSC{Tv,Ti}, in which case you can do a CSR-like SpMV by doing B=sparse(transpose(A)); transpose(B)*x. Then, you are paying a cost when doing B=sparse(transpose(A)).

Second, if you know the COO representation of your array with I::Vector{Ti}, J::Vector{Ti} and V::Vector{Tv}, you can do B=sparse(J, I, V); transpose(B)*b, in which case you are paying a cost when doing B=sparse(J, I, V) as it induces a conversion from COO to CSC.

Finally, if you have a CSR representation of your matrix in the form of m::Int, n::Int, rowptr::Vector{Ti}, colval::Vector{Ti}, nzval::Vector{Tv}, you can do B=SparseMatrixCSC(m, n, rowptr, colval, nzval); transpose(B)*x, in which case there is no other cost than SpMV_T.

The point is that you could just define SparseMatrixCSR as a type alias for Transpose{SparseMatrixCSC}, together with some convenience constructors. There is no additional cost involved when constructing, i.e., CSR from COO has similar cost as CSC from COO and CSR from rowptr, colval and nzval is just like CSC from colptr, rowval and nzval as my example shows.

I think we’re on the same page. Let me see try to unpack it.

Suppose I have the following CSR representation.

julia> A
5×5 SparseMatrixCSR{1, Int8, Int64} with 10 stored entries:
  [1, 4]  =  78
  [1, 5]  =  -87
  [2, 1]  =  88
  [2, 3]  =  38
  [2, 5]  =  77
  [3, 5]  =  31
  [4, 1]  =  -93
  [5, 1]  =  -25
  [5, 2]  =  -20
  [5, 3]  =  23

julia> Matrix(A)
5×5 Matrix{Int8}:
   0    0   0  78  -87
  88    0  38   0   77
   0    0   0   0   31
 -93    0   0   0    0
 -25  -20  23   0    0

Construct the CSC matrix, B, as follows.

julia> B = SparseMatrixCSC(A.m, A.n, A.rowptr, A.colval, A.nzval)
5×5 SparseMatrixCSC{Int8, Int64} with 10 stored entries:
   ⋅  88   ⋅  -93  -25
   ⋅   ⋅   ⋅    ⋅  -20
   ⋅  38   ⋅    ⋅   23
  78   ⋅   ⋅    ⋅    ⋅
 -87  77  31    ⋅    ⋅

julia> B_T = transpose(B)
5×5 Transpose{Int8, SparseMatrixCSC{Int8, Int64}} with 10 stored entries:
   ⋅    ⋅   ⋅  78  -87
  88    ⋅  38   ⋅   77
   ⋅    ⋅   ⋅   ⋅   31
 -93    ⋅   ⋅   ⋅    ⋅
 -25  -20  23   ⋅    ⋅

Then we can reconstruct the matrix A in CSR form in the following way.

julia> A2 = SparseMatrixCSR(B_T)
5×5 SparseMatrixCSR{1, Int8, Int64} with 10 stored entries:
  [1, 4]  =  78
  [1, 5]  =  -87
  [2, 1]  =  88
  [2, 3]  =  38
  [2, 5]  =  77
  [3, 5]  =  31
  [4, 1]  =  -93
  [5, 1]  =  -25
  [5, 2]  =  -20
  [5, 3]  =  23

That just expands to the following.

julia> A3 = SparseMatrixCSR{1}(B.m, B.n, B.colptr, B.rowval, B.nzval)
5×5 SparseMatrixCSR{1, Int8, Int64} with 10 stored entries:
  [1, 4]  =  78
  [1, 5]  =  -87
  [2, 1]  =  88
  [2, 3]  =  38
  [2, 5]  =  77
  [3, 5]  =  31
  [4, 1]  =  -93
  [5, 1]  =  -25
  [5, 2]  =  -20
  [5, 3]  =  23

julia> A === A2 && A === A3
true

Then we calculate A*x via the following function which I think you would refer to as “CRC SpMV”.

function mul!(y::AbstractVector,A::SparseMatrixCSR,v::AbstractVector)
  A.n == size(v, 1) || throw(DimensionMismatch())
  A.m == size(y, 1) || throw(DimensionMismatch())
  fill!(y, zero(eltype(y)))
  o = getoffset(A)
  for row = 1:size(y, 1)
    @inbounds for nz in nzrange(A,row)
      col = A.colval[nz]+o
      y[row] += A.nzval[nz]*v[col]
    end
  end
  return y
end

Are we saying the same thing?

No. CSR from COO, CSR from CSC and CSC from COO all do have a cost associated with them which CSR from rowptr, colval, nzval and CSC from colptr, rowval, nzval don’t.

1 Like

I understand, but first, the cost for CSR from COO and CSC from COO are the same, and, secondly – and probably more important to you – CSR from rowptr, colval, nzval and CSC from colptr, rowval, nzval also have the same (much lower) cost.
Now, using the representation that I showed, the last call transpose(SparseMatrixCSC(n, m, rowptr, colval, nzval)) in my example constructs a CSR representation directly from rowptr, colval, nzval and without any additional cost (compared to constructing a CSC representation from colptr, rowval, nzval).

Yes, because at that stage you had the data structures of the CSR assembled, i.e., rowptr, colval and nzval. But if your starting point is a CSC matrix A::SparseMatrixCSC{Tv,Ti} or COO representation given by I::Vector{Ti}, J::Vector{Ti} and V::Vector{Tv}, like at the beginnig of your example, then you have some work to do to get to your CSR.

Ok, but as others have already explained as well, you have to construct your sparse matrix somehow in any case. So, why not start with rowptr, colval and nzval, if that’s what you want in the first place?

The example was just an illustration, I have no idea where the sparse matrix comes from in your case.

This thread is interesting, and made me learn a few things about CSR vs CSC.

@nvenkov1 , could you take a step back a talk a bit about your workflow? It’s not clear to me what kind of input do you have (COO? CSR? CSC? other? …), and what are your requirements.

In particular, why do you need your starting point to be CSR? (just to understand)

If mandatory (because … why not), then considering the memory layout is equivalent to transpose{SparseMatrixCSC}, why not just “cast” your data to that type?

Granted, as already mention, this would be easier with a thin wrapper that could worth a PR. But all in all, nothing insurmontable or critical, isn’t it?