CSC kills the prospect of multithreading. Shouldn't Julia use CSR?

If you do have the data structures of the CSR matrix, you should of course make use of it. But there are cases in which your sparse matrix comes from another Julia function, in which case it’s most likely going to be CSC. Examples are sprand and spzeros from SparseArrays.jl, mmread from MatrixMarket.jl and so forth.

My workflow is not so important as I only do prototyping work in Julia and do my real work in C. However, as I explained to @bertschi, most Julia functions which return sparse matrices do it in CSC. So there are numbers of cases in which you have no choice but to handle some CSC matrices in Julia.

The advantage of CSR over CSC and COO is that its corresponding SpMV can be parallelized, which is not the case of the SpMV of CSC and COO. Given that Julia’s BLAS enables multithreading, I was suprised that the choice of the default sparse matrix format was not done so as to eventually allow seamless parallelization of SpMV. Sure, as it’s been widely discussed in the thread, you can workaround the CSC functionalities of Julia and parallelize the CSC SpMV_T which you can use with a transposed CSC so as to emulate a multithreaded CSR SpMV, but just like @ChrisRackauckas said, I think would make more sense for SparseArrays.jl to support CSR.

Well to be fair, sprand and spzeros produce matrices that can be interpreted as CSR instead.

1 Like

Yes, sprand and spzeros are not the most relevant examples, but mmread is.

2 Likes

The thread is quite old right now, but I would like to add a comment. The prevalent argument repeated over and over in this thread is that the transpose of CSR matrix can be constructed as a CSC matrix and then wrapped in transpose.

The problem with this argument is that it burdens the user with unwanted mental overhead. You see, the easiest way to construct the transpose is to use COO representation and then transform to CSC, because we can simply swap the two indices in COO representation. However, when you create the CSC matrix from COO matrix, you double the memory required to store the matrices. If COO already takes more than half of the RAM (I am talking based off my experience, it can happen), you simply won’t be able to create a CSC matrix, so the only option is to create it directly, i.e. by filling the colptr, rowval and nzval.

Filling directly colptr, rowval and nzval brings by itself some level of mental overhead. If simultaneously you have to construct it not for the matrix, but for its transpose, the mental overhead is doubled. More mental overhead means more opportunity to make a mistake.

2 Likes

Wait, shouldn’t construction of CSR from CSC simply be “switch rows for columns”?
colptrrowptr, … That should be “for free”, shouldn’t it?

Edit: Symmetric matrix assumed.

If you happen to know the metadata of a CSC storage of your data which is actually stored in a CSR format, then indeed, you can process the data using the CSC built-in data structure as you say, in which case an SpMV is replaced by SpMV_T and so on. But if you natively resort to CSR, then you typically don’t know the metadata of a CSC storage for free.

Yes, as pointed out by your edit, in the special case of symmetric matrices, the stored non-zero values need not be reordered and you can promptly recover one format from the other, “for free”.