This is a general question. Suppose we have a sparse CSR matrix `A` and a dense column-major matrix `B`, and we want to compute their product.

In principle, one can parallelize either over the rows of `A` or over the columns of `B`. Given the dimensions of the matrices and maybe some additional information (the sparsity of `A`, for example), **which way is more efficient?** Does anyone know how this is done in MKL, for example?