How to parallelize the product of sparse and dense matrices

This is a general question. Suppose we have a sparse matrix A stored in CSR format and a dense column-major matrix B, and we want to calculate the product AB.

In principle, one can parallelize either over the rows of matrix A or over the columns of matrix B. Given the dimensions of the matrices and maybe some additional information (the sparsity of A, for example), which way is more efficient? Does anyone know how it is done in MKL, for example?