Help with debugging some code for the MKLSparse package

On the principle that “given enough eyeballs, all bugs are shallow” I am asking for help in debugging a problem in the dmb/matrix_sparse branch of the MKLSparse package.

The purpose of this package is to use Intel’s Mathematics Kernel Library (MKL) for some BLAS (Basic Linear Algebra Subroutines) operations on sparse matrices, as defined in the SparseArrays package. Recent versions of MKL allow for a more sophisticated internal representation of different sparse matrix types and correspondingly simpler calling sequences in the so-called execution routines Constructors for the internal representation were discussed in Wraping a MKL handle in julia The purpose of this branch is to switch the package from an older formulation of the library to this newer approach.

Right now I have very simple routines for sparse-dense multiplication and solutions of triangular systems implemented. Everything is cool except for one seemingly simple operation - multiplication of a vector by a triangular sparse matrix. The failure shows up in the package tests. The first two testsets have all tests passing but the triangular testset has 90 tests passing and 8 failing. The failing tests are all on multiplication of a vector by a triangular matrix or its transpose. The passing tests include exactly the same calculation with the vector replaced by a dense 1-column matrix (reshape(b, (:, 1))) and by the same sparse matrix without the triangular property.

Properties like triangularity are passed as a matrix_descr struct which I am constructing here from Enum types to follow the pattern in C header file mkl_sblas.h. They could just as well be passed as an
NTuple{3, Cint}. I would believe that somehow I had botched those declarations except that the more difficult problem of solving a triangular sparse - dense system works. And multiplication by a triangular sparse matrix works if the right hand operand is a dense matrix instead of a dense vector, resulting in a call to mkl_sparse_d_mm instead of mkl_sparse_d_mv.

I have spent a couple of days trying to see what I am doing wrong and I can’t. Any insight into what is probably an embarrassingly stupid mistake on my part would be appreciated. Discussion can take place on

1 Like