As in the title: every block is dense, but only a small percentage of the blocks are non-vanishing.
Here you are
(Just search for the keyword “block matri” here)
Thanks, but this one is actually the opposite of what I want, unfortunately. It can define block matrices, in which each block can be sparse. In my case, each block is dense.
Is something like this what you want?
Introduction · BlockSparseMatrices.jl, or CompressedSparseBlocks.jl?
Note that searching for “block matrix” on JuliaHub gave many results, unlike “block matri”, which gave only one; results also seem better without a hyphen.
What kind of storage format do you want? BSR (also called BCSR) seems to be it, or you could even consider the Blocked-Ellpack format.
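To make the BSR/BCSR layout concrete, here is a small sketch using SciPy’s `scipy.sparse.bsr_matrix` (chosen only because it’s easy to test; the Julia packages above store the same three arrays):

```python
# BSR/BCSR stores dense blocks in a CSR-like scheme: a 3-D array of
# dense blocks, plus block-column indices and block-row pointers.
import numpy as np
from scipy.sparse import bsr_matrix

# A 4x4 matrix of 2x2 dense blocks; only 2 of the 4 block slots are
# non-vanishing (the case described in this thread).
dense = np.array([
    [1., 2., 0., 0.],
    [3., 4., 0., 0.],
    [0., 0., 5., 6.],
    [0., 0., 7., 8.],
])
A = bsr_matrix(dense, blocksize=(2, 2))

print(A.data.shape)  # two stored 2x2 dense blocks
print(A.indices)     # block-column index of each stored block
print(A.indptr)      # block-row pointers, like CSR but per block row
```

Only the non-vanishing blocks are stored, and each stored block is a contiguous dense tile, which is exactly what block-level kernels exploit.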
Do you have some kind of structure? E.g., is there the same number of blocks per row? That would make [blocked] Ellpack a better fit:
The Sliced Ellpack format is standardized and well-known as the state of the art. ..
[different section below]
The Blocked Ellpack format is similar to the standard Ellpack, where the column indices represent two-dimensional blocks instead of a single matrix entry.
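A hand-rolled sketch of that idea (names and the padding sentinel are mine, not any library’s API): every block row stores the same fixed number of 2x2 blocks, and rows with fewer blocks are padded.

```python
# Blocked-Ellpack sketch: block-column indices of shape
# (n_block_rows, ell_cols) and dense block values of shape
# (n_block_rows, ell_cols, bs, bs), padded to a fixed width.
import numpy as np

bs = 2        # block size
ell_cols = 2  # blocks stored per block row (after padding)
PAD = -1      # sentinel block-column index marking a padded slot

col_idx = np.array([[0, 1],
                    [1, PAD]])
values = np.zeros((2, ell_cols, bs, bs))
values[0, 0] = [[1, 2], [3, 4]]
values[0, 1] = [[5, 6], [7, 8]]
values[1, 0] = [[9, 10], [11, 12]]

def bell_matvec(col_idx, values, x):
    """y = A @ x for the blocked-Ellpack sketch above."""
    nbr = col_idx.shape[0]
    y = np.zeros(nbr * bs)
    for i in range(nbr):
        for s in range(ell_cols):
            j = col_idx[i, s]
            if j == PAD:
                continue  # padded slot, no block stored here
            y[i*bs:(i+1)*bs] += values[i, s] @ x[j*bs:(j+1)*bs]
    return y

x = np.arange(1., 5.)
print(bell_matvec(col_idx, values, x))
```

The fixed row width is what makes the format regular enough for GPU kernels; the price is the padding when block counts per row vary a lot.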
Do you want to run on a GPU? If so, consider cuSPARSE, but I also see cuSPARSELt, which I wasn’t familiar with. You have access to the former, and seemingly also some access to the latter, with CUDA.jl:
https://www.osti.gov/servlets/purl/1366653
We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block sub-structure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. .. We give a set of algorithms that performs SpMV up to 4x faster than the NVIDIA cuSPARSE cusparseDbsrmv routine, up to 147x faster than the Intel Math Kernel Library (MKL) mkl_dbsrmv routine (a single-threaded BCSR SpMV kernel), and up to 3x faster than the MKL mkl_dcsrmv routine (a multi-threaded CSR SpMV kernel).
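For reference, the BCSR SpMV the paper optimizes boils down to this loop (a plain NumPy sketch with my own variable names; real kernels tile and vectorize it):

```python
# Minimal BCSR SpMV: loop over block rows, then over each row's
# stored dense blocks, doing one small dense matvec per block.
import numpy as np

bs = 2  # dense block size (the paper covers 2x2 up to 32x32)
blocks = np.array([[[1., 2.], [3., 4.]],
                   [[5., 6.], [7., 8.]]])  # stored dense blocks
indices = np.array([0, 1])                 # block-column of each block
indptr = np.array([0, 1, 2])               # CSR-style block-row pointers

def bcsr_spmv(indptr, indices, blocks, x):
    """y = A @ x for a BCSR-stored matrix A."""
    nbr = len(indptr) - 1
    y = np.zeros(nbr * bs)
    for i in range(nbr):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            y[i*bs:(i+1)*bs] += blocks[k] @ x[j*bs:(j+1)*bs]
    return y

x = np.ones(4)
print(bcsr_spmv(indptr, indices, blocks, x))
```

The inner block-matvec is dense, so the speedups in the paper come from register/cache blocking and vectorizing that small product, not from any index gymnastics.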
What’s your application? If ML/LLMs, then see also:
One of the most challenging research questions around sparse MoE is therefore how to run it efficiently on machines that were never intended to run sparse computations in the first place.
This week, we’ll meet MegaBlocks (Gale et al 2022), the perhaps most efficient and scalable sparse MoE implementation in the industry today. The key idea is to re-formulate sparse MoE as a single block-sparse matrix multiplication instead of multiple dense matrix multiplications by leveraging a data format specifically designed to store and manipulate sparse matrices.
§5.1.3 describes our hybrid blocked-CSR-COO sparse matrix format, which enables efficient matrix products with sparse input and output operands.
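A toy version of the core reformulation (purely illustrative, with COO-style block coordinates rather than the paper’s hybrid blocked-CSR-COO format): the block-sparse operand is a list of (block_row, block_col, dense_block) triples, multiplied against a dense matrix.

```python
# Block-sparse x dense product: each stored dense block contributes
# one small dense matmul to the corresponding rows of the output.
import numpy as np

bs = 2  # block size
# Two stored blocks of a 4x4 block-sparse operand A.
coo = [(0, 0, np.array([[1., 0.], [0., 1.]])),   # I at block (0, 0)
       (1, 1, np.array([[2., 0.], [0., 2.]]))]   # 2*I at block (1, 1)

def block_coo_matmul(coo, B, n_block_rows):
    """C = A @ B where A is given as COO (block_row, block_col, block)."""
    C = np.zeros((n_block_rows * bs, B.shape[1]))
    for bi, bj, blk in coo:
        C[bi*bs:(bi+1)*bs] += blk @ B[bj*bs:(bj+1)*bs]
    return C

B = np.arange(8.).reshape(4, 2)
print(block_coo_matmul(coo, B, 2))
```

This is the sense in which sparse MoE becomes a single block-sparse matmul: each expert’s work is one (or a few) of these dense block products, batched under one kernel instead of many separate dense matmuls.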
Is it matrix-matrix multiply you need (and if so, are both operands in the same format, or is one dense)? Or matrix-vector multiply?