Huge sparse array construction

If you have 217778220 nonzero entries, then just storing the values (as 64-bit floating-point numbers) and the row and column indices (as 64-bit integers) requires 3 * 217778220 * 8 / 2^30 ≈ 4.87 GiB. So the memory usage of 6.5 GiB is not that surprising: it indicates about 34% overhead of allocated memory in the process of constructing the sparse-matrix data structure, which is not particularly large. You could save a bit of memory (about a third, since the two index arrays make up two of the three arrays) by using 32-bit integers for the indices, but fundamentally you will need several GiB for this matrix (unless there is some other structure you can exploit besides sparsity).
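The arithmetic above can be checked with a few lines. This is a minimal sketch assuming exactly the layout described: one array of 64-bit float values plus two index arrays (I and J) of either 64-bit or 32-bit integers. The helper name `triplet_bytes` is hypothetical.

```python
GIB = 2**30

def triplet_bytes(nnz: int, index_bytes: int = 8, value_bytes: int = 8) -> int:
    """Bytes for (I, J, V) triplets: two index arrays plus one value array."""
    return nnz * (2 * index_bytes + value_bytes)

nnz = 217_778_220
b64 = triplet_bytes(nnz)                 # 64-bit indices, 64-bit float values
b32 = triplet_bytes(nnz, index_bytes=4)  # 32-bit indices, 64-bit float values

print(f"64-bit indices: {b64 / GIB:.2f} GiB")
print(f"32-bit indices: {b32 / GIB:.2f} GiB")
print(f"saving from 32-bit indices: {1 - b32 / b64:.0%}")
print(f"overhead implied by 6.5 GiB: {6.5 / (b64 / GIB) - 1:.0%}")
```

The 32-bit saving is exactly one third because only the two index arrays shrink; the value array stays at 8 bytes per entry.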

As @pablosanjose suggests, if you generate the data directly in CSC sparse format instead of as (I,J,V) triplets, you avoid allocating the sparse data twice. That saves a factor of 2 in memory (or a little more, since the conversion overhead also goes away), and construction will be faster.
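To illustrate what "generating CSC directly" means, here is a minimal Python sketch (not the library routine the thread is about). It assumes you can produce the nonzeros one column at a time, in increasing row order, via a hypothetical user-supplied function `column_entries(j)`. The three CSC arrays (`colptr`, `rowval`, `nzval`) are filled in a single pass, so no (I,J,V) triplet arrays ever exist. Indices here are 0-based; Julia's `SparseMatrixCSC(m, n, colptr, rowval, nzval)` constructor takes the same three arrays but with 1-based indices.

```python
from array import array
from typing import Callable, Iterable, Tuple

def build_csc(
    m: int,
    n: int,
    column_entries: Callable[[int], Iterable[Tuple[int, float]]],
) -> Tuple[array, array, array]:
    """Fill CSC arrays (colptr, rowval, nzval) column by column, 0-based.

    `column_entries(j)` is a hypothetical user-supplied function yielding
    (row, value) pairs for column j in strictly increasing row order.
    No (I, J, V) triplet arrays are allocated.
    """
    colptr = array("q", [0])  # column j occupies rowval[colptr[j]:colptr[j+1]]
    rowval = array("q")       # 64-bit row indices
    nzval = array("d")        # 64-bit float values
    for j in range(n):
        last = -1
        for i, v in column_entries(j):
            if not (last < i < m):
                raise ValueError(f"rows in column {j} must be increasing and < {m}")
            rowval.append(i)
            nzval.append(v)
            last = i
        colptr.append(len(rowval))  # end of column j / start of column j+1
    return colptr, rowval, nzval

# Example: a 5x5 tridiagonal matrix generated one column at a time.
def tridiag(j: int, n: int = 5):
    if j > 0:
        yield j - 1, -1.0
    yield j, 2.0
    if j < n - 1:
        yield j + 1, -1.0

colptr, rowval, nzval = build_csc(5, 5, tridiag)
```

If the total number of nonzeros is known in advance, preallocating the arrays would also avoid the growth overhead of `append`. A pure-Python loop is far too slow for 2e8 entries; the point here is only the single-pass layout.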
