Index type of sparse matrix

I’m using Arpack.jl to get the lowest few eigenvalues of relatively large N x N matrices. To achive this, I’m first cal sparse(row, col, data) to get the matrix to the appropriate form. Although the matrices I’m working on are large, N should be representable with UInt32, i.e., smaller than 2^32, so this is what I’m using as the type of row and col to save some memory.

So far so good - but I get problems when the number of entries in my matrix is larger than 2^32. In this case I get an error saying that the IndexType is too small and I should use something larger. They way I understand this is because the IndexType is automatically inferred from the provided arrays row and col. Straightforwardly, this can be fixed by simply using UInt64 or any larger type - but this requires quite a bit more memory because instead of two lists of UInt32 I now have to store two long lists of a larger type. Is there a way to specify the IndexType separately? If not: what is the (technical?) reason for this type of implementation?

Or, perhaps, did I understand something completely wrong here? Any help is appreciated - thanks!

I think you might have a point here. There seems to be an implicit assumption that the index of the rows and cols vectors is of the same type as the elements of these vectors (the type parameter Ti). However, unless this is a hard requirement for some reason that I don’t see, this seems to be a restriction as the index space is quadratic in N, i.e. N * N, while the element space is linear in N.

Let’s set up an example for UInt8 instead of UInt32(without loss of generality):

using SparseArrays

N = 100
nz = Int(typemax(UInt8) + 10)   # equals 265

rows = rand(UInt8(1):UInt8(N), nz)
cols = rand(UInt8(1):UInt8(N), nz)
nzvals = rand(nz)

The “constructor” you mention errors:

julia> S = sparse(rows,cols,nzvals)
ERROR: ArgumentError: the index type UInt8 cannot hold 265 elements; use a larger index type
Stacktrace:
 [1] sparse(::Array{UInt8,1}, ::Array{UInt8,1}, ::Array{Float64,1}, ::Int64, ::Int64, ::Function) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/SparseArrays/src/sparsematrix.jl:677

I tried creating the SparseMatrixCSC manually, which seems to work at first (doesn’t error):

S = SparseMatrixCSC{eltype(nzvals), eltype(rows)}(N,N,cols,rows,nzvals)

However, operations on this object are flawed. For example, nnz(S) gives some number that doesn’t appear in the source code, i.e. 44 in my case. Trying to count the non zero elements manually via

function count_nz(S)
    c = 0
    for (i, v) in pairs(S)
        if v != zero(v)
            c+=1
            # @show (i,v)
        end
    end
    return c
end

I get 68 which, again, isn’t nz = 265. (Note that I don’t use count(!=(0.0),S) because it is explicitly overloaded for SparseMatrixCSC and just gives the same result as nnz(S).

Note that I don’t get any error but just wrong/unexpected results.

1 Like

@viralbshah Don’t you have some deeper experience with SparseMatrixCSC?

The index type in the CSR/CSC storage format must be able to represent the number of non-zeros.

1 Like

Good to know. Do you have a source for this that you could point me to?

In any case, it seems that “index type” is somewhat misleading. There are two different kinds of “indices” which scale very differently as a function of N. Maybe this restriction could just be lifted?

BTW, the manual (source: Sparse Arrays · The Julia Language) only mentions the “linear” indices, i.e. the elements of rows and cols

Ti is the integer type for storing column pointers and row indices

and does not mention that

CSR/CSC format on Wikipedia for example, but the example is “bad” since they have 4 non-zeros and a 4x4 matrix.

Right, it can probably be explained better, but note that it says

Ti is the integer type for storing column pointers and row indices

so this is not column indices and row indices, it is column pointers and row indices.

Here is a concrete example:

julia> S = sparse([1 0 0;
                   3 0 4;
                   0 5 6.]);

julia> S.rowval
5-element Vector{Int64}:
 1
 2
 3
 2
 3

julia> S.colptr
4-element Vector{Int64}:
 1
 3
 4
 6

julia> S.rowval
5-element Vector{Int64}:
 1
 2
 3
 2
 3

where S.colptr[col] is the index into S.rowval for the first value in column col:

julia> S.colptr[2]
3

julia> S.nzval[S.colptr[2]]
5.0

so values in S.colptr must be able to point to the last element in S.nzval, ie it must be able to represent the number of non-zeros (+1).


For the COO format it would indeed be enough to represent up to max(N,M) where for a matrix with size NxM.

2 Likes

If I understand it correctly one could technically store S.rowval as a different type - although this would only give a memory advantage of 25% when using half the precision. As is, though, this is not implemented, right?

Thank you very much for the clarification - I think that settles it.

Correct.

1 Like