# Index type of sparse matrix

I’m using `Arpack.jl` to get the lowest few eigenvalues of relatively large `N x N` matrices. To achive this, I’m first cal `sparse(row, col, data)` to get the matrix to the appropriate form. Although the matrices I’m working on are large, `N` should be representable with `UInt32`, i.e., smaller than `2^32`, so this is what I’m using as the type of `row` and `col` to save some memory.

So far so good - but I get problems when the number of entries in my matrix is larger than `2^32`. In this case I get an error saying that the IndexType is too small and I should use something larger. They way I understand this is because the IndexType is automatically inferred from the provided arrays `row` and `col`. Straightforwardly, this can be fixed by simply using `UInt64` or any larger type - but this requires quite a bit more memory because instead of two lists of `UInt32` I now have to store two long lists of a larger type. Is there a way to specify the IndexType separately? If not: what is the (technical?) reason for this type of implementation?

Or, perhaps, did I understand something completely wrong here? Any help is appreciated - thanks!

I think you might have a point here. There seems to be an implicit assumption that the index of the `rows` and `cols` vectors is of the same type as the elements of these vectors (the type parameter `Ti`). However, unless this is a hard requirement for some reason that I don’t see, this seems to be a restriction as the index space is quadratic in `N`, i.e. `N * N`, while the element space is linear in `N`.

Let’s set up an example for `UInt8` instead of `UInt32`(without loss of generality):

``````using SparseArrays

N = 100
nz = Int(typemax(UInt8) + 10)   # equals 265

rows = rand(UInt8(1):UInt8(N), nz)
cols = rand(UInt8(1):UInt8(N), nz)
nzvals = rand(nz)
``````

The “constructor” you mention errors:

``````julia> S = sparse(rows,cols,nzvals)
ERROR: ArgumentError: the index type UInt8 cannot hold 265 elements; use a larger index type
Stacktrace:
[1] sparse(::Array{UInt8,1}, ::Array{UInt8,1}, ::Array{Float64,1}, ::Int64, ::Int64, ::Function) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/SparseArrays/src/sparsematrix.jl:677
``````

I tried creating the `SparseMatrixCSC` manually, which seems to work at first (doesn’t error):

``````S = SparseMatrixCSC{eltype(nzvals), eltype(rows)}(N,N,cols,rows,nzvals)
``````

However, operations on this object are flawed. For example, `nnz(S)` gives some number that doesn’t appear in the source code, i.e. `44` in my case. Trying to count the non zero elements manually via

``````function count_nz(S)
c = 0
for (i, v) in pairs(S)
if v != zero(v)
c+=1
# @show (i,v)
end
end
return c
end
``````

I get 68 which, again, isn’t `nz = 265`. (Note that I don’t use `count(!=(0.0),S)` because it is explicitly overloaded for `SparseMatrixCSC` and just gives the same result as `nnz(S)`.

Note that I don’t get any error but just wrong/unexpected results.

1 Like

@viralbshah Don’t you have some deeper experience with `SparseMatrixCSC`?

The index type in the CSR/CSC storage format must be able to represent the number of non-zeros.

1 Like

Good to know. Do you have a source for this that you could point me to?

In any case, it seems that “index type” is somewhat misleading. There are two different kinds of “indices” which scale very differently as a function of `N`. Maybe this restriction could just be lifted?

BTW, the manual (source: Sparse Arrays · The Julia Language) only mentions the “linear” indices, i.e. the elements of `rows` and `cols`

`Ti` is the integer type for storing column pointers and row indices

and does not mention that

CSR/CSC format on Wikipedia for example, but the example is “bad” since they have 4 non-zeros and a 4x4 matrix.

Right, it can probably be explained better, but note that it says

`Ti` is the integer type for storing column pointers and row indices

so this is not column indices and row indices, it is column pointers and row indices.

Here is a concrete example:

``````julia> S = sparse([1 0 0;
3 0 4;
0 5 6.]);

julia> S.rowval
5-element Vector{Int64}:
1
2
3
2
3

julia> S.colptr
4-element Vector{Int64}:
1
3
4
6

julia> S.rowval
5-element Vector{Int64}:
1
2
3
2
3
``````

where `S.colptr[col]` is the index into `S.rowval` for the first value in column `col`:

``````julia> S.colptr[2]
3

julia> S.nzval[S.colptr[2]]
5.0
``````

so values in `S.colptr` must be able to point to the last element in `S.nzval`, ie it must be able to represent the number of non-zeros (+1).

For the COO format it would indeed be enough to represent up to `max(N,M)` where for a matrix with size NxM.

2 Likes

If I understand it correctly one could technically store `S.rowval` as a different type - although this would only give a memory advantage of 25% when using half the precision. As is, though, this is not implemented, right?

Thank you very much for the clarification - I think that settles it.

Correct.

1 Like