Why are duplicate entries in sparse matrices allowed?

Why does the representation of sparse matrices allow for duplicates?

a = spzeros(2, 3)
a[[1,1],2]=1
a[[1],2]=2

Leads to

2×3 sparse matrix with 2 Float64 nonzero entries:
	[1, 2]  =  2.0
	[1, 2]  =  1.0

This behavior seems prone to computation errors. Why is this allowed? What is the use case?

3 Likes

just writes a 2 into a slot where a 1 was before. This is normal for any array, no?

yes - but the internal representation of sparse matrix is now keeping two values for the same entry

2×3 sparse matrix with 2 Float64 nonzero entries:
	[1, 2]  =  2.0
	[1, 2]  =  1.0
julia> a = spzeros(2, 3)
2×3 SparseMatrixCSC{Float64,Int64} with 0 stored entries

julia> a[[1,1,1,1,1,1],2]=1
1

julia> a
2×3 SparseMatrixCSC{Float64,Int64} with 6 stored entries:
  [1, 2]  =  1.0
  [1, 2]  =  1.0
  [1, 2]  =  1.0
  [1, 2]  =  1.0
  [1, 2]  =  1.0
  [1, 2]  =  1.0

Looks likes a bug to me. It should remove duplicates.

Ah, yes, you’re right! Please file an issue (if there is non yet).

I googled a little and it seems python and matlab do the same… I’m not sure if there are performance issues with removing duplicates. But allowing duplicates can cause many unwanted bugs in one’s code. Since this is not allowed in full matrices, I see no reason why it would make sense in sparse.

Sparse matrices are part of base… Is the main JuliaLang repo the place to file an issue?

I just wanted to make sure, it’s in fact an issue and not a know behavior

It is definitely a bug since other methods will compute incorrectly on this malformed matrix:

julia> a
2×3 SparseMatrixCSC{Float64,Int64} with 4 stored entries:
  [1, 2]  =  2.0
  [2, 2]  =  2.0
  [2, 2]  =  2.0
  [2, 2]  =  2.0

julia> countnz(a)
4

Yes, the julia main repo is the correct place to report the issue.

Maybe this relates to duplicates in sparse matrices - how they are treated now:

Per default they are summed up

julia> A = sparse([2,2,2,2], [1,1,1,1], [2,1,3,4])
2×1 SparseMatrixCSC{Int64, Int64} with 1 stored entry:
  ⋅
 10

or aggregated using the function chosen as combine argument:

julia> A = sparse([2,2,2,2], [1,1,1,1], [2,1,3,4], 2, 1, max)
2×1 SparseMatrixCSC{Int64, Int64} with 1 stored entry:
 ⋅
 4

if only first (or last) argument shall be used use (a,b) -> a or (a,b), -> b

julia> A = sparse([2,2,2,2], [1,1,1,1], [2,1,3,4], 2, 1, (a,b) -> a)
2×1 SparseMatrixCSC{Int64, Int64} with 1 stored entry:
 ⋅
 2