# Shifting the base value of sparse arrays from 0 to 1

Hi everyone!
I’m new to Julia, so the answer to my question is possibly trivial. I have an n×m matrix `A` in which I store indices that are used to iterate over other data structures (e.g., vectors, matrices, …). Since Julia starts indexing at 1 (instead of 0 as in, e.g., C++), I need to store many 1’s in `A`. In my case, `A` consumes a lot of memory, but `B = A .- 1` is sparse and consumes much less. I need to access these values often in my code. Clearly, I could add “+ 1” each time I access an element `B[i,j]`, but that is really a mess (for reading and debugging). Therefore, it would be convenient to store only the “non-ones” instead of the “non-zeros”.

Do you know an option to “shift” the base of a sparse array (as sketched above from 0 to 1), or at least an efficient (readable, debugging-friendly) work-around?
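For concreteness, here is a minimal sketch of the situation described above (the matrix values and the `idx` helper name are illustrative, not from the original post): store `B = A .- 1` as a sparse matrix and hide the “+ 1” behind one small accessor.

```julia
using SparseArrays

A = [1 1 3; 1 5 1]            # dense index matrix, mostly ones
B = sparse(A .- 1)            # only the "non-ones" are stored
idx(B, i, j) = B[i, j] + 1    # hypothetical helper recovering A[i, j]
```

This keeps call sites readable (`idx(B, i, j)` instead of `B[i, j] + 1` everywhere), but it is exactly the workaround the question hopes to avoid.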

E.g., a `MappedArray`:

```julia
julia> using MappedArrays, SparseArrays

julia> mappedarray(x -> x + 1, spdiagm(0 => ones(3)))
3×3 mappedarray(x->x + 1, ::SparseMatrixCSC{Float64,Int64}) with eltype Float64:
 2.0  1.0  1.0
 1.0  2.0  1.0
 1.0  1.0  2.0
```

Although that almost certainly won’t be sufficient: it will hit the generic implementations of nearly every function you call, meaning it’ll iterate over every position rather than visiting just the stored values with some cleverness to account for the non-stored ones.

I don’t think I understand what you mean — AFAICT it will make a call to `getindex`, and the generated code looks equivalent to `getindex(...) + 1`. As for benchmarks,

```julia
using MappedArrays, SparseArrays, BenchmarkTools

B = spdiagm(0 => ones(Int, 10))
A = mappedarray(x -> x + 1, B)
ι = [CartesianIndex(rand(1:10), rand(1:10)) for _ in 1:1000]

f(A, ixs) = [A[ix] for ix in ixs]
f1(B, ixs) = [B[ix] + 1 for ix in ixs]
```
```julia
julia> @btime f($A, $ι);
  4.664 μs (3 allocations: 7.98 KiB)

julia> @btime f1($B, $ι);
  4.194 μs (3 allocations: 7.98 KiB)
```

My point is that folks working with SparseArrays tend to expect algorithms to be written with an O(nnz) implementation. E.g.:

```julia
julia> S = sprand(10^5, 10^5, 10^-5)
100000×100000 SparseMatrixCSC{Float64,Int64} with 100180 stored entries:

julia> M = mappedarray(x -> x + 1, S);

julia> @time sum(S)  # only does ~100,000 additions
  0.000138 seconds (9 allocations: 272 bytes)
50084.55558203958

julia> @time sum(M)  # does 100,000^2 additions
 67.050491 seconds (192.65 k allocations: 9.212 MiB)
1.000005008455539e10
```

The MappedArray doesn’t know that it’s being backed by a sparse array that could potentially use these smarter algorithms. IIRC some of the sparse folks on the main Julia repository have been working towards making this work — that’s one of the reasons why we’ve slowly been moving to a `stored` vocabulary instead of `nonzero`.
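To illustrate what an O(nnz) implementation looks like, here is a sketch (the function name `stored_sum` is mine, not from the thread) that walks only the stored entries of a `SparseMatrixCSC` using the standard `nonzeros`/`nzrange` accessors:

```julia
using SparseArrays

# Sum only the stored entries of a SparseMatrixCSC in O(nnz) time,
# never touching the implicit zeros.
function stored_sum(S::SparseMatrixCSC)
    vals = nonzeros(S)
    total = zero(eltype(S))
    for j in 1:size(S, 2)
        for k in nzrange(S, j)   # positions of stored entries in column j
            total += vals[k]
        end
    end
    return total
end
```

A generic `AbstractArray` fallback cannot take this shortcut, which is why the `sum(M)` call above does the full 100,000² additions.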

Generally yes, but in this particular case I am under the impression that sparsity just helps with storage.


I have for a long time wanted a sparse array type that stores its data as `S + d` where `S` is a classic zero-default sparse array and `d` is a constant default value. Then you can define operations like this:

```
(S1 + d1) + (S2 + d2) = (S1 + S2) + (d1 + d2)
(S1 + d1) * (S2 + d2) = (S1*S2 + d1*S2 + S1*d2) + (d1*d2)
```

This has the benefit that it’s closed under most algebraic operations, whereas zero-default sparse arrays are not. Investigating this kind of structure might make an interesting project—I suspect that many linear algebra operations could be implemented efficiently.
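A minimal sketch of such a type (the name `ShiftedSparse` and every method here are hypothetical, just to make the idea concrete): wrap a zero-default sparse matrix `S` together with a constant default `d`, and push the algebra from the identities above into the method definitions.

```julia
using SparseArrays

# Hypothetical sparse matrix with a constant default value d:
# the represented matrix is S .+ d, but only S is stored.
struct ShiftedSparse{T} <: AbstractMatrix{T}
    S::SparseMatrixCSC{T,Int}  # stored deviations from the default
    d::T                       # constant default value
end

Base.size(A::ShiftedSparse) = size(A.S)
Base.getindex(A::ShiftedSparse, i::Int, j::Int) = A.S[i, j] + A.d

# Closed under addition: (S1 + d1) + (S2 + d2) = (S1 + S2) + (d1 + d2)
Base.:+(A::ShiftedSparse, B::ShiftedSparse) =
    ShiftedSparse(A.S + B.S, A.d + B.d)

# Reductions need only O(nnz) work plus one term for the default
Base.sum(A::ShiftedSparse) = sum(A.S) + A.d * length(A.S)
```

For the original question one would pick `d = 1` so that only the “non-ones” are stored, while indexing still returns the 1-based values directly.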


Thanks a lot, this sounds like a nice work-around. The downside is, however, that values in a mapped array cannot be changed according to the documentation. But I can deal with that.
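If I’m reading the MappedArrays documentation right, only the single-function form is read-only: supplying the inverse function as a second argument gives a mapped array that also supports `setindex!`. A sketch for the “+ 1” case:

```julia
using MappedArrays, SparseArrays

B = spdiagm(0 => zeros(Int, 3))              # sparse "non-ones" storage
A = mappedarray(x -> x + 1, y -> y - 1, B)   # forward map and its inverse

A[1, 2] = 5   # stores 4 in B; A reads back as 5
```

Writes go through the inverse map, so the sparse backing array keeps storing only the deviations from one.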


You are right, it is because of the storage. The array can get huge for large instances.


It’s not specific to sparse arrays, but I have a proof-of-concept implementation of a matrix type that lazily represents `A + B` for any matrix types `A` and `B`. Special handling of sparse `A` and “uniform” `B` should be possible. But for this to be useful we need the `gemm`-like multiply-add API in LinearAlgebra.jl: https://github.com/JuliaLang/julia/issues/23919