Hi everyone!
I’m new to Julia, so the answer to my question is possibly trivial. I have an array A(n,m) in which I store indices that are used to iterate over other data structures (e.g., vectors, matrices,…). Since Julia starts indexing vectors with 1 (instead of 0 like in e.g, C++), I need to store many 1’s in A. In my case, A consumes much memory, but B = A(n,m) - ones(n,m) is sparse and consumes much less. I need to access these values often in my code. Clearly, I could add “+1” each time when accessing element B[i,j], but this is really a mess (for reading and debugging). Therefore, it would be convenient to store only the “non-ones” instead of the “non-zeros”.
Do you know an option to “shift” the base of a sparse array (as sketched above from 0 to 1), or at least an efficient (readable, debugging-friendly) work-around?
Although that almost certainly won’t be sufficient as it will hit the generic implementations of nearly every function you call — meaning that it’ll iterate over every position and not just the stored values with some cleverness to account for the non-stored values.
I don’t think I understand what you mean — AFAICT it will make a call to getindex, and the generated code looks equivalent to getindex(...) + 1. As for benchmarks,
using MappedArrays, SparseArrays, BenchmarkTools
B = spdiagm(0 => ones(Int, 10)
A = mappedarray(x -> x + 1, B)
ι = [CartesianIndex(rand(1:10), rand(1:10)) for _ in 1:1000]
f(A, ixs) = [A[ix] for ix in ixs]
f1(B, ixs) = [B[ix] + 1 for ix in ixs]
@btime f($A, $ι);
@btime f1($B, $ι);
My point is that folks working with SparseArrays tend to expect algorithms to be written with an O(nnz) implementation. E.g.:
julia> S = sprand(10^5,10^5, 10^-5)
100000×100000 SparseMatrixCSC{Float64,Int64} with 100180 stored entries:
julia> @time sum(S) # only does ~100,000 additions
0.000138 seconds (9 allocations: 272 bytes)
50084.55558203958
julia> @time sum(M) # does 100,000^2 additions
67.050491 seconds (192.65 k allocations: 9.212 MiB)
1.000005008455539e10
The MappedArray doesn’t know that it’s being backed by a sparse array that could potentially use these smarter algorithms. IIRC some of the sparse folks on the main Julia repository have been working towards making this work — that’s one of the reasons why we’ve slowly been moving to a stored vocabulary instead of nonzero.
I have for a long time wanted a sparse array type that stores its data as S + d where S is a classic zero-default sparse array and d is a constant default value. Then you can define operations like this:
This has the benefit that it’s closed under most algebraic operations, whereas zero-default sparse arrays are not. Investigating this kind of structure might make an interesting project—I suspect that many linear algebra operations could be implemented efficiently.
Thanks a lot, this sounds like a nice work-around. The downside is, however, that values in a mapped array cannot be changed according to the documentation. But I can deal with that.