bmit
May 13, 2021, 1:19am
1
I have a large matrix x. I’d like to create a sparse representation of the values that meet a certain criterion, e.g. xi < 1.
I’d like to do something like the following:
x = rand(1000, 1000)
i = findall(xi -> xi < 0.01, x)
xs = sparse(x, i)
I wasn’t able to find a sparse constructor that used CartesianIndices. What is the recommended way to do this? Ideally without copying values from x (though I’m not sure this would be faster).
This should work:
sparse(x .* (x .< 0.01))
Otherwise, if you want to go through the indices, then you could do something like
using SparseArrays
using Unzip # not sure it's needed but I find it convenient
CI = findall(<(0.01), x)
i, j = unzip(Tuple.(CI))
xs = sparse(i, j, x[CI], size(x)...)
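If you would rather not add a package, here is a minimal sketch of the same index-based idea using only Base and SparseArrays. It assumes CartesianIndex supports integer indexing via getindex, and uses a smaller matrix than the question purely to keep the example quick.

```julia
using SparseArrays

x = rand(200, 200)
CI = findall(<(0.01), x)   # Vector{CartesianIndex{2}} of matching positions
i = getindex.(CI, 1)       # row index of each match
j = getindex.(CI, 2)       # column index of each match
xs = sparse(i, j, x[CI], size(x)...)
```

One difference from the one-liner: sparse(i, j, v, m, n) keeps every entry you pass in as a stored value, even one that is exactly zero, whereas sparse applied to a dense matrix only stores the nonzeros.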
bmit
May 13, 2021, 1:55am
3
Thanks for the lightning response and the slick one-liner!
Here’s a quick comparison of the hack I have been working with, your two approaches, and one I found here.
using BenchmarkTools, SparseArrays, Unzip
x = rand(1000,1000)
d = 0.01
@btime begin
m = x .< d
i,j, = findnz(m)
xs = sparse(i ,j, view(x, m), size(x)...)
end
# 1.489 ms (38 allocations: 851.00 KiB)
@btime xs = sparse(x .* (x .< d))
# 2.482 ms (12 allocations: 7.79 MiB)
@btime begin
CI = findall(x .< d)
i, j = unzip(Tuple.(CI))
xs = sparse(i, j, view(x, CI), size(x)...)
end
# 1.190 ms (45 allocations: 927.47 KiB)
@btime begin
CI = findall(x .< d)
CI′ = reinterpret(Int, reshape(CI, 1, :))
xs = sparse(view(CI′, 1, :), view(CI′, 2, :), view(x, CI), size(x)...)
end
# 1.044 ms (40 allocations: 617.20 KiB)
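One caveat on the numbers above: x and d are non-constant globals, and BenchmarkTools recommends interpolating such variables with $ so that global-variable overhead is not included in the measurement. A rough sketch of how that could look is below. The function names mask_multiply and from_indices are made up for illustration, and the timings will not necessarily match the ones I posted.

```julia
using BenchmarkTools, SparseArrays

# Hypothetical helper names, not from the thread.
mask_multiply(x, d) = sparse(x .* (x .< d))

function from_indices(x, d)
    CI = findall(x .< d)      # positions of entries below the threshold
    i = getindex.(CI, 1)      # row indices
    j = getindex.(CI, 2)      # column indices
    return sparse(i, j, x[CI], size(x)...)
end

x = rand(1000, 1000)
d = 0.01

# `$` interpolates the globals so they are treated as local values
@btime mask_multiply($x, $d);
@btime from_indices($x, $d);
```

Wrapping each variant in a function also makes it easy to check that they return the same matrix before comparing speed.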
I did not make any promises on performance!
FWIW, I reckon doing findall(x .< d) is actually better than findall(<(d), x). Maybe try to compare with this?
CI = findall(x .< d)
i, j = unzip(Tuple.(CI))
sparse(i, j, view(x,CI), size(x)...)
But I should let the performance pros chime in!
bmit
May 13, 2021, 2:20am
5
You’re right. I’ve updated the benchmarks above per your suggestions.