Skip empty entries of sparse matrix when plotting a heatmap

I want to plot a sparse matrix as a heatmap, something like this:

julia> using Plots, SparseArrays

julia> x = sprand(Float64, 5, 5, 0.5)
5×5 SparseMatrixCSC{Float64, Int64} with 11 stored entries:
  ⋅          ⋅         ⋅         ⋅         ⋅ 
  ⋅         0.838339  0.330949   ⋅         ⋅ 
  ⋅         0.456005  0.704216   ⋅         ⋅ 
 0.0524143  0.622022  0.319052  0.406603   ⋅ 
 0.356594    ⋅        0.787174  0.693707   ⋅ 

julia> heatmap(x)

which gives:

The issue is that the entries that are not stored (the structural zeros) are plotted as zeros, whereas I want them treated as missing, which would give:

julia> y = Matrix(x); z = @. ifelse(y == 0, missing, y)
5×5 Matrix{Union{Missing, Float64}}:
  missing    missing   missing   missing  missing
  missing   0.838339  0.330949   missing  missing
  missing   0.456005  0.704216   missing  missing
 0.0524143  0.622022  0.319052  0.406603  missing
 0.356594    missing  0.787174  0.693707  missing

julia> heatmap(z)

Can anyone see a smart way of achieving this without creating a dense matrix as an intermediate?

Is there a strong reason to avoid the dense intermediate? The memory usage of heatmap is probably going to be proportional to the size of a dense matrix anyway.

To ensure that you only avoid plotting “structural” zeros, you can use findnz when creating the dense matrix:

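using SparseArrays

X = sprand(Float64, 5, 5, 0.5)
I, J, V = findnz(X)           # row indices, column indices, and values of the stored entries
A = fill(NaN, size(X))        # dense matrix holding NaN wherever nothing is stored
setindex!.((A,), V, I, J)     # write each stored value into its (row, column) position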

(here I used NaN instead of missing to avoid having to deal with Union types).
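
As a small check of why findnz is the safer test, the sketch below builds a matrix with one explicitly stored zero (the matrix and its values are made up for illustration) and fills the dense copy with a plain loop instead of the broadcast setindex!. The stored zero survives as 0.0, whereas a y == 0 test on the dense copy would have turned it into a gap:

```julia
using SparseArrays

# Hypothetical 3×3 matrix with an explicitly stored zero at (2, 2).
# sparse() keeps numerical zeros as stored entries unless dropzeros! is called.
X = sparse([1, 2, 3], [1, 2, 3], [0.5, 0.0, 0.7], 3, 3)

# findnz returns every stored entry, including the stored zero.
rows, cols, vals = findnz(X)

# Dense copy: NaN marks positions where nothing is stored.
A = fill(NaN, size(X))
for (i, j, v) in zip(rows, cols, vals)
    A[i, j] = v
end
```

Passing A to heatmap should then leave only the unstored positions blank.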

If the matrix is so huge that allocating a dense version is a problem, then you probably don’t want heatmap at all; you want some kind of smart downsampling algorithm, as discussed in “Plotting image data in Julia is much slower than MATLAB” (post #17 by stevengj).

Plots.spy seems not to allocate a dense matrix, although it’s a bit difficult to confirm because @edit spy(x) does not really lead anywhere useful.

julia> using BenchmarkTools, Plots, SparseArrays

julia> x = sprand(Float64, 500, 500, 0.01)
julia> @benchmark Plots.spy(x)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  164.625 μs …  23.158 ms  ┊ GC (min … max):  0.00% … 97.84%
 Time  (median):     199.167 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   231.186 μs ± 569.189 μs  ┊ GC (mean ± σ):  12.81% ±  5.26%

             ▁▃▃▅▄▅▆▇██▇▇█▆▄▅▄▃▂▁▁                               
  ▂▂▂▁▂▃▄▅▆▅██████████████████████▇▇▆▅▅▅▄▅▃▃▃▃▃▃▃▃▂▂▃▂▂▂▂▂▂▂▂▂▂ ▅
  165 μs           Histogram: frequency by time          260 μs <

 Memory estimate: 296.45 KiB, allocs estimate: 1286.

julia> x_alloc = Matrix(x)
julia> @benchmark Plots.spy(x_alloc)
BenchmarkTools.Trial: 7196 samples with 1 evaluation per sample.
 Range (min … max):  300.791 μs … 51.989 ms  ┊ GC (min … max):  0.00% … 98.56%
 Time  (median):     465.625 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   693.803 μs ±  1.509 ms  ┊ GC (mean ± σ):  31.62% ± 16.23%

  ▃█▃                                                          ▁
  ███▆▁▃▁▃▁▄▃▁▄▄▃▁▄▃▁▁▁▁▁▁▁▁▁▁▃▄▆▆▅▃▄▆▄▅▅▅▅▅▆▆▆▅▅▆▅▆▅▄▄▅▄▅▄▅▅▅ █
  301 μs        Histogram: log(frequency) by time      7.06 ms <

 Memory estimate: 2.28 MiB, allocs estimate: 1284.

The output of spy is a bit different from that of heatmap, but by fiddling with the markersize and permuting the matrix you could probably approximate it.

Yes, probably. For the moment I added a safeguard that simply skips the plot. But I think that heatmap using memory proportional to the size of the dense matrix is an implementation detail. For instance, a scatter plot can easily look pretty much the same while using only the stored values.
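
A rough sketch of that scatter idea, assuming Plots’ scatter with the marker_z, markershape and yflip attributes. The marker size here is a guess and needs tuning to the figure size so that the squares tile without gaps:

```julia
using Plots, SparseArrays

x = sprand(Float64, 500, 500, 0.01)

# Only the stored entries are extracted; no dense 500×500 matrix is created.
rows, cols, vals = findnz(x)

# One square marker per stored entry, colored by its value.
# yflip = true puts row 1 at the top, like the printed matrix.
plt = scatter(cols, rows;
    marker_z = vals,
    markershape = :square,
    markersize = 2,            # assumption: tune to the figure size
    markerstrokewidth = 0,
    yflip = true,
    aspect_ratio = :equal,
    label = "",
    xlims = (0.5, size(x, 2) + 0.5),
    ylims = (0.5, size(x, 1) + 0.5),
)
```

As far as I can tell, heatmap draws row 1 at the bottom by default, so drop yflip if the goal is to match its orientation instead of the printed matrix.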

Uhm, I didn’t know about spy. That seems a reasonable alternative, if tuned. Thanks!