Controlled decimation of heatmaps

proposal
plotting

#1

I have a feature idea/request and would like comments before attempting anything myself:
When using heatmap to display images that are larger than the pixel area, the data gets decimated by the backend in order to fit into the display. This is fine if your interested in big features.
However the data I frequently work with is small peaks against a large noisy background.
If the data then gets automatically decimated, it ends up being luck whether the peak is sampled or discarded and it changes as the display window gets resized (physically resized and/or adding plot labels etc.).
Thus ideally I want to be able to apply my own decimation to ensure that the peak is not discarded. I can think in other applications, if interested in minimums, the decimation would look different to ensure they are retained.

For a backend that generates an image file, I can try to work around this effect, by first doing my own decimation beforehand. However for an onscreen display, it becomes more problematic. Also if the backend allows pan and zoom, the decimated data is not what you want to display when the user starts zooming.

Thus my feature request would be a way to be able to apply a custom/controlled decimation to the data of a heatmap, with some sort of feedback/call back on viewport changes (only if using a backend with interactivity).

While at some level I feel this should be separate from a package like Plots, working on top of it, however I would still like to be able to have that as a subplot in a larger plot, where the others are not necessarily this type of decimated heatmap. I’m also not sure if the interactive feedback would be achievable if this functionality is in a library running on top of Plots. Is this where Plot recipes come in?

Or is it only Plotly(JS) providing this interactivity and I show work at that level instead of trying to use Plots abstraction?

Would something like this be useful to others?

Thoughts/suggestions?


#2

cf https://github.com/JuliaPlots/Plots.jl/issues/702


#3

Consider trying restrict from ImageTransformations.jl (which is automatically loaded and exported by using Images). While algorithmically it originates from operations you’d perform when solving PDEs, it happens to be a fast antialiasing thumbnailer.


#4

To illustrate the problem, running from within Jupyter:

using Plots, Interact, Images
@manipulate for x in 1000:1030
    img = abs(randn(3,5000))*0.05
    img[2,x] = 255
    heatmap(restrict(img,2))
end

Move the slider and see how the peak disappears and only shows for about 4 out of 10 positions (based on my screen resolution). (Without the restrict it is about 1 out of 10 positions).

I need to call restrict 3 times, to get it displaying almost always. (Even there it dissappeared for 1 position out of the 30 on a plotlyjs figure.):

 heatmap(restrict(restrict(restrict(img,2),2),2))

Using an image instead of the heatmap seems better, I can just barely makeout the line for all the positions (I’m guessing something similar to restrict’s anti-aliasing is hapenning automatically behind the scenes, but image is also much wider on the notebook) (It does fluctuate on for many positions is far from max brightness):

using Images, Interact, Colors
@manipulate for x in 1000:1030
    img = abs(randn(200,5000))*0.05
    img[:,x] = 255
    Gray.(img)
end

I’m trying to follow exactly how the restrict algorithm works, however I’m getting lost in all the macros. Anywhere that I can find the simplified (non-optimised) formula?


#5

I don’t see any widgets whenever I use @manipulate (I haven’t yet taken the time to track down why), so I couldn’t easily try your demo.

But as a matter of principle, if you want the peak to always show, then yes, you need to restrict enough times so that any later command that uses nearest-neighbor sampling doesn’t lose information. If your display area is 800 pixels wide and your original image is 5000 pixels wide, then you need ceil(log2(5000/800)) calls to restrict before passing it off to the display code.

It’s a little hard to find references on restriction and prolongation that doesn’t also go into a lot of detail about PDEs. The general idea is that restriction is the “opposite” (the adjoint operation) of prolongation, and that prolongation is just interpolation. Let’s say you have a coarsely sampled signal Y[1], Y[2], Y[3], …, and that you want to make a version that has twice the sampling density. You’d be inclined to try the following:

y[1] = Y[1]
y[2] = (Y[1] + Y[2])/2
y[3] = Y[2]
y[4] = (Y[2] + Y[3])/2
...

Now rewrite those equations as y = P*Y for a matrix P that could be called the “prolongation operator”:

P = [1   0   0 ...;
     1/2 1/2 0 ...;
     0   1   0 ...;
     0   1/2 1/2 ...;
     ...]

The “restriction operator” is defined as R = cP' for some scalar constant c. If you write this case out explicitly, then you get that restriction from a fine-scale signal z to a coarse-scale signal Z should be defined as

Z[1] = z[1] + z[2]/2
Z[2] = z[2]/2 + z[3] + z[4]/2
Z[3] = z[4]/2 + ...
...

(here I’ve set c=1; if memory serves, the restrict code uses c=1/2). If this still isn’t clear, I’d recommend writing out these formulas yourself using a few more points so that you see the pattern.

This formula is appropriate when the fine-scale signal has an odd number of points. If the fine-scale signal has an even number of points, then you interpolate on 1/4 and 3/4 grid points. That leads to corresponding changes in the formula for restrict.

All the macros are there simply to enable it to work efficiently for any specific chosen dimension.

Is that enough to help you understand how the algorithm works?


#6

Ok, thanks, close to what I guessed from the code.
In other words, convolution of a length 3 smoothing kernel [0.5 1.0 0.5] (or FIR filter) with the data (linear operator), followed by factor 2 decimation.
I just wasn’t sure if there was some IIR effect and/or longer length involved in the smoothing.

Interpreted another way, a single sample in the original, will affect 1 or 2 samples in the output. If it was guaranteed to be 2, then if the output resolution is less than 2x the screen pixels, then it should still be fine (even if every second output sample is not displayed). However with some only affecting 1 output sample, any screen decimation might still result in it being dropped.

What I want to do, is figuring out what the ratio of input samples to screen pixels is and then setting up a smoothing kernel of atleast that length. Potentially for my case, I would like a non-linear higher order summation (see higher order norms), to allow higher values to dominate, instead of being diluted so much (in the extreme just taking the sliding max over that kernel length).

Technically my challenge is just:
-getting feedback on the figure pixel resolution
-find a mechanism to get updates based on pan/zoom changes, triggering a refresh of the display data
-A nice to have extra for later would be able to do better than just max, since max raises the noise floor as well, should try to somehow maximise contrast.
-Is there a better backend than plotlyjs, that can support pan,zoom, colorbar for a heatmap (or image)?

And on a generalisation level:
-Interest from others, or is this not a problem for many other domains?
-What flexibility would be required from this in order to be useful? Is there a need for minimum as well?
-What about pure averaging/smoothing for other applications, especially since heatmaps in Plotlyjs seems to get very slow with large amounts of pixels. Or is that a problem on my system setup and not a general issue?


#7

If you want interactivity with a full plotting framework, I think GLVisualize/GLPlot is the way to go. I believe it already has support for these things, see the GLPlot/examples/imageview.jl example.

If you’re mostly looking at raw images and don’t need a full plotting framework, GtkReactive is much lighter, has good support for pan and zoom, and doesn’t depend on OpenGL. The examples/imageviewer.jl might be helpful. At a minimum you’d probably also want to call init_pan_drag (this is just 1 extra line).

I’m in the process of rewriting ImageView to use GtkReactive. Among other things this will preserve pixel aspect ratios and offer a contrast GUI. (The current ImageView does these things already, but the pan/zoom support is not as flexible.) But I’m doing this partially to provide flexible short-term options; long-term, I think GLVisualize is the way to go.


#8

Why (read as: i don’t disagree, but some reasoning should be here)?


#9

GLVisualize is built, from the ground up, to emphasize interactivity. That’s very different from a framework that emphasizes the production of publication-quality plots (which is a great thing) but where interactivity is basically an add-on.

The fact that it has very high performance doesn’t hurt either, and reflects its focus on supporting interactive visualization.


#10

I should have also added a couple of other points:

  • ImageFiltering’s mapwindow can handle nonlinear filters. It’s designed to produce one output per pixel, though, so it’s not immediately applicable to this problem. Still, you might find some inspiration.
  • If you come up with some useful decimation tools, adding them to imresize (ImageTransformations) would be appreciated

#11

I’ll go look at GLVisualize in detail again. Last had a look in 0.4.6 and seemed very unstable at that time.
I forgot about it, due to not being in the Plots documentation backends table (in the interactive list).
Also the more recent example of it I saw was used for showing a 3D object without any axis etc. (I didn’t really want to have to build all of that infrastructure from scratch).
However first glance at the GitHub home page, looks very promising. Thanks Tim, this looks like it might be a much better focus than either trying a sluggish Plotlyjs and/or wrapping a completely new framework from scratch.

Agreed that optimising design for interaction versus publication can have huge implications.