Image resizing while preserving average

Hello! I am new to Julia and need some help in image processing.

The arrays which I am working with are histogram 2D probability mass functions, so maintaining a sum of 1 in the image is important. What’s interesting, is that I have noticed that the ImageTransformations package’s imresize function does not preserve the average of the pixels, which is the behavior I would expect. For example, imagine you have a 8x8 image, and you want to resize to a 2x2 image. I would think that this would simply average the values in each 4x4 block to produce each pixel in the target image. However, this average is only preserved when down sampling by a factor of 2.

Here’s a MWE:

> a = rand(Normal(), (8,8))
> b = imresize(a, (2,2))
> c = imresize(a, (4,4))
> d = imresize(c, (2,2))
> mean(a)
x
> mean(b)
y
> mean(c)
x
> mean(d)
x

Is this expected behavior? I can’t figure out exactly how this works. I know that this is simple enough to implement, but I’ll be doing resizings that are not even multiples and will require interpolation that still preserves the average, in order to correctly scale values to maintain the probabilistic nature of the array.

Are there any existing solutions?

Image resizing is actually surprisingly complicated, both from a performance perspective and an accuracy perspective. I’ve taken a couple of computer vision classes, but I suspect I would get it wrong a few times if I had to implement it myself. You can see the implementation of imresize here: https://github.com/JuliaImages/ImageTransformations.jl/blob/master/src/resizing.jl including some helpful documentation about what it’s trying to do.

In answer to your question, I suspect the easiest way to handle this would be to record the previous average, perform the resize, and then add an offset to the resized image to fix its mean. E.g.

julia> a = randn(8, 8);

julia> b = imresize(a, (2, 2));

julia> b .+= mean(a) - mean(b)
2×2 Array{Float64,2}:
 0.189495  -0.207325
 0.744529   0.351593

julia> mean(b) ≈ mean(a)
true

Does that work for your case?

By the way, you can format your code nicely using the instructions at PSA: how to quote code with backticks

1 Like

Thanks so much for the reply and the formatting tip!

The reason this wouldn’t work for my case is because this would (most likely) not preserve the relative values at each pixel location. The desired end result is an array with sum=1, and so one solution is to simply scale the array by 1/sum(image), however this doesn’t preserve this information as well. The only case in which this would be correct would be if the difference in the mean was the sum of small differences contributed equally by each pixel.

this is easy to fix post resize, move on

what does that mean?
say you have [0.1, 0.3, 0.4 0.2] resizing to 2 numbers, you would expect to become [0.4, 0.6]? but 0.6 - 0.4 != 0.4 - 0.3? (or, 50% != 33%

No, I would expect [0.1, 0.3, 0.4 0.2] resizing to 2 numbers to average the values in the area and become [0.2, 0.3] in which case scaling the output array by the ratio of pixels 4/2 would result in the array [0.4, 0.6]. This scenario works, because in resizing the behavior is as desired (the average value is preserved).

What’s important in the end is summing up the values from each pixel in the source image that contribute to the pixel in the target image because this is a histogram. So, imagine 2 pixels represent the number of observations in the range 0-.5 and .5-1 respectively. I want to change the range of these bins to be multiples of 1 rather than .5. The number of observations in the new 0-1 bin should be the sum of the 2 bins (pixels) above.

So imagine a scenario where the average pixel value after resizing is not the same as the original average. More concretely. I have 64 bins (8x8 image). I want to resize to 4 bins (2x2 image, with bins that are 16 times as “wide”). If the average pixel value is preserved, than I can just scale the resized image by 64/4 = 16 and the total number of observations will be preserved, in other words sum(original image) = sum(scale*resized_image). However, if this average is not preserved, which is what I am experiencing, then after scaling by 16, the total number of events has changed.

If I used the strategy above, I would offset each pixel (bin) by a number. Not only a number, but the same number at each location. Yes, this does allow us to maintain the total number of events, but this implies that the number of events that were lost (or gained) during resizing was evenly distributed across all pixels, because we are offsetting each pixel equally, which I do not believe is the case. We now have an image that may have the correct total number of events, but the number of events in each bin is incorrect. Does this issue make sense?

1 Like

yes, this makes sense but then the following two things can’t be true at the same time right?

I think what you want is not re-binning because that preserves the event count, you want merge pixels and doing n-neighbor average am I understanding this correctly?

Sorry, the reason the two statements are both true is because the sum of the original image is 1 because it is a probability mass function.

I ultimately want re-binning, but I thought imresize did a sort of rebinning and then averaging (so that images that are actually photographs do not just oddly get brighter). Also, I’d like for this to work (reasonably well) when the new dimensions do not evenly divide the old.

I see what you mean. Right, ultimately since the original information is lost when making them into a hist2d, any non-integer re-binning would require interpolation. If I’m not mistaken, this is closest to what you want:
https://scikit-image.org/docs/dev/api/skimage.transform#skimage.transform.downscale_local_mean

Right, thanks for the discussion, this seems as close to what I want as I can get perhaps

As @rdeits says, it’s a little more complicated than it seems. The thing that makes “rebinning” undesirable for many applications is the potential for producing Moire patterns. An anti-aliasing version of rebinning can be found in restrict, which does 2x reduction. Except for the edges of the image, restrict should do 2-fold size reduction in a manner that preserves the mean intensity. It’s also insanely fast on really big images, and it’s my main goto (far more than resize) when I want to change resolution.

For the more generic resize-by-any-factor: while it’s a decision that could be reconsidered, we decided to keep operations orthogonal and transparent, so imresize is really just an interpolation operation. You’ll note the docs say this:

If you are shrinking the image, you risk aliasing unless you low-pass filter img first.

and they illustrate how you might combine the two operations. That would get much closer to preserving the mean.

1 Like

It’s worth noting that there are two different perspectives that have sometimes incompatible behaviors:

Image as 2D Histogram

If I understand correctly you’re looking at the image as a 2D histogram, so each pixel is a box around a set of events, and resizing those boxes shouldn’t change the total number of events. Making the boxes bigger by an integer factor (e.g. from 8x8 to 4x4) seems well-defined, because you can sum collections of adjacent boxes. Resizing in the other direction seems tough because you don’t know how many of the original events were in each region of the larger boxes.

Image as Samples

In a DSP context the pixels aren’t boxes, they’re points representing samples of an underlying continuous 2D image. The continuous signal is assumed to be bandlimited (it’s “smooth” such that it doesn’t wiggle faster than the samples can represent).

Resampling in this context is is a matter of sampling the continuous surface at different points. So if you’re upsampling by 2 then you’re sampling the continuous function between the existing samples, but they are not generally a simple average of the samples on either side. Here’s the picture to have in mind (in 1D rather than 2D), where the orange lines show the original samples and the green dots are the new samples.

image

Downsampling is a little more complicated because the bandlimited requirement means the function needs to be smoother (because your samples are farther apart). That’s what an “anti-aliasing” filter does.

Rebinning the histogram

As you mention, downsampling by an integer factor is the easy case - you could just convolve your image with an ones(N, N) where N is your downsample factor:

using DSP

x = rand(32, 32)
x ./= sum(x) #normalize to 1
sum(x) # gives 1
w = ones(2,2)
y = conv(x,w)[2:2:32, 2:2:32] # do the sum and then pick out every other sample
sum(y) # gives 1

One way to upsample (by an integer factor) would just be to divide each bin into smaller ones and distribute the events evenly:

x = rand(16, 16)
x ./= sum(x) # normalize to 1
sum(x) # gives 1
y = zeros(32, 32)
# first copy into the larger array, leaving gaps for the new bins
y[1:2:32,1:2:32] .= x 
# now we redistribute the events to fill the gaps
w = ones(2,2) ./ 4
y = conv(y, w)[1:32, 1:32]
sum(y) # gives 1

Different interpolation schemes basically amount to changing the window w. For instance, using a window of ones(3,3) / 9 will do linear interpolation, but creates a problem on the bottom and right edges of the image, because those pixels don’t have pixels on both sides to average.

The next step from integer resampling is rational resampling, where your factor is A//B. In this case you can upsample by A and downsample by B (though that’s not the most efficient way). Irrational resampling is trickier, I’d definitely use a package for that.

2 Likes

Hi Tim, Thanks for the response. I got it.

One think I’m unsure about is how this interpolation happens when we are downsampling by greater than a factor of two. I am pretty sure that imresize uses the linear interpolation? In the case of (8x8) -> (4x4) I see that the average is preserved because we are simply interpolating the value at the center of 4 points (verified using Interpolations package linear interpolation). How is a point interpolated in the (8x8) -> (2x2) case? How do we interpolate the value of 1 point at the center of 16 points? Are only the 4 closest points used?

Hi @ssfrr, Thanks so much for the thoughtful response, this made a lot of sense to me. I’ll make use of your contributions.

One thing that strikes me here is that in general, image formats are not linear (e.g. sRGB), and if one resizes an image in such a format, to do it correctly it should be transformed into linear space before doing necessary averaging or kernel operations, then transformed back to the nonlinear space for storage.

Glancing at ImageTransformations briefly, I can’t find where that would occur, or how to tell it about the type of information in the image, so I don’t know what consideration was made for that, but as you look at implementations it is something to watch for (maybe for you to avoid accidentally getting that if your space is linear).

Understood! Thank you :slight_smile: