Best approach to smooth array? None of my approaches are very useful

Hi!

I’ve been trying all week to get something done, but I cannot get it to work.

Problem: I have a group of variables coming from a NetCDF file. When imported using NCDatasets, I get Float64 arrays of size (128, 128, 80, 2400).
Some code I was using (in MATLAB) has a line of the style movmean(movmean(movmean(array, window_i), window_j), window_l),
which performs a smoothing in the 1st, 2nd, and 4th dimensions.
To do this in Julia, I have tried different approaches:

First approach: using Images.ImageFiltering with the kernel:

function kernel4d(window_h, window_t)
    # 4-D mean filter over dims 1, 2, and 4 (dim 3 is left untouched)
    return ones(window_h, window_h, 1, window_t) / (window_t * window_h * window_h)
end

And passing it to imfilter like this:

function filter_array(array, smooth_x, smooth_time)
    filtered = similar(array)
    imfilter!(filtered, array, kernel4d(smooth_x, smooth_time))
    return filtered
end

Second approach: using Images.ImageFiltering with mapwindow:

function filter_array2(array, smooth_x, smooth_time)
    filtered = mapwindow(mean, array, (smooth_x, 1, 1, 1))
    filtered = mapwindow(mean, filtered, (1, smooth_x, 1, 1))
    return mapwindow(mean, filtered, (1, 1, 1, smooth_time))
end

Third approach: using Statistics.mean

function filter_array3(array, windowh, windowt)
    filtered = similar(array)
    fach = div(windowh + 1, 2)
    fact = div(windowt + 1, 2)
    xend, yend, zend, tend = size(array)
    for I in CartesianIndices(array)
        filtered[I] = mean(array[max(1,I[1]-fach):min(xend,I[1]+fach), I[2], I[3], I[4]])
    end
    for I in CartesianIndices(array)
        filtered[I] = mean(filtered[[1], max(1,I[2]-fach):min(yend,I[2]+fach), I[3], I[4]])
    end
    for I in CartesianIndices(array)
        filtered[I] = mean(filtered[I[1], I[2], I[3], max(1,I[4]-fact):min(tend,I[4]+fact)])
    end
    return filtered
end

Although these approaches are not strictly equivalent (some handle padding at the borders differently), they are all significantly slower (about 10x) than the MATLAB one-liner. My code has two major bottlenecks: file reading (which I don’t really know how to improve) and this smoothing function. The window sizes are small in the first and second dimensions (around 5 elements), but in the fourth dimension it is around 30 elements.

The first approach also uses an impressive amount of memory (it dies with an out-of-memory error on a system with 50 GB).

The second approach has a problem: the second and third passes are increasingly expensive because they access elements that are very far apart in memory.

I should mention that I call this process 8 times in a row for different variables, and I am using Julia 0.7.0.

Is there any advice you could give me about this?

Thanks


Have you tried factored kernels: https://juliaimages.github.io/latest/imagefiltering.html#Factored-kernels-1? Also, it could potentially be worth moving the last dimension to be the first, as that would give better memory locality.
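
Something like this, for instance (an untested sketch; the array size is a small stand-in and the window sizes are guesses from your post):

using ImageFiltering

# Untested sketch: put the long time axis first for contiguous access,
# then smooth with a factored kernel.
A = rand(64, 64, 16, 256)           # small stand-in for the real data
B = permutedims(A, (4, 1, 2, 3))    # time becomes the first dimension
kern = kernelfactors((centered(ones(29)/29),  # time window
                      centered(ones(5)/5),    # x window
                      centered(ones(5)/5),    # y window
                      centered([1.0])))       # leave dim 3 (z) untouched
Bf = imfilter(B, kern)
Af = permutedims(Bf, (2, 3, 4, 1))  # restore the original axis order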


Is approach 3 doing the right thing? You’ve got 3 for loops, each overwriting the result of the previous one. Also, in the second loop you have [1] instead of I[1], which might be doing something weird (not sure, I don’t have Julia in front of me to test).

Thanks for noticing that. Indeed the result was wrong, but fixing it didn’t help much. The three loops are intentional: I want to filter the same array along three of its dimensions.

Hey mauro3, thanks for your answer:

I’ve tried factored kernels in this way, though I am not sure this is the correct usage.


function kernel4d_2(window_h, window_t)
    kern1 = centered(ones(window_h, 1, 1, 1) / window_h)
    kern2 = centered(ones(1, window_h, 1, 1) / window_h)
    kern3 = centered(ones(1, 1, 1, window_t) / window_t)
    kernz = centered(ones(1, 1, 1, 1))
    return kernelfactors((kern1, kern2, kernz, kern3))
end

Is this the correct way to use it, and should I expect it to be better than using the 4-d Kernel directly?

If you do any decimation of the output afterwards, it would probably be worth trying to build that into the filter. The 2400-long dimension with its ~30-point window seems like a good candidate for a stride of 2 or more, but it depends on how you want to use the output.
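
For instance, something along these lines (a rough, untested sketch; movmean_time_strided is a made-up name):

using Statistics

# Rough sketch: moving mean along the time axis (dim 4) with a built-in
# stride, so only every `stride`-th output slice is computed and stored.
function movmean_time_strided(A, window, stride)
    nt   = size(A, 4)
    half = window ÷ 2
    ts   = 1:stride:nt                  # output time indices
    B    = similar(A, size(A, 1), size(A, 2), size(A, 3), length(ts))
    for (k, t) in enumerate(ts)
        lo, hi = max(1, t - half), min(nt, t + half)  # clamp at the borders
        B[:, :, :, k] = dropdims(mean(view(A, :, :, :, lo:hi), dims=4), dims=4)
    end
    return B
end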

Addendum: there may be something in BLAS/LAPACK or the like that can help?

You can use it that way but this also works:

julia> function kernel4d(window_h, window_t)
           kernel_h = ones(window_h)/window_h
           kernel_t = ones(window_t)/window_t
           return kernelfactors((kernel_h, kernel_h, [1.0], kernel_t))
       end

(see ?kernelfactors and the part about being able to pass vectors). To explain what you’re seeing, try ?KernelFactors.ReshapedOneD and look at the individual elements of the return value.

and should I expect it to be better than using the 4-d Kernel directly?

That’s explained in the docs that @mauro3 linked:

If the kernel is of size m×n, then the upper version [the full 2-D kernel] requires mn operations for each point of filtered, whereas the lower version [the factored kernel] requires m+n operations. Especially when m and n are larger, this can result in a substantial savings.

In your case it’s the difference between 5 * 5 * 30 = 750 and 5 + 5 + 30 = 40. This therefore explains the factor of 10 difference with Matlab: notice that your Matlab code is using a factored kernel: window_i, window_j, and window_l are separate factors of the kernel. So you were asking Julia to use a fundamentally worse (more computationally demanding) approach than you were asking Matlab.
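
You can see the difference for yourself with something like this (a sketch with made-up array and window sizes):

using ImageFiltering, BenchmarkTools

A = rand(32, 32, 8, 64)  # small stand-in array

# one dense 4-d kernel: 5*5*1*31 = 775 operations per output point
dense = centered(ones(5, 5, 1, 31) / (5 * 5 * 31))

# factored kernel: roughly 5 + 5 + 31 operations per output point
fact = kernelfactors((centered(ones(5)/5), centered(ones(5)/5),
                      centered([1.0]), centered(ones(31)/31)))

@btime imfilter($A, $dense);
@btime imfilter($A, $fact);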

For even greater speed see https://juliaimages.github.io/latest/imagefiltering.html#Multithreading-1.


Thanks for the suggestion! I’ll try making that change if everything else fails. Initially, all the data is kept.

Thanks a lot Tim!
The suggested new form of the call to kernelfactors causes a surprisingly long error message. I will stay with the old one until I can figure out what is happening.
I’ll leave you the message in case it is interesting/useful. I am using Julia 0.7; would it be useful to open an issue? This apparently only happens when using “export JULIA_NUM_THREADS=N” with N greater than 1.

The error message is too long, so I pasted it on Pastebin:

As I think you realized, that’s a Julia bug, not an ImageFiltering bug. For me it works on Julia 1.0.2 (and I can reproduce the error on Julia 0.7), so perhaps it might be worth trying on a newer version of Julia. If you can reproduce it on the latest release, it’s definitely worth reporting as a bug.


I just tried Julia 1.0.2 and it works flawlessly.
Thanks so much for all the help!


Hi Tim:

Are these two things supposed to give the same output?

a = rand(10,10,10)
b = similar(a)
for i in 1:10
    b[:,:,i] = imfilter(a[:,:,i], kernelfactors((ones(3)/3, ones(3)/3)))
end
c = imfilter(a, kernelfactors((ones(3)/3, ones(3)/3, [1.0])))

The question is: shouldn’t b == c hold?
It doesn’t, but I don’t understand why.

It’s a bug, I filed an issue for you: Bug in 3d cascade · Issue #92 · JuliaImages/ImageFiltering.jl · GitHub

Just updating this in case someone finds this post and uses it as a reference.

The correct way to smooth in the three directions without displacing the array is:

function kernel4d(window_h, window_t)
    kernel_h = ones(window_h)/window_h
    kernel_t = ones(window_t)/window_t
    return kernelfactors((centered(kernel_h), centered(kernel_h), centered([1.0]), centered(kernel_t)))
end
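
For example (a small stand-in array; the window sizes are just for illustration):

using ImageFiltering

A  = rand(Float32, 64, 64, 16, 256)  # stand-in for the real (128,128,80,2400) data
Af = imfilter(A, kernel4d(5, 31))    # smooths dims 1, 2, and 4; dim 3 is untouched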

The correct way to smooth in the three directions without displacing the array is

It’s not just three directions, you want to use centered even in 1d or 2d if you are using a kernel that is symmetric around 0. The original form shifted the array along all 3 axes, not just the 3rd axis.
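
A minimal 1-D demonstration with a delta signal and a delta kernel:

using ImageFiltering

v = [0.0, 0.0, 1.0, 0.0, 0.0]
imfilter(v, [0.0, 1.0, 0.0])            # kernel axes are 1:3, so the output is shifted
imfilter(v, centered([0.0, 1.0, 0.0]))  # kernel axes are -1:1, output stays aligned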

(The behavior above turned out not to be a bug, it was a feature.)


Right, I was just trying to make it a final answer for the initial question of the post :smiley:
Thanks!


I am coming back to this issue two years later:

Smoothing is part of a calculation I do in the framework I am using for my PhD research.
The thing is that the arrays I work with are of size (512, 512, 80, 1200), in Float32.

In one part of the process I smooth my data out, and this is without question the part that takes the longest, on the order of hours, while the rest of the process is done in less than one hour.

I think an important part of this issue is data non-locality. Specifically, to smooth in the time direction, Julia needs to fetch 50 values that are 512*512*80 floats away from each other, and it needs to repeat this process 512*512*80 times.
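
For scale (a quick back-of-the-envelope in the REPL):

julia> 512 * 512 * 80 * sizeof(Float32)  # bytes between consecutive time samples
83886080

That is about 80 MiB per time step.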

I know I can probably live with this and let it sink into the sands of time after my PhD, but I am obsessed with it, so I come back every once in a while.

I ask you, dear Julia community: what would be the best approach to smooth a 512×512×80×1200 array using moving means of 20 points in the first dimension, 20 in the second, and 50 in the fourth?

I know intelligent people must have come across this at some point in time.

A “low-tech” but relatively fast and simple approach may be to use explicit diffusion as the smoothing mechanism. Say you want to smooth a field A; then the following would do the job in 2D (and can easily be extended to n-D, GPU parallel processing, …):

using Plots
@views function do_smoothing()
    nsmooth = 50
    A       = rand(100,100)
    CFL     = 0.5/2.1/length(size(A))  # stability limit for explicit diffusion
    for ismooth = 1:nsmooth
        # one explicit diffusion step in both dimensions
        A[2:end-1,2:end-1] .= A[2:end-1,2:end-1] + CFL*(diff(diff(A[:,2:end-1],dims=1),dims=1) + diff(diff(A[2:end-1,:],dims=2),dims=2))
        # zero-flux boundary conditions
        A[1,:] = A[2,:]; A[end,:] = A[end-1,:]; A[:,1] = A[:,2]; A[:,end] = A[:,end-1]
    end
    display(heatmap(A'))
end

@time do_smoothing()

You could also decouple the diffusion in the different dimensions and apply a different number of smoothing steps to each dimension individually, as sketched below.
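
For example, smoothing only along the first dimension could look like this (a sketch following the same pattern; smooth_dim1! is just an illustrative name):

# Sketch: explicit diffusion along dimension 1 only
@views function smooth_dim1!(A, nsmooth)
    CFL = 0.5/2.1  # stability factor for explicit 1-D diffusion
    for _ = 1:nsmooth
        A[2:end-1, :] .= A[2:end-1, :] .+ CFL .* diff(diff(A, dims=1), dims=1)
        A[1, :] .= A[2, :]; A[end, :] .= A[end-1, :]  # zero-flux boundaries
    end
    return A
end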


EDIT: To put this in context with the moving-mean filter, it would take, e.g., about 10 diffusion smoothing steps (nsmooth = 10) to achieve a ~10-cell window smoothing, modifying the above code as follows:

using Plots
@views function do_smoothing()
    nsmooth = 10
    A0      = 0.0*rand(100,100)
    A0[40:60,40:60] .= 1.0
    A       = copy(A0)
    CFL     = 0.5/2.1/length(size(A))
    for ismooth = 1:nsmooth
        A[2:end-1,2:end-1] .= A[2:end-1,2:end-1] + CFL*(diff(diff(A[:,2:end-1],dims=1),dims=1) + diff(diff(A[2:end-1,:],dims=2),dims=2))
        A[1,:] = A[2,:]; A[end,:] = A[end-1,:]; A[:,1] = A[:,2]; A[:,end] = A[:,end-1]
    end
    p=plot(A0[:,50],label="initial", linewidth=3); plot!(A[:,50], linewidth=3, markersize=5, markershape=:circle, label="smooth", framestyle=:box)
    display(p)
end

@time do_smoothing()

@aramirezreyes, have you looked at FastConv.jl?

The package seems to be broken, but judging from the code and description, it seems to implement the same idea of performing direct convolution with small kernels as in the code posted by @sijo here.

From your posts above it seems that you tried something similar, but it would be helpful if you could confirm.
