Blur filter using ParallelAccelerator

I’m trying to process an image using ParallelAccelerator in the following way (it’s the documentation example, actually):

runStencil(buffer_B, Band_B, 1, :oob_src_zero) do b, a
    b[0,0] = blur(a)
    return a, b
end

And this code runs in nearly 20 seconds on my machine with a 150 MB input image.
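For reference, the full pattern from the ParallelAccelerator documentation wraps runStencil in an @acc function. Here is a minimal sketch with an illustrative 3x3 box kernel (the boxblur name and the uniform weights are my own simplification, not the documentation's exact Gaussian example):

```julia
using ParallelAccelerator

# Sketch only: a 3x3 box blur written as a runStencil kernel.
# b is the destination buffer, a the source; indices are relative offsets.
@acc function boxblur(img, iterations)
    buf = similar(img)
    runStencil(buf, img, iterations, :oob_src_zero) do b, a
        b[0,0] = (a[-1,-1] + a[-1,0] + a[-1,1] +
                  a[ 0,-1] + a[ 0,0] + a[ 0,1] +
                  a[ 1,-1] + a[ 1,0] + a[ 1,1]) / 9
        return a, b   # rotate buffers between iterations
    end
    return img
end
```

The first call includes compilation time, so benchmark a second call to measure the steady-state speed.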

But using the blur filter from the Images package (imblur is the function name), I get the same result in about 7 seconds.

Shouldn’t the parallel blur be faster than the sequential function, or am I doing something wrong?


I haven’t looked at that code, but I believe Tim Holy spent a lot of effort optimizing the memory access patterns of many algorithms in Images so that they avoid cache misses (mimicking techniques used in, e.g., Halide). For operations on large images, this is potentially more important than parallelization (although the algorithms in Images might also use parallelization as well, I’m not sure).


Heck, I’m surprised it even takes 7 seconds. If you’re blurring, make sure you take advantage of separability:

julia> img = rand(4000,4000);

julia> kf = KernelFactors.gaussian((1.5,1.5));    # represented as a separable filter

julia> @time imfilter(img, kf);
  0.370443 seconds (18.36 k allocations: 245.275 MB, 22.67% gc time)

julia> k = Kernel.gaussian((1.5,1.5));    # not represented as a separable filter

julia> @time imfilter(img, k);
  1.132960 seconds (18.40 k allocations: 245.284 MB, 11.39% gc time)

(all times measured after suitable warmup).
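The gap between those two timings is roughly what the arithmetic predicts: a full k-by-k kernel needs k^2 multiply-adds per pixel, while a separable one needs about 2k (one horizontal pass plus one vertical pass). A quick back-of-the-envelope check (the kernel width here is illustrative):

```julia
k = 9                                 # illustrative kernel width
full_cost      = k^2                  # non-separable: 81 multiply-adds per pixel
separable_cost = 2 * k                # separable: 18 multiply-adds per pixel
ratio = full_cost / separable_cost    # 4.5x fewer operations per pixel
```

The measured speedup is smaller than the operation-count ratio because memory traffic, not arithmetic, is often the bottleneck on large images.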

ImageFiltering also supports parallel computation, currently implemented only for separable filters. See the ImageFiltering documentation for details.
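If I remember correctly, the multithreaded code path is selected by passing a computational resource as the first argument to imfilter. A sketch, assuming the CPUThreads resource from ComputationalResources and that Julia was started with multiple threads:

```julia
using ImageFiltering, ComputationalResources

img = rand(4000, 4000)
kf  = KernelFactors.gaussian((1.5, 1.5))   # separable, so the parallel path applies

# Ask for the threaded FIR implementation; check the package docs for
# the exact resource/algorithm combinations that are supported.
imgf = imfilter(CPUThreads(Algorithm.FIR()), img, kf)
```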


Hey. Thanks for the help (and sorry for the looooong time to answer)
It really explains a lot.