Have you checked out Knet.jl?
By "pure convolution" I mean only forward convolution, i.e. applying a filter to an image, but not finding its gradient or applying pooling.
Any time you're thinking about multichannels, what you're really doing is filtering "vector" objects over a spatial domain. But most languages don't have nice ways of storing vector objects packed densely in an array, so they tend to use another array axis to encode the channel information. Even though it's very standard, it's unfortunately confusing and messy, because you're now having to break what would otherwise be a very nice and natural abstraction (that the axes of the array correspond to "space" and that you're filtering over space).
In Julia it's easy to do this explicitly:
julia> a = rand(4, 5, 5);
julia> using StaticArrays
julia> b = reinterpret(SVector{4,Float64}, a, (5, 5))
5×5 Array{SVector{4,Float64},2}:
[0.144815, 0.772607, 0.872914, 0.908799] [0.633158, 0.0525735, 0.280742, 0.902825] … [0.896061, 0.407699, 0.380729, 0.0341831] [0.755399, 0.0471598, 0.663326, 0.601342]
[0.549754, 0.690097, 0.220682, 0.0802711] [0.28825, 0.994905, 0.805212, 0.89021] [0.743983, 0.966301, 0.18642, 0.702578] [0.286288, 0.378818, 0.836149, 0.543906]
[0.657729, 0.508657, 0.881363, 0.785144] [0.512295, 0.714254, 0.729959, 0.273817] [0.169743, 0.525499, 0.949791, 0.376547] [0.225779, 0.494444, 0.41264, 0.851775]
[0.00444584, 0.933053, 0.187804, 0.500937] [0.355278, 0.154059, 0.011323, 0.514391] [0.197709, 0.550803, 0.649878, 0.111545] [0.871994, 0.446872, 0.859054, 0.238003]
[0.652901, 0.486696, 0.952878, 0.0807595] [0.0998179, 0.281748, 0.477665, 0.74011] [0.26828, 0.278152, 0.925838, 0.386637] [0.400142, 0.0622027, 0.291799, 0.561965]
julia> using ImageFiltering
julia> kern = centered(rand(3,3))
OffsetArrays.OffsetArray{Float64,2,Array{Float64,2}} with indices -1:1×-1:1:
0.994265 0.895043 0.577934
0.381196 0.874816 0.395609
0.519834 0.999507 0.371597
julia> imfilter(b, kern)
5×5 Array{SVector{4,Float64},2}:
[2.01427, 3.89947, 3.65341, 4.19016] [2.93492, 2.8359, 3.76376, 4.5498] … [4.63843, 2.33211, 3.43263, 2.63515] [4.08414, 1.71187, 3.5868, 2.80771]
[2.63374, 3.78868, 4.01752, 3.98642] [3.09131, 3.54901, 3.96213, 4.09826] [3.7568, 2.84139, 3.55824, 3.13144] [3.04872, 2.27554, 3.54844, 3.24217]
[2.3728, 4.27511, 2.56762, 2.71285] [2.79171, 3.80449, 2.87623, 3.18184] [2.89213, 3.54058, 3.93211, 3.17953] [2.81132, 3.24626, 3.81895, 3.10968]
[2.71393, 3.45083, 3.95264, 2.87203] [2.53135, 2.75678, 3.22067, 3.22934] [2.23564, 2.67726, 3.78592, 2.74956] [2.3726, 2.25835, 3.77262, 2.94542]
[2.10234, 3.41877, 3.3724, 2.03566] [1.3769, 2.29489, 2.89553, 2.85193] [1.948, 1.63945, 4.43804, 2.11742] [2.77972, 1.62081, 3.51652, 2.2943]
Sure, but there's also ReverseDiff. In Julia you shouldn't have to do anything "special": the generic implementation for "plain convolution/correlation" should work as long as you define your kernel in terms of the right kinds of numbers. You can literally save tens of thousands of lines of code that way.
I've thought about this, but the output of deeper networks often has too many channels (e.g. a 32x32x2048 array), which means the vector object would be 2048 elements long; I guess it's not a good idea to fit that into a StaticVector.
PS: For people who may not be familiar with deep learning networks, here is a visualization app called netscope in which you can check the dimensions of the input and output of each layer. Popular networks are listed at the end of the page.
Makes sense. In that case, construct your filter so that it has size 1 along the axes that you don't want to filter along:
julia> using ImageFiltering, OffsetArrays
julia> A = zeros(5, 7); A[:,1] = 1:5;
julia> kern = OffsetArray(rand(1,3), 0:0, -2:0)
OffsetArrays.OffsetArray{Float64,2,Array{Float64,2}} with indices 0:0×-2:0:
0.0251132 0.155058 0.0603386
julia> imfilter(A, kern, Fill(0))
5×7 Array{Float64,2}:
0.0603386 0.155058 0.0251132 0.0 0.0 0.0 0.0
0.120677 0.310116 0.0502264 0.0 0.0 0.0 0.0
0.181016 0.465174 0.0753396 0.0 0.0 0.0 0.0
0.241354 0.620232 0.100453 0.0 0.0 0.0 0.0
0.301693 0.77529 0.125566 0.0 0.0 0.0 0.0
This is exactly the point - ReverseDiff currently doesn't support convolutions:
julia> using ReverseDiff
INFO: Precompiling module ReverseDiff.
julia> f1(x) = sum(2 .* x)
f1 (generic function with 1 method)
julia> ReverseDiff.gradient(f1, (rand(10,)))
10-element Array{Float64,1}:
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
julia> f2(x, w) = sum(conv2(x, w))
f2 (generic function with 1 method)
julia> ReverseDiff.gradient(f2, (rand(11, 11), rand(3,3)))
ERROR: MethodError: no method matching conv2(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}})
Stacktrace:
[1] f2(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}) at ./REPL[5]:1
[2] ReverseDiff.GradientTape(::Function, ::Tuple{Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.GradientConfig{Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}}) at /home/dfdx/.julia/v0.6/ReverseDiff/src/api/tape.jl:207
[3] gradient(::Function, ::Tuple{Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.GradientConfig{Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}}) at /home/dfdx/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22 (repeats 2 times)
To make it work, one needs to add a definition of the gradient of convolution to ReverseDiff. And to do so, one at least needs a function that computes that gradient given the input parameters. If somebody wants to use ReverseDiff for deep learning, the same functions (convolution and its gradient) should also be defined for the GPU (presumably, by dispatching on the array type). Summarize all this and you get the first post in this topic.
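For the 2-d "valid" case, such a hand-written gradient is short. Here is a minimal sketch (the names `corr2_valid` and `grad_kernel` are made up for illustration) of a direct correlation and the gradient of `sum(corr2_valid(x, w))` with respect to the kernel `w`:

```julia
# Hypothetical sketch (corr2_valid / grad_kernel are made-up names):
# a "valid"-size 2-d correlation, and the hand-derived gradient of
# sum(corr2_valid(x, w)) with respect to the kernel w.
function corr2_valid(x::AbstractMatrix, w::AbstractMatrix)
    m, n = size(w)
    [sum(x[i:i+m-1, j:j+n-1] .* w)
     for i in 1:size(x,1)-m+1, j in 1:size(x,2)-n+1]
end

# Each kernel entry w[p,q] multiplies x[i+p-1, j+q-1] at every output
# position (i,j), so d(sum)/d(w[p,q]) is the sum of the matching sub-block of x.
function grad_kernel(x::AbstractMatrix, w::AbstractMatrix)
    m, n = size(w)
    [sum(x[p:p+size(x,1)-m, q:q+size(x,2)-n]) for p in 1:m, q in 1:n]
end

# Finite-difference check: sum(corr2_valid(x, w)) is linear in w,
# so the difference quotient matches the gradient entry exactly.
x, w = rand(5, 5), rand(3, 3)
g = grad_kernel(x, w)
h = 1e-6
wp = copy(w); wp[1, 1] += h
@assert isapprox((sum(corr2_valid(x, wp)) - sum(corr2_valid(x, w))) / h,
                 g[1, 1]; atol = 1e-4)
```

The GPU version would dispatch the same function names on the array type, as suggested above.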
Seems to work for me:
julia> using ReverseDiff, ImageFiltering
julia> f2(x, w) = sum(imfilter(x, (w,)))
f2 (generic function with 1 method)
julia> g = ReverseDiff.gradient(f2, (rand(11, 11), rand(3,3)))
([0.0 0.0 ⌠0.0 0.0; 0.0 0.67748 ⌠1.73115 4.81457; ⌠; 0.0 2.58537 ⌠5.78043 16.352; 0.0 8.06819 ⌠17.9961 50.9701], [61.2169 61.5058 62.326; 61.794 61.3385 61.8287; 63.5663 62.6786 62.9682])
julia> g[1]
11×11 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.67748 1.43255 1.73115 1.73115 1.73115 1.73115 1.73115 1.73115 1.73115 4.81457
0.0 1.59582 2.35881 3.3945 3.3945 3.3945 3.3945 3.3945 3.3945 3.3945 9.62338
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 2.58537 4.18432 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 5.78043 16.352
0.0 8.06819 12.9459 17.9961 17.9961 17.9961 17.9961 17.9961 17.9961 17.9961 50.9701
julia> g[2]
3×3 Array{Float64,2}:
61.2169 61.5058 62.326
61.794 61.3385 61.8287
63.5663 62.6786 62.9682
EDIT: might want to fix up ReverseDiff to handle arbitrary indices, though.
Could you elaborate more on this? Specifically, how to avoid calling imfilter repeatedly in the loops below:
A = rand(32,32,128)
ker = rand(3,3,128,256)
B = zeros(32,32,256)
for kn = 1:256
for cn = 1:128
kern = OffsetArray(ker[:,:,cn,kn], -1:1, -1:1)
B[:,:,kn] += imfilter(A[:,:,cn], kern, Fill(0))
end
end
Oh, cool example. Sorry, I wasn't careful enough in reading that link you posted earlier; now I understand better that it's not multichannel but multi-filter.
But the same principles still apply. Since you're using multiple filters I can't get rid of the outer loop, but the cn loop is just a dot product. So one can do this:
using ImageFiltering, OffsetArrays
# Define a type that supports a*b, where * is meant in the sense of a dot product
struct Scalar{N,T}
v::NTuple{N,T}
end
function Base.:*(a::Scalar{N}, b::Scalar{N}) where N
ret = a.v[1]*b.v[1]
for i = 2:N
ret += a.v[i]*b.v[i]
end
ret
end
Base.@pure Base.zero(::Type{Scalar{N,T}}) where {N,T} =
Scalar{N,T}((zeros(T,N)...))
# Here's one ugly tweak (safe_for_prod is about handling arithmetic overflow, e.g., 8-bit pixels)
ImageFiltering.safe_for_prod(x::Scalar{N,Float64}, ref) where N = x
A = reinterpret(Scalar{128,Float64}, rand(128,32,32), (32, 32))
kerA = reinterpret(Scalar{128,Float64}, rand(128,3,3,256), (3,3,256))
ker = OffsetArray(kerA, -1:1, -1:1, 1:256)
B = zeros(32,32,256)
tmp = similar(B, 32, 32)
for kn = 1:256
imfilter!(view(B, :, :, kn), A, (view(ker, :, :, kn),), Fill(zero(eltype(A))))
end
I would be fine with adding Scalar to ImageFiltering or elsewhere.
I had to make a one-character change to ImageFiltering to fix a type bug in padding (I'll submit it in the next day or so), but that was it.
CUDNN is actually already fixed to work with CUDAdrv on the latest release. If you want to try hooking up convolutions or anything else, you'd be more than welcome. See softmax for a working example. More refactoring will happen but it doesn't have to block this.
Yeah, using ReverseDiff with imfilter as-is like this is definitely not a good idea. It will work numerically, but only by differentiating through tens of millions of scalar operations; even a moderately sized image (500x500) takes minutes for me. If you compile the tape it will do better, but still be handily beaten by a gradient kernel.
All this is somewhat moot given that AD can't differentiate GPU kernels or library calls; convolutions will certainly need to use NVIDIA's library and will therefore need an explicit gradient.
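For anyone reproducing the tape-compilation step, the workflow is roughly the following (the function `f` and the sizes here are placeholders, not the exact setup that produced the timing shown later in the thread):

```julia
using ReverseDiff

# Minimal compiled-tape sketch (f and the sizes are placeholders).
f(w) = sum(w .* w)                       # gradient is 2w
w0   = rand(3, 3)

# Record the operations once, then compile the tape for faster replay.
tape          = ReverseDiff.GradientTape(f, w0)
compiled_tape = ReverseDiff.compile(tape)

# Replay the compiled tape on (possibly new) inputs of the same shape.
res = similar(w0)
ReverseDiff.gradient!(res, compiled_tape, w0)   # res now holds 2 .* w0
```

Compilation avoids re-recording the operation graph on every call, which is where much of the speedup over the naive `ReverseDiff.gradient` call comes from.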
We could rename the package ArrayFiltering or ArrayConvolution if you think thatâs better.
Please no! The field "image processing" handles d-dimensional signals. So the name Image has a general meaning here. It also structures various packages quite nicely:
http://juliaimages.github.io/latest/
Please do not rename one individual package when it really belongs to the "Image Toolbox" that Images.jl provides (as a metapackage).
But wait, it is you who created that package family :-)))
@MikeInnes, thanks for chiming in. FWIW, for a 500x500 image I get something that's a lot better, if I compile and ask only for the gradient with respect to the kernel:
julia> @time ReverseDiff.gradient!(res, compiled_f_tape, kern)
0.468191 seconds (4 allocations: 160 bytes)
3×3 Array{Float64,2}:
1.24595e5 1.24343e5 1.24098e5
124347.0 124095.0 1.23851e5
124089.0 1.23838e5 1.23594e5
But you're absolutely right, that does still seem slow given that the raw filtering operation is only a few ms and there are only 9 components. So, there does seem to be room for a hand-computed gradient. Fortunately for convolution that's easy, though for stuff that comes after it could get a little hairier.
Sweet! I definitely will.
Hello,
Is Intel IPP integrated as well?
Since Intel MKL is integrated into Julia, the next step would be integrating Intel IPP.
FYI, there is a fast ND convolution package:
https://github.com/aamini/FastConv.jl
There is also https://github.com/uschmidt83/ImageProcessing.jl which has a Julia im2col implementation.
FastConv is nice if you want something that grows the array by the kernel size (like conv2, something I've never gotten around to implementing in ImageFiltering). Otherwise, though, imfilter is both faster and more general:
A = rand(1000,1000);
k = rand(3,3);
julia> @btime convn(A, k);
15.451 ms (20 allocations: 7.66 MiB)
julia> @btime imfilter(A, (k,));
12.473 ms (44 allocations: 15.32 MiB)
julia> B = view(A, 1:900, 1:900);
julia> @btime imfilter(B, (k,));
10.139 ms (44 allocations: 12.42 MiB)
julia> @btime convn(B, (k,));
ERROR: MethodError: no method matching convn(::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::Tuple{Array{Float64,2}})
Just a quick note: I will put together an FFT-based implementation as soon as I find the time; I haven't forgotten this topic. Also, it seems like it's still not clear which package will ultimately collect all these implementations?
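For concreteness, here is a minimal sketch of what such an FFT-based implementation could look like (`conv_fft` is a made-up name; this computes the "full"-size convolution like conv2 did):

```julia
using FFTW  # on Julia 0.6, fft/ifft live in Base instead

# Hypothetical sketch (conv_fft is a made-up name): "full"-size 2-d linear
# convolution computed by zero-padding both arrays to size(a) .+ size(k) .- 1
# and multiplying pointwise in the frequency domain.
function conv_fft(a::AbstractMatrix{<:Real}, k::AbstractMatrix{<:Real})
    sz = size(a) .+ size(k) .- 1
    pa = zeros(sz); pa[1:size(a, 1), 1:size(a, 2)] = a
    pk = zeros(sz); pk[1:size(k, 1), 1:size(k, 2)] = k
    real.(ifft(fft(pa) .* fft(pk)))
end
```

Padding to the full linear-convolution size avoids circular wrap-around; for large kernels the O(N log N) cost of the transforms wins over direct filtering.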