What's the status of image convolutions on CPU & GPU?

Have you checked out Knet.jl?

By “pure convolution” I mean only forward convolution, i.e. applying a filter to an image, but not finding its gradient or applying pooling.


Any time you’re thinking about multichannels, what you’re really doing is filtering “vector” objects over a spatial domain. But most languages don’t have nice ways of storing vector objects packed densely in an array, so they tend to use another array axis to encode the channel information. Even though it’s very standard, it’s unfortunately confusing and messy, because you’re now having to break what would otherwise be a very nice and natural abstraction (that the axes of the array correspond to “space” and that you’re filtering over space).

In Julia it’s easy to do this explicitly:

julia> a = rand(4, 5, 5);

julia> using StaticArrays

julia> b = reinterpret(SVector{4,Float64}, a, (5, 5))
5×5 Array{SVector{4,Float64},2}:
 [0.144815, 0.772607, 0.872914, 0.908799]    [0.633158, 0.0525735, 0.280742, 0.902825]  …  [0.896061, 0.407699, 0.380729, 0.0341831]  [0.755399, 0.0471598, 0.663326, 0.601342]
 [0.549754, 0.690097, 0.220682, 0.0802711]   [0.28825, 0.994905, 0.805212, 0.89021]        [0.743983, 0.966301, 0.18642, 0.702578]    [0.286288, 0.378818, 0.836149, 0.543906] 
 [0.657729, 0.508657, 0.881363, 0.785144]    [0.512295, 0.714254, 0.729959, 0.273817]      [0.169743, 0.525499, 0.949791, 0.376547]   [0.225779, 0.494444, 0.41264, 0.851775]  
 [0.00444584, 0.933053, 0.187804, 0.500937]  [0.355278, 0.154059, 0.011323, 0.514391]      [0.197709, 0.550803, 0.649878, 0.111545]   [0.871994, 0.446872, 0.859054, 0.238003] 
 [0.652901, 0.486696, 0.952878, 0.0807595]   [0.0998179, 0.281748, 0.477665, 0.74011]      [0.26828, 0.278152, 0.925838, 0.386637]    [0.400142, 0.0622027, 0.291799, 0.561965]

julia> using ImageFiltering

julia> kern = centered(rand(3,3))
OffsetArrays.OffsetArray{Float64,2,Array{Float64,2}} with indices -1:1×-1:1:
 0.994265  0.895043  0.577934
 0.381196  0.874816  0.395609
 0.519834  0.999507  0.371597

julia> imfilter(b, kern)
5×5 Array{SVector{4,Float64},2}:
 [2.01427, 3.89947, 3.65341, 4.19016]  [2.93492, 2.8359, 3.76376, 4.5498]    …  [4.63843, 2.33211, 3.43263, 2.63515]  [4.08414, 1.71187, 3.5868, 2.80771] 
 [2.63374, 3.78868, 4.01752, 3.98642]  [3.09131, 3.54901, 3.96213, 4.09826]     [3.7568, 2.84139, 3.55824, 3.13144]   [3.04872, 2.27554, 3.54844, 3.24217]
 [2.3728, 4.27511, 2.56762, 2.71285]   [2.79171, 3.80449, 2.87623, 3.18184]     [2.89213, 3.54058, 3.93211, 3.17953]  [2.81132, 3.24626, 3.81895, 3.10968]
 [2.71393, 3.45083, 3.95264, 2.87203]  [2.53135, 2.75678, 3.22067, 3.22934]     [2.23564, 2.67726, 3.78592, 2.74956]  [2.3726, 2.25835, 3.77262, 2.94542] 
 [2.10234, 3.41877, 3.3724, 2.03566]   [1.3769, 2.29489, 2.89553, 2.85193]      [1.948, 1.63945, 4.43804, 2.11742]    [2.77972, 1.62081, 3.51652, 2.2943] 

Sure, but there’s also ReverseDiff. In Julia you shouldn’t have to do anything “special”: the generic implementation for “plain convolution/correlation” should work as long as you define your kernel in terms of the right kinds of numbers. You can literally save tens of thousands of lines of code that way.
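To make that concrete, here is a minimal sketch of a type-generic "plain correlation" (the helper name `correlate2d` is mine, not part of any package). Because nothing in it assumes `Float64`, the same code should work unchanged for rationals, dual numbers, or tracked values:

```julia
# Hypothetical helper, not from any package: a "valid"-size 2D
# correlation written generically, so it works for any element type
# that supports * and + (Float64, Rational, dual numbers, ...).
function correlate2d(A::AbstractMatrix, k::AbstractMatrix)
    m, n = size(k)
    T = promote_type(eltype(A), eltype(k))
    out = similar(A, T, size(A, 1) - m + 1, size(A, 2) - n + 1)
    for j = 1:size(out, 2), i = 1:size(out, 1)
        s = zero(T)
        for q = 1:n, p = 1:m
            s += A[i + p - 1, j + q - 1] * k[p, q]
        end
        out[i, j] = s
    end
    out
end
```

For example, `correlate2d(ones(3, 3), ones(2, 2))` yields a 2×2 array filled with `4.0`, and feeding it `Rational` inputs yields exact `Rational` output, with no code duplication.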

I’ve thought about this, but the output of deeper networks often has too many channels (e.g. a 32x32x2048 array), which means the vector object would be 2048 elements long; I suspect it’s not a good idea to pack that into a StaticVector.

PS: For people who may not be familiar with deep learning networks, here is a visualization app called netscope where you can check out the dimensions of the input and output of each layer. Popular networks are listed at the end of the page.

Makes sense. In that case, construct your filter so that it has size 1 along the axes that you don’t want to filter along:

julia> using ImageFiltering, OffsetArrays

julia> A = zeros(5, 7); A[:,1] = 1:5;

julia> kern = OffsetArray(rand(1,3), 0:0, -2:0)
OffsetArrays.OffsetArray{Float64,2,Array{Float64,2}} with indices 0:0×-2:0:
 0.0251132  0.155058  0.0603386

julia> imfilter(A, kern, Fill(0))
5×7 Array{Float64,2}:
 0.0603386  0.155058  0.0251132  0.0  0.0  0.0  0.0
 0.120677   0.310116  0.0502264  0.0  0.0  0.0  0.0
 0.181016   0.465174  0.0753396  0.0  0.0  0.0  0.0
 0.241354   0.620232  0.100453   0.0  0.0  0.0  0.0
 0.301693   0.77529   0.125566   0.0  0.0  0.0  0.0

This is exactly the point: ReverseDiff currently doesn’t support convolutions:

julia> using ReverseDiff
INFO: Precompiling module ReverseDiff.

julia> f1(x) = sum(2 .* x)
f1 (generic function with 1 method)

julia> ReverseDiff.gradient(f1, (rand(10,)))
10-element Array{Float64,1}:
 2.0
 2.0
 2.0
 2.0
 2.0
 2.0
 2.0
 2.0
 2.0
 2.0

julia> f2(x, w) = sum(conv2(x, w))
f2 (generic function with 1 method)

julia> ReverseDiff.gradient(f2, (rand(11, 11), rand(3,3)))
ERROR: MethodError: no method matching conv2(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}})
Stacktrace:
 [1] f2(::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}) at ./REPL[5]:1
 [2] ReverseDiff.GradientTape(::Function, ::Tuple{Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.GradientConfig{Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}}) at /home/dfdx/.julia/v0.6/ReverseDiff/src/api/tape.jl:207
 [3] gradient(::Function, ::Tuple{Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.GradientConfig{Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}}) at /home/dfdx/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22 (repeats 2 times)

To make it work, one needs to add a definition of the gradient of convolution to ReverseDiff. And to do that, one needs at least a function that computes the gradient given the input parameters. If somebody wants to use ReverseDiff for deep learning, the same functions (convolution and its gradient) should also be defined for the GPU (presumably by dispatching on the array type). Summarize all that and you get the first post in this topic :slight_smile:
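For reference, the gradients of a "valid" cross-correlation `y = x ⋆ w` have simple closed forms: the gradient with respect to the kernel is itself a correlation of the input with the upstream gradient, and the gradient with respect to the input scatters the kernel back, weighted by the upstream gradient. A plain-loop sketch (hypothetical helper names, not ReverseDiff API):

```julia
# d(loss)/dw: correlate the input x with the upstream gradient dy,
# where dy has the same size as the "valid" output y.
function corr_grad_w(x, dy, wsize)
    dw = zeros(wsize)
    for q = 1:wsize[2], p = 1:wsize[1]
        s = 0.0
        for j = 1:size(dy, 2), i = 1:size(dy, 1)
            s += x[i + p - 1, j + q - 1] * dy[i, j]
        end
        dw[p, q] = s
    end
    dw
end

# d(loss)/dx: scatter w over the input positions, weighted by dy
# (equivalently, a "full" convolution of dy with the flipped kernel).
function corr_grad_x(w, dy, xsize)
    dx = zeros(xsize)
    for j = 1:size(dy, 2), i = 1:size(dy, 1)
        for q = 1:size(w, 2), p = 1:size(w, 1)
            dx[i + p - 1, j + q - 1] += w[p, q] * dy[i, j]
        end
    end
    dx
end
```

These are exactly the two kernels a ReverseDiff (or GPU) rule would need; the GPU versions would presumably call cuDNN instead of the loops.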

Seems to work for me:

julia> using ReverseDiff, ImageFiltering

julia> f2(x, w) = sum(imfilter(x, (w,)))
f2 (generic function with 1 method)

julia> g = ReverseDiff.gradient(f2, (rand(11, 11), rand(3,3)))
([0.0 0.0 … 0.0 0.0; 0.0 0.67748 … 1.73115 4.81457; … ; 0.0 2.58537 … 5.78043 16.352; 0.0 8.06819 … 17.9961 50.9701], [61.2169 61.5058 62.326; 61.794 61.3385 61.8287; 63.5663 62.6786 62.9682])

julia> g[1]
11×11 Array{Float64,2}:
 0.0  0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0       0.0    
 0.0  0.67748   1.43255   1.73115   1.73115   1.73115   1.73115   1.73115   1.73115   1.73115   4.81457
 0.0  1.59582   2.35881   3.3945    3.3945    3.3945    3.3945    3.3945    3.3945    3.3945    9.62338
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  2.58537   4.18432   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043   5.78043  16.352  
 0.0  8.06819  12.9459   17.9961   17.9961   17.9961   17.9961   17.9961   17.9961   17.9961   50.9701 

julia> g[2]
3×3 Array{Float64,2}:
 61.2169  61.5058  62.326 
 61.794   61.3385  61.8287
 63.5663  62.6786  62.9682

EDIT: might want to fix up ReverseDiff to handle arbitrary indices, though.

Could you elaborate more on this? Specifically, how do I avoid calling imfilter repeatedly in the loops below:

A = rand(32,32,128)
ker = rand(3,3,128,256)
B = zeros(32,32,256)
for kn = 1:256
    for cn = 1:128
        kern = OffsetArray(ker[:,:,cn,kn], -1:1, -1:1)
        B[:,:,kn] += imfilter(A[:,:,cn], kern, Fill(0))
    end
end

Oh, cool example. Sorry, I wasn’t careful enough in reading that link you posted earlier; now I understand better that it’s not multichannel but multi-filter.

But the same principles still apply. Since you’re using multiple filters I can’t get rid of the outer loop, but the cn loop is just a dot product. So one can do this:

using ImageFiltering, OffsetArrays

# Define a type that supports a*b, where * is meant in the sense of a dot product
struct Scalar{N,T}
    v::NTuple{N,T}
end
function Base.:*(a::Scalar{N}, b::Scalar{N}) where N
    ret = a.v[1]*b.v[1]
    for i = 2:N
        ret += a.v[i]*b.v[i]
    end
    ret
end
Base.@pure Base.zero(::Type{Scalar{N,T}}) where {N,T} =
    Scalar{N,T}((zeros(T,N)...))

# Here's one ugly tweak (safe_for_prod is about handling arithmetic overflow, e.g., 8-bit pixels)
ImageFiltering.safe_for_prod(x::Scalar{N,Float64}, ref) where N = x

A = reinterpret(Scalar{128,Float64}, rand(128,32,32), (32, 32))
kerA = reinterpret(Scalar{128,Float64}, rand(128,3,3,256), (3,3,256))
ker = OffsetArray(kerA, -1:1, -1:1, 1:256)
B = zeros(32,32,256)
tmp = similar(B, 32, 32)
for kn = 1:256
    imfilter!(view(B, :, :, kn), A, (view(ker, :, :, kn),), Fill(zero(eltype(A))))
end

I would be fine with adding Scalar to ImageFiltering or elsewhere.

I had to make a one-character change to ImageFiltering to fix a type bug in padding (I’ll submit it in the next day or so), but that was it.


CUDNN is actually already fixed to work with CUDAdrv on the latest release. If you want to try hooking up convolutions or anything else, you’d be more than welcome. See softmax for a working example. More refactoring will happen but it doesn’t have to block this.

Yeah, using ReverseDiff with imfilter as-is like this is definitely not a good idea. It will work numerically but only by differentiating through tens of millions of scalar operations; even a moderately sized image (500x500) takes minutes for me. If you compile the tape it will do better, but still be handily beaten by a gradient kernel.

All this is somewhat moot given that AD can’t differentiate GPU kernels or library calls; convolutions will certainly need to use NVIDIA’s library and will therefore need an explicit gradient.

We could rename the package ArrayFiltering or ArrayConvolution if you think that’s better.

Please no! The field “image processing” handles d-dimensional signals. So the name Image has a general meaning here. It also structures various packages quite nicely:

http://juliaimages.github.io/latest/

Please do not rename one individual package when it really belongs to the “Image Toolbox” that Images.jl provides (as a metapackage).

But wait, it is you who created that package family :-)))

@MikeInnes, thanks for chiming in. FWIW for a 500x500 image I get something that’s a lot better, if I compile and ask only for the gradient with respect to the kernel:

julia> @time ReverseDiff.gradient!(res, compiled_f_tape, kern)
  0.468191 seconds (4 allocations: 160 bytes)
3×3 Array{Float64,2}:
      1.24595e5       1.24343e5  1.24098e5
 124347.0        124095.0        1.23851e5
 124089.0             1.23838e5  1.23594e5

But you’re absolutely right, that does still seem slow given that the raw filtering operation is only a few ms and there are only 9 components. So, there does seem to be room for a hand-computed gradient. Fortunately, for convolution that’s easy, though for the operations that come after it could get a little hairier.

Sweet! I definitely will.


Hello,

Is Intel IPP integrated as well?

Since Intel MKL is integrated into Julia, the next step would be integrating Intel IPP.

FYI, there is a fast ND convolution package:
https://github.com/aamini/FastConv.jl


there is also https://github.com/uschmidt83/ImageProcessing.jl which has a julia im2col implementation

FastConv is nice if you want something that grows the array by the kernel size (like conv2, something I’ve never gotten around to implementing in ImageFiltering). Otherwise, though, imfilter is both faster and more general:

julia> using BenchmarkTools, ImageFiltering, FastConv

julia> A = rand(1000,1000);

julia> k = rand(3,3);

julia> @btime convn(A, k);
  15.451 ms (20 allocations: 7.66 MiB)

julia> @btime imfilter(A, (k,));
  12.473 ms (44 allocations: 15.32 MiB)

julia> B = view(A, 1:900, 1:900);

julia> @btime imfilter(B, (k,));
  10.139 ms (44 allocations: 12.42 MiB)

julia> @btime convn(B, (k,));
ERROR: MethodError: no method matching convn(::SubArray{Float64,2,Array{Float64,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::Tuple{Array{Float64,2}})
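For reference, the two size conventions being compared differ as follows; this is just the arithmetic, independent of either package:

```julia
# A "full" convolution (convn/conv2 style) grows the output by the
# kernel size minus one along each axis; imfilter preserves the
# input size by padding at the border instead.
A = zeros(1000, 1000)
k = zeros(3, 3)
full_size = map((a, b) -> a + b - 1, size(A), size(k))  # (1002, 1002)
same_size = size(A)                                     # (1000, 1000)
```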

Just a quick note: I will put together an FFT-based implementation as soon as I find the time; I haven’t forgotten this topic. Also, it seems it’s still not clear which package will ultimately collect all these implementations?