What's the status of image convolutions on CPU & GPU?


#1

Most deep learning libraries rely on the same set of primitive functions, such as matrix multiplication, element-wise operations, activation functions and convolutions. Also, most of the code should run on either the GPU (preferably) or the CPU. Matrix multiplication and element-wise functions are already pretty well supported on both: core Julia Arrays on the CPU, and CLArrays/CuArrays/GPUArrays on the GPU. But what about image convolutions?

For GPU, the best option so far seems to be CUDNN.jl. However, it doesn’t look very popular: it depends on CUDArt.jl, which is “phased out”; Knet.jl calls cuDNN directly; and Flux.jl doesn’t mention convolution at all. (There’s also MXNet.jl, but I believe it just wraps C code.)

For CPU, the only convolution implementation I found is the built-in conv2 function, which is not quite the same thing.
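To illustrate the difference, here is a minimal sketch in plain Julia 1.x (no packages). `conv_full` and `xcorr_valid` are hypothetical helper names, not library functions. A "full" 2-D convolution, which as far as I know is what `conv2` computed, produces an output larger than its input. A typical CNN layer instead computes a "valid" cross-correlation that does not flip the kernel and shrinks the output. Real CNN layers also handle channels, batches, stride and padding, which the sketch leaves out.

```julia
# Full 2-D convolution: the output is (m+p-1) × (n+q-1), larger than the input.
function conv_full(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m + p - 1, n + q - 1)
    for j in 1:n, i in 1:m, b in 1:q, a in 1:p
        out[i + a - 1, j + b - 1] += A[i, j] * K[a, b]   # scatter A[i,j] through K
    end
    return out
end

# "Valid" cross-correlation, as in a typical CNN layer: the kernel is not flipped
# and is only placed where it fits, so the output is (m-p+1) × (n-q+1).
function xcorr_valid(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m - p + 1, n - q + 1)
    for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += A[i + a - 1, j + b - 1] * K[a, b]
        end
        out[i, j] = acc
    end
    return out
end

A = [Float64(i + 4j) for i in 1:4, j in 1:4]
K = [1.0 0.0; 0.0 -1.0]
size(conv_full(A, K))      # (5, 5)
size(xcorr_valid(A, K))    # (3, 3)
# The CNN-style result is the interior of the full convolution with a flipped kernel:
xcorr_valid(A, K) == conv_full(A, rot180(K))[2:4, 2:4]   # true
```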

Is there anybody working on something like this? Are there any discussions I’ve missed?

Currently I have some time to work on it, but I guess I’m neither the only nor the first person interested in the topic, so it would be great to agree on a shared vision before starting.


#2

As far as I know there is no standard package yet. I usually implement my own versions of FFT-based convolutions whenever I need them.

I think a big problem with general implementations is boundary conditions. A potential package should IMHO include at least periodic and constant boundary conditions (BCs), but there are probably more options depending on the application.
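A minimal sketch of how the two boundary conditions mentioned above differ, assuming a single-channel matrix and an odd-sized, centered kernel. It uses direct summation rather than FFTs so that it needs no packages. Note that an FFT-based convolution computes a circular convolution, so periodic BCs come for free with FFTs, while other BCs require padding the array first. `pad_constant`, `pad_periodic` and `xcorr_same` are hypothetical names.

```julia
# Constant BC: surround A with r rows and s columns of a fixed value on each side.
function pad_constant(A::AbstractMatrix, r::Int, s::Int, value)
    m, n = size(A)
    P = fill(convert(eltype(A), value), m + 2r, n + 2s)
    P[r+1:r+m, s+1:s+n] .= A
    return P
end

# Periodic BC: indices wrap around (mod1 maps any integer into 1:m or 1:n).
function pad_periodic(A::AbstractMatrix, r::Int, s::Int)
    m, n = size(A)
    return [A[mod1(i - r, m), mod1(j - s, n)] for i in 1:m+2r, j in 1:n+2s]
end

# Same-size output with a centered, odd-sized kernel: pad first, then slide the kernel.
function xcorr_same(A::AbstractMatrix, K::AbstractMatrix; boundary = :periodic, value = 0)
    p, q = size(K)
    @assert isodd(p) && isodd(q) "this sketch assumes odd kernel sizes"
    r, s = p ÷ 2, q ÷ 2
    P = boundary === :periodic ? pad_periodic(A, r, s) : pad_constant(A, r, s, value)
    m, n = size(A)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m, n)
    for j in 1:n, i in 1:m
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += P[i + a - 1, j + b - 1] * K[a, b]
        end
        out[i, j] = acc
    end
    return out
end

# A single bright pixel in the corner, blurred with a 3×3 box kernel:
A = zeros(4, 4); A[1, 1] = 1.0
K = ones(3, 3)
xcorr_same(A, K)[4, 4]                        # 1.0: the pixel wraps around
xcorr_same(A, K; boundary = :constant)[4, 4]  # 0.0: zeros outside the image
```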

Naturally, I would be interested in contributing to such a package.


#3

ImageFiltering has some fast CPU convolutions, at least for separable filters (where it supports cache-efficient tiling & multithreading very much like Halide). The dense filters could certainly be improved with tiling & threading. ImageFiltering also has FFT convolutions—you can choose FIR, FFT, or let it automatically decide for you based on the size of the kernel. It also has some fancy options like separable gaussian-approximation IIR filters (blazingly fast even for very large Gaussians). Finally, there are many different boundary conditions supported.
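Why separable filters are worth special-casing can be shown with a small sketch (plain Julia, hypothetical helper names, "valid" region only so no boundary handling is needed). A 2-D kernel that is an outer product `kv * kh'` can be applied as two 1-D passes, which reduces the work per output pixel from p*q to roughly p + q multiplications. This is only the textbook idea, not how ImageFiltering implements its tiled, multithreaded version.

```julia
# "Valid" cross-correlation with an arbitrary 2-D kernel (p*q multiplies per output pixel).
function xcorr_valid(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m - p + 1, n - q + 1)
    for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += A[i + a - 1, j + b - 1] * K[a, b]
        end
        out[i, j] = acc
    end
    return out
end

# Separable version: filter along dimension 1 with kv, then along dimension 2 with kh.
function xcorr_separable(A::AbstractMatrix, kv::AbstractVector, kh::AbstractVector)
    tmp = xcorr_valid(A, reshape(kv, :, 1))     # kv used as a p×1 kernel
    return xcorr_valid(tmp, reshape(kh, 1, :))  # kh used as a 1×q kernel
end

A  = [sin(i) + cos(2j) for i in 1:8, j in 1:7]
kv = [1.0, 2.0, 1.0]    # smoothing along dimension 1
kh = [1.0, 0.0, -1.0]   # differencing along dimension 2 (together: a Sobel-type kernel)
xcorr_separable(A, kv, kh) ≈ xcorr_valid(A, kv * kh')   # true
```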

No GPU filtering yet, but a key part of ImageFiltering’s design is that it lets you specify different ComputationalResources, with GPUs as one use case. So the architecture is there to support many different options, it just needs the specific implementation.


#4

There’s also ArrayFire, which provides some of these functions on the GPU.


#5

We’re planning on folding CUDNN etc. into CuArrays, making that array type the go-to option for array computing on CUDA GPUs (using CUDAdrv instead of CUDArt). Similarly for CLArrays, all built on top of the GPUArrays interface. It should all be Knet-compatible.

cc @sdanisch @MikeInnes


#6

I started putting some very simple convolution kernels in the new GPUArrays:

@josuagrw it would be absolutely wonderful if you could add your FFT implementation :slight_smile:

@tim.holy I must say I’m still not a huge fan of ComputationalResources: they make sense for multithreading, but for GPUs I’d much rather dispatch on the array type and only use the GPU if the data is already there.
After all, the memory transfer can easily cost more than the filtering operation itself.

Now that GPUArrays is a pure Julia interface without any dependencies (basically since two days ago), it is also much easier to depend on it in other packages and to define functions on the abstract GPUArray type!


#7

Sounds like GPUArrays is ready to be woven into ImageFiltering, and a PR would be gratefully accepted. As you say, we still need something like ComputationalResources for multithreading, so the two are not in conflict.
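A rough sketch of how the two dispatch axes can coexist. All type and function names here (`Resource`, `SingleThread`, `Threaded`, `FakeDeviceArray`, `colfilter`) are hypothetical and are not the ComputationalResources.jl or GPUArrays.jl APIs. A resource argument chooses the threading strategy for ordinary `Array`s, while a device array type is handled purely by dispatch on the array type, so data already on the device stays there.

```julia
# Hypothetical resource types, in the spirit of ComputationalResources.jl (not its actual API).
abstract type Resource end
struct SingleThread <: Resource end
struct Threaded <: Resource end

# Hypothetical stand-in for a device array type such as a GPUArray.
struct FakeDeviceArray{T,N}
    data::Array{T,N}
end

# Shared inner kernel: "valid" 1-D correlation of column j of A with k.
function filtercol!(out, A, k, j)
    for i in axes(out, 1)
        acc = zero(eltype(out))
        for a in eachindex(k)
            acc += A[i + a - 1, j] * k[a]
        end
        out[i, j] = acc
    end
    return out
end

alloc_out(A, k) = zeros(promote_type(eltype(A), eltype(k)), size(A, 1) - length(k) + 1, size(A, 2))

# CPU arrays: the resource argument selects the parallelization strategy.
function colfilter(::SingleThread, A::Array, k)
    out = alloc_out(A, k)
    for j in 1:size(A, 2)
        filtercol!(out, A, k, j)
    end
    return out
end

function colfilter(::Threaded, A::Array, k)
    out = alloc_out(A, k)
    Threads.@threads for j in 1:size(A, 2)   # columns are independent
        filtercol!(out, A, k, j)
    end
    return out
end

colfilter(A::Array, k) = colfilter(SingleThread(), A, k)   # default resource

# Device arrays: dispatch on the array type. A real implementation would launch a
# GPU kernel; this stand-in just reuses the CPU code on the wrapped data.
colfilter(A::FakeDeviceArray, k) = FakeDeviceArray(colfilter(SingleThread(), A.data, k))

A = [Float64(i + j) for i in 1:6, j in 1:4]
k = [1.0, -1.0]
colfilter(A, k) == colfilter(Threaded(), A, k)          # true
colfilter(FakeDeviceArray(A), k) isa FakeDeviceArray    # true
```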


#8

Great! :slight_smile: I started working on the developer docs on how to write hardware-independent GPU code:
https://juliagpu.github.io/GPUArrays.jl/latest/

I will think about how we can structure this: relying on ImageFiltering to get a generic convolution function seems a bit odd. Maybe we can put a basic implementation into GPUArrays and use that for an image-centered version in ImageFiltering!
I’d love to reuse all the great infrastructure in ImageFiltering for padding etc!


#9

What do you think ImageFiltering is, if not a package for doing convolutions? That’s basically the only thing it does. It’s just that it (1) works in arbitrary dimension, (2) provides many different approaches (FIR, IIR, FFT) and boundary conditions, (3) pre-defines a whole bunch of kernels, and (4) is careful about things like arithmetic overflow since many images use 8-bit types. I’d be a little bit surprised if you didn’t find yourself slowly wanting to think about such things in a GPU implementation.
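A small illustration of the 8-bit overflow issue, using raw `UInt8` values in plain Julia. `sum_in_eltype` and `sum_widened` are hypothetical helpers. Julia's integer arithmetic wraps around silently, so a filter whose accumulator stays in the element type returns a wrong result without any error.

```julia
pixels = UInt8[200, 100, 50]   # raw 8-bit intensities

# Accumulating in the element type wraps around modulo 256.
function sum_in_eltype(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    return acc
end

# Accumulating in a wider type (UInt8 -> UInt16 via `widen`) gives the true sum.
function sum_widened(xs)
    acc = zero(widen(eltype(xs)))
    for x in xs
        acc += x
    end
    return acc
end

sum_in_eltype(pixels)   # 0x5e (94), not 350
sum_widened(pixels)     # 0x015e (350)
```

A 3-pixel mean computed the first way would come out as about 31 instead of about 117. Base's `sum` already widens small integer types, but a hand-written filter loop (on the CPU or in a GPU kernel) has to choose its accumulator type explicitly.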


#10

Please note that although pure convolution is enough for general-purpose image processing, for deep learning we also need convolution gradients and pooling. If CuArrays.jl wraps cuDNN, we will get all of these on the GPU, but as far as I understand we still don’t have anything for gradients and pooling on the CPU.
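For reference, max pooling and its gradient are simple enough on the CPU. A minimal sketch in plain Julia, assuming a single-channel matrix, 2×2 windows and stride 2 (`maxpool2` and `maxpool2_backward` are hypothetical names). The backward pass routes each output gradient to the input position that held the maximum of its window; ties go to the first maximum, which is what `argmax` returns.

```julia
# 2×2 max pooling with stride 2; trailing rows/columns that don't fill a window are dropped.
function maxpool2(x::AbstractMatrix)
    m, n = size(x) .÷ 2
    y = similar(x, m, n)
    for j in 1:n, i in 1:m
        y[i, j] = maximum(@view x[2i-1:2i, 2j-1:2j])
    end
    return y
end

# Backward pass: each output gradient flows only to the argmax of its window.
function maxpool2_backward(dy::AbstractMatrix, x::AbstractMatrix)
    dx = zero(x)                                   # zero array with the shape of x
    for j in axes(dy, 2), i in axes(dy, 1)
        win = @view x[2i-1:2i, 2j-1:2j]
        a, b = Tuple(argmax(win))                  # position of the max inside the window
        dx[2i - 2 + a, 2j - 2 + b] += dy[i, j]
    end
    return dx
end

x = [1.0 2.0 0.0 0.0;
     3.0 4.0 0.0 5.0;
     0.0 0.0 0.0 0.0;
     7.0 0.0 0.0 0.0]
maxpool2(x)                          # [4.0 5.0; 7.0 0.0]
dx = maxpool2_backward(ones(2, 2), x)
```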


#11

Oh no doubt, it’s a fabulous package and I want to get the GPU implementations in there with the same careful design! :wink:
The “image” in the name just comes across as a bit specific if I want to signal that you can do generic signal processing with GPUArrays!


#12

Not sure I know exactly what you need, but for gradients it’s possible that all you have to do is define your kernel with ForwardDiff/ReverseDiff numbers and it may Just Work.


#13

We could rename the package ArrayFiltering or ArrayConvolution if you think that’s better.


#14

That would be great, if that’s the actual scope of the package!


#15

Yeah, that’s really all it is: an image is just an array, with perhaps a bit of added smarts about handling ColorTypes/FixedPointNumber objects. Much of JuliaImages is just infrastructure and algorithms for handling multidimensional arrays.

As far as the specific name, “filtering” is a little bit more general than “convolution” since ImageFiltering also supports nonlinear “kernels.” Though I worry that in the modern world everyone will look for convolution.
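As an example of a nonlinear filter, here is a minimal 3×3 median filter in plain Julia. The `Statistics` standard library provides `median`; `median3x3` is a hypothetical name and only covers the interior of the array. Because the median is nonlinear, no fixed kernel reproduces it as a convolution.

```julia
using Statistics: median

# 3×3 median filter over the interior ("valid" region only, no boundary handling).
function median3x3(A::AbstractMatrix)
    m, n = size(A)
    out = similar(A, float(eltype(A)), m - 2, n - 2)
    for j in 1:n-2, i in 1:m-2
        out[i, j] = median(vec(A[i:i+2, j:j+2]))   # median of the 9 window values
    end
    return out
end

# An isolated "salt" pixel is removed entirely, whereas a 3×3 mean would smear it:
A = zeros(5, 5); A[3, 3] = 100.0
median3x3(A)            # 3×3 matrix of zeros
sum(A[2:4, 2:4]) / 9    # ≈ 11.11, the mean filter's value at the center
```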


#16

As long as the readme/description mentions the important keywords, people that use google will find it either way.


#17

What do you mean by “pure convolution”? The popular “convolution” operation used in the deep learning world is actually “cross-correlation”. Did you mean that?
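The distinction is just a kernel flip, as a minimal sketch in plain Julia shows (`xcorr_valid` and `conv_valid` are hypothetical helpers, both restricted to the "valid" region). For kernels that are symmetric under a 180° rotation the two coincide. Since CNN filters are learned, a network trained under one convention simply learns flipped weights relative to the other, so the difference matters mainly when porting weights or comparing against textbook definitions.

```julia
# Cross-correlation ("conv" in most DL frameworks): slide K over A without flipping it.
function xcorr_valid(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m - p + 1, n - q + 1)
    for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += A[i + a - 1, j + b - 1] * K[a, b]
        end
        out[i, j] = acc
    end
    return out
end

# True convolution: the same sliding window, but K is flipped in both dimensions.
function conv_valid(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m - p + 1, n - q + 1)
    for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += A[i + a - 1, j + b - 1] * K[p - a + 1, q - b + 1]
        end
        out[i, j] = acc
    end
    return out
end

A  = [Float64(i * j + i) for i in 1:5, j in 1:5]
K  = [1.0 0.0; 0.0 -1.0]                      # asymmetric kernel
Ks = [1.0 2.0 1.0; 2.0 4.0 2.0; 1.0 2.0 1.0]  # symmetric under a 180° rotation
conv_valid(A, K) == xcorr_valid(A, rot180(K))  # true
conv_valid(A, K) == xcorr_valid(A, K)          # false
conv_valid(A, Ks) == xcorr_valid(A, Ks)        # true
```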


#18

@dfdx, seems promising:

```julia
julia> using ImageFiltering, ForwardDiff
INFO: Recompiling stale cache file /home/tim/.julia/lib/v0.6/ForwardDiff.ji for module ForwardDiff.

julia> a = rand(9, 10);

julia> kern = rand(3,3);

julia> f(kern) = imfilter(a, (kern,))
f (generic function with 1 method)

julia> J(kern) = ForwardDiff.jacobian(f, kern)
J (generic function with 1 method)

julia> J(kern)
# 90x9 output array (should be reshaped) suppressed
```

Checking whether the answers are correct is left as an exercise for the reader :smile:. The most likely issue might be a shift due to the fact that ImageFiltering takes the array indices literally, and ForwardDiff doesn’t yet appear to be compliant.
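One possible check, sketched under the assumption that `f` is linear in the kernel (the image is fixed, so this holds whatever boundary condition is applied to it): each column of the Jacobian must equal `f` applied to the corresponding one-hot kernel, and `J(kern) * vec(kern)` must reproduce `vec(f(kern))`. To keep it runnable without packages, the sketch uses a hypothetical zero-padded stand-in `filt_same` instead of `imfilter`; with ImageFiltering and ForwardDiff loaded, one could compare `J(kern)` against `jacobian_by_linearity(f, size(kern))` for the real `f`.

```julia
# Hypothetical stand-in for `imfilter(a, (kern,))` so the check runs without packages:
# "same"-size correlation with a centered 3×3 kernel and zero padding.
function filt_same(img::AbstractMatrix, kern::AbstractMatrix)
    m, n = size(img)
    T = promote_type(eltype(img), eltype(kern))
    out = zeros(T, m, n)
    for j in 1:n, i in 1:m, dj in -1:1, di in -1:1
        ii, jj = i + di, j + dj
        if 1 <= ii <= m && 1 <= jj <= n
            out[i, j] += img[ii, jj] * kern[di + 2, dj + 2]
        end
    end
    return out
end

# Column k of the Jacobian of a kernel-linear filter = response to the k-th one-hot kernel.
function jacobian_by_linearity(f, kshape)
    cols = map(1:prod(kshape)) do k
        e = zeros(kshape)
        e[k] = 1.0
        vec(f(e))
    end
    return reduce(hcat, cols)
end

a = [sin(i) * cos(j) for i in 1:9, j in 1:10]
kern = [0.1 0.4 0.7; 0.2 0.5 0.8; 0.3 0.6 0.9]
f(kern) = filt_same(a, kern)
Jref = jacobian_by_linearity(f, size(kern))   # 90×9
Jref * vec(kern) ≈ vec(f(kern))               # true
```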


#19

I’m wondering how to implement multiple filters using ImageFiltering. Here is a toy example implemented with naive for-loops:

It’s too slow to use in a real application; the right way to implement this is the im2col trick, which can benefit a lot from BLAS acceleration.
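A generic minimal sketch of the im2col idea in plain Julia, assuming a single-channel input, stride 1 and no padding (`im2col` and `multi_xcorr` are hypothetical names). Every p×q patch becomes one column of a matrix, so applying F filters at once is a single matrix multiply, which Julia hands to BLAS for `Float32`/`Float64`. With C input channels the patches simply become p*q*C long.

```julia
# Gather every p×q patch of A into a column: (p*q) × (number of output pixels).
function im2col(A::AbstractMatrix, p::Int, q::Int)
    m, n = size(A)
    mo, no = m - p + 1, n - q + 1
    cols = similar(A, p * q, mo * no)
    for j in 1:no, i in 1:mo
        cols[:, i + (j - 1) * mo] .= vec(view(A, i:i+p-1, j:j+q-1))
    end
    return cols
end

# Apply all filters with one matrix multiply. `filters` is p × q × F and the result
# is (m-p+1) × (n-q+1) × F ("valid" cross-correlation with each filter).
function multi_xcorr(A::AbstractMatrix, filters::AbstractArray{<:Any,3})
    p, q, F = size(filters)
    W = reshape(filters, p * q, F)          # each column is one flattened filter
    Y = transpose(W) * im2col(A, p, q)      # F × (number of output pixels)
    mo, no = size(A, 1) - p + 1, size(A, 2) - q + 1
    return reshape(permutedims(Y), mo, no, F)
end

A = [sin(i + 2j) for i in 1:6, j in 1:7]
filters = cat([1.0 0.0; 0.0 -1.0], [0.25 0.25; 0.25 0.25]; dims = 3)
Y = multi_xcorr(A, filters)
size(Y)                                                      # (5, 6, 2)
Y[:, :, 1] ≈ [A[i, j] - A[i+1, j+1] for i in 1:5, j in 1:6]  # true
```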


#20

Actually, it’s a bit more difficult :slight_smile: In convolutional NNs you seek the derivative of a scalar output (the loss) w.r.t. many input parameters (e.g. weight matrices/filters). Forward-mode automatic differentiation, the method used by ForwardDiff.jl, scales badly for such derivatives, since its cost grows with the number of inputs. Instead, reverse-mode AD is normally used.

Reverse-mode AD uses the derivatives of known primitive operations such as log, exp, *, +, etc., and combines them with the chain rule to find derivatives of more complex functions. Convolution is one such primitive operation, so the situation is exactly the opposite: first you need to define the derivative of convolution and then incorporate it into a differentiation package.
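As a sketch of what defining that derivative involves, assume the deep-learning convention of a "valid" cross-correlation of a single-channel input $x$ ($m \times n$) with a filter $w$ ($p \times q$), and let $\bar y_{i,j} = \partial L / \partial y_{i,j}$ be the gradient arriving from later layers:

$$
y_{i,j} = \sum_{a=1}^{p} \sum_{b=1}^{q} x_{i+a-1,\,j+b-1}\, w_{a,b},
\qquad 1 \le i \le m-p+1,\quad 1 \le j \le n-q+1
$$

$$
\frac{\partial L}{\partial w_{a,b}} = \sum_{i,j} \bar y_{i,j}\, x_{i+a-1,\,j+b-1},
\qquad
\frac{\partial L}{\partial x_{s,t}} = \sum_{i,j} \bar y_{i,j}\, w_{s-i+1,\,t-j+1}
$$

where the last sum only includes terms with $1 \le s-i+1 \le p$ and $1 \le t-j+1 \le q$. The filter gradient is a valid cross-correlation of $x$ with $\bar y$, and the input gradient is a full convolution of $\bar y$ with $w$.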

cuDNN provides both: the implementation of convolution (the forward operation) and its derivative (the reverse operation). I think I’ve also seen a similar kernel for OpenCL, but I haven’t seen anything for the CPU yet. That shouldn’t be a problem, though, since the gradient of a convolution is itself a kind of convolution (with image and filter swapped, IIRC). So we just need to decide how to organize packages.
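A minimal numerical check of that claim in plain Julia, for the single-channel "valid" cross-correlation used in CNN layers (all function names are hypothetical). The filter gradient is computed as a valid cross-correlation of the input with the output gradient, the input gradient as a full convolution of the output gradient with the filter, and both are compared against central finite differences of the scalar loss `sum(dy .* y)`.

```julia
# Forward op of a single-channel CNN layer: "valid" cross-correlation.
function xcorr_valid(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m - p + 1, n - q + 1)
    for j in axes(out, 2), i in axes(out, 1)
        acc = zero(T)
        for b in 1:q, a in 1:p
            acc += A[i + a - 1, j + b - 1] * K[a, b]
        end
        out[i, j] = acc
    end
    return out
end

# Full convolution: the output grows by size(K) .- 1 in each dimension.
function conv_full(A::AbstractMatrix, K::AbstractMatrix)
    m, n = size(A)
    p, q = size(K)
    T = promote_type(eltype(A), eltype(K))
    out = zeros(T, m + p - 1, n + q - 1)
    for j in 1:n, i in 1:m, b in 1:q, a in 1:p
        out[i + a - 1, j + b - 1] += A[i, j] * K[a, b]
    end
    return out
end

# Backward rules, both again convolution-type operations.
grad_w(x, dy) = xcorr_valid(x, dy)   # ∂L/∂w: input correlated with the output gradient
grad_x(w, dy) = conv_full(dy, w)     # ∂L/∂x: output gradient fully convolved with w

# Central finite differences of a scalar function f at z, one entry at a time.
function numgrad(f, z; h = 1e-6)
    g = similar(z)
    for k in eachindex(z)
        zp = copy(z); zp[k] += h
        zm = copy(z); zm[k] -= h
        g[k] = (f(zp) - f(zm)) / (2h)
    end
    return g
end

x  = [sin(i + 0.5j) for i in 1:6, j in 1:5]
w  = [0.3 -0.2; 0.1 0.4]
dy = [cos(i * j) for i in 1:5, j in 1:4]   # stands in for the gradient from later layers
loss(x, w) = sum(dy .* xcorr_valid(x, w))  # so that ∂loss/∂y == dy

isapprox(grad_w(x, dy), numgrad(v -> loss(x, v), w); atol = 1e-6)   # true
isapprox(grad_x(w, dy), numgrad(v -> loss(v, w), x); atol = 1e-6)   # true
```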