How does Flux.Conv work?

I was playing around with Flux and tried the Conv layer:

julia> using Flux

julia> l = Conv((1, 5), 1 => 1, Flux.sigmoid);

julia> inp = ones(1, 5, 1, 1);

julia> w = l.weight;

julia> b = l.bias;

julia> out1 = l(inp);
┌ Warning: Slow fallback ...

I compared the above result with “direct evaluation” below:

julia> out2 = Flux.sigmoid.(sum(w.*inp) .+ b);

julia> out1, out2
([0.433069055984647], [0.433069055984647])

The results are equal, so it seems good! But then I tried this:

julia> inp[1,5,1,1] = 0e0;

julia> out1 = l(inp);

julia> out2 = Flux.sigmoid.(sum(w.*inp) .+ b);

julia> out1, out2
([0.37701733747904254], [0.49181006345256256])

They don’t match! Am I doing something wrong in the direct calculation, or is something else going on?
Note: I tried the same thing (dotting with w and adding b) in PyTorch and the results match there.


I think Flux might reverse the convolutional filter when applying it. I tested your code with out2 = Flux.sigmoid.(sum(reverse(w) .* inp) .+ b) instead, and that gave the same output.
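Spelled out, the comparison looks roughly like this (a sketch, so the exact numbers depend on the layer’s random initialization):

using Flux

l = Conv((1, 5), 1 => 1, Flux.sigmoid)
inp = rand(1, 5, 1, 1)
w, b = l.weight, l.bias

out_conv   = l(inp)                                       # Flux's Conv flips the kernel internally
out_manual = Flux.sigmoid.(sum(reverse(w) .* inp) .+ b)   # "direct" evaluation with the reversed kernel
only(out_conv) ≈ only(out_manual)                         # expected to be true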

Flux uses NNlib for convolution. The kernel flipping in NNlib happens here: https://github.com/FluxML/NNlib.jl/blob/c30ea9bf9d024adfeb99bf10fb8a1e91368ca8ea/src/dim_helpers.jl#L138

This comes from the definition of convolution and is standard. For example, see the Stack Overflow question “Why is the convolutional filter flipped in convolutional neural networks?”
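To make the distinction concrete, here is a small hand-rolled 1-D sketch of the two definitions (just an illustration, not how NNlib implements it):

# conv1d flips the kernel (the textbook convolution); xcorr1d does not
# (cross-correlation, which is what PyTorch/TF actually compute).
# Both use a "valid" sliding window, i.e. no padding.
conv1d(x, w)  = [sum(w[k] * x[i + length(w) - k] for k in eachindex(w)) for i in 1:length(x)-length(w)+1]
xcorr1d(x, w) = [sum(w[k] * x[i + k - 1]         for k in eachindex(w)) for i in 1:length(x)-length(w)+1]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]

conv1d(x, w) == xcorr1d(x, reverse(w))   # flipping the kernel turns one into the other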


To add to what was said in the SO question ianfiske posted: for learned neural networks it doesn’t make much of a difference. It just comes down to whether you want an easier implementation or want to adhere more strictly to the definition of convolution (the alternative being the cross-correlation). IIRC, in backpropagation the kernel is flipped from whatever it was in the forward pass, and autodiff captures this nicely.
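A quick sketch of why the flip shows up in the backward pass (1-D case, ignoring boundary handling): if the forward pass is the cross-correlation

$$y_n = \sum_k w_k \, x_{n+k},$$

then for a scalar loss $L$,

$$\frac{\partial L}{\partial x_m} = \sum_n \frac{\partial L}{\partial y_n} \frac{\partial y_n}{\partial x_m} = \sum_n \frac{\partial L}{\partial y_n} \, w_{m-n},$$

which is a convolution of $\partial L / \partial y$ with $w$, i.e. the kernel enters the backward pass flipped relative to the forward pass (and vice versa if the forward pass is a true convolution).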

Thanks for the responses!
@andrewdinhobl, if I’m understanding this correctly, flipping/reversing the convolution kernel is simply a convention owing to the signal-processing history of convolutions and is not essential for pure machine learning? (PyTorch and TensorFlow, the most popular frameworks out there, don’t seem to do this.)
And practically speaking, if we want to use, say, 2d convolution weights from Flux in the same model implemented in PyTorch, it’s sufficient to apply reverse() on dims 1 and 2 of the Conv.weight attribute, right?

PyTorch, TF and others are actually computing a cross-correlation because they don’t flip the kernel; see, e.g., the Conv2d page in the PyTorch 1.12 documentation. Flux has a CrossCor layer that does the same, but because of row- vs column-major layouts I’m not sure which would be a direct equivalent if you loaded the weights over directly.
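For example, something along these lines should show the difference (a sketch; it assumes the CrossCor(weight, bias, activation) constructor, which mirrors Conv’s):

using Flux

c  = Conv((1, 5), 1 => 1, Flux.sigmoid)
cc = CrossCor(c.weight, c.bias, Flux.sigmoid)   # same weights and bias, but no kernel flip

inp = rand(1, 5, 1, 1)

only(cc(inp)) ≈ only(Flux.sigmoid.(sum(c.weight .* inp) .+ c.bias))           # the "direct" evaluation from the original post
only(c(inp))  ≈ only(Flux.sigmoid.(sum(reverse(c.weight) .* inp) .+ c.bias))  # Conv matches only with the reversed kernel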


@ToucheSir thanks for the reply. I actually attempted porting convolution weights from Flux to PyTorch using PyCall. The PyReverseDims function converts a column-major Julia array to a PyTorch-compatible row-major array and reverses the ordering of the dims. Keeping that in mind, this operation passed all my tests for porting the Conv.weight attribute:

l = Conv((1, 5), 1 => 1, Flux.sigmoid)
w_julia = l.weight  # layout: (spatial dims..., C_in, C_out)
# Un-flip the kernel, swap the two spatial dims, then let PyReverseDims handle the
# column-major -> row-major conversion and the reversal of the dim ordering.
w_torch = PyReverseDims(permutedims(reverse(w_julia, dims=(1, 2)), (2, 1, 3, 4)))

and for the bias:

b_julia = l.bias
b_torch = PyReverseDims(b_julia)  # bias is 1-D, so this is just the array conversion

Cross-Correlating Neural Networks didn’t sound as good :)
Anyway, here’s another take on more formal names for some of these operations:
https://math.stackexchange.com/questions/2203759/mathematical-name-for-the-flipped-matrix-and-the-subsequent-matrix-dot-product-i