How does Flux.Conv work?

I was playing around with Flux and tried the Conv layer:

julia> using Flux

julia> l = Conv((1, 5), 1 => 1, Flux.sigmoid);

julia> inp = ones(1, 5, 1, 1);

julia> w = l.weight;

julia> b = l.bias;

julia> out1 = l(inp);
┌ Warning: Slow fallback ...

I compared the above result with “direct evaluation” below:

julia> out2 = Flux.sigmoid.(sum(w.*inp) .+ b);

julia> out1, out2
([0.433069055984647], [0.433069055984647])

The results are equal, so it seems good! But then I tried this:

julia> inp[1,5,1,1] = 0e0;

julia> out1 = l(inp);

julia> out2 = Flux.sigmoid.(sum(w.*inp) .+ b);

julia> out1, out2
([0.37701733747904254], [0.49181006345256256])

They don’t match! Am I doing something wrong in the direct calculation, or is something else going on?
Note: I tried the same thing (dotting with w and adding b) in PyTorch and the results match there.


I think Flux might reverse the convolutional filter when applying it. I tested your code with out2 = Flux.sigmoid.(sum(reverse(w) .* inp) .+ b) instead, and that gave the same output.
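Spelled out, the comparison looks roughly like this (a sketch, so the exact numbers depend on the layer’s random initialization):

using Flux

l = Conv((1, 5), 1 => 1, Flux.sigmoid)
inp = rand(1, 5, 1, 1)
w, b = l.weight, l.bias

out_conv   = l(inp)                                       # Flux's Conv flips the kernel internally
out_manual = Flux.sigmoid.(sum(reverse(w) .* inp) .+ b)   # "direct" evaluation with the reversed kernel
only(out_conv) ≈ only(out_manual)                         # expected to be true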

Flux uses NNlib for convolution. The kernel flipping in NNlib happens here: https://github.com/FluxML/NNlib.jl/blob/c30ea9bf9d024adfeb99bf10fb8a1e91368ca8ea/src/dim_helpers.jl#L138

This comes from the definition of convolution and is standard. For example, see the Stack Overflow question “Why is the convolutional filter flipped in convolutional neural networks?”
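To make the distinction concrete, here is a small hand-rolled 1-D sketch of the two definitions (just an illustration, not how NNlib implements it):

# conv1d flips the kernel (the textbook convolution); xcorr1d does not
# (cross-correlation, which is what PyTorch/TF actually compute).
# Both use a "valid" sliding window, i.e. no padding.
conv1d(x, w)  = [sum(w[k] * x[i + length(w) - k] for k in eachindex(w)) for i in 1:length(x)-length(w)+1]
xcorr1d(x, w) = [sum(w[k] * x[i + k - 1]         for k in eachindex(w)) for i in 1:length(x)-length(w)+1]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]

conv1d(x, w) == xcorr1d(x, reverse(w))   # flipping the kernel turns one into the other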


To add to what was said in the SO question ianfiske posted: for learned neural networks it doesn’t make much of a difference. It just comes down to whether you want an easier implementation or want to adhere more strictly to the definition of convolution (the alternative being the cross-correlation). IIRC, in backpropagation the kernel is flipped from whatever it was in the forward pass, and autodiff captures this nicely.
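A quick sketch of why the flip shows up in the backward pass (1-D case, ignoring boundary handling): if the forward pass is the cross-correlation

$$y_n = \sum_k w_k \, x_{n+k},$$

then for a scalar loss $L$,

$$\frac{\partial L}{\partial x_m} = \sum_n \frac{\partial L}{\partial y_n} \frac{\partial y_n}{\partial x_m} = \sum_n \frac{\partial L}{\partial y_n} \, w_{m-n},$$

which is a convolution of $\partial L / \partial y$ with $w$, i.e. the kernel enters the backward pass flipped relative to the forward pass (and vice versa if the forward pass is a true convolution).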

Thanks for the responses!
@andrewdinhobl, if I’m understanding this correctly, flipping/reversing the convolution kernel is simply a convention owing to the signal-processing history of convolutions and is not essential for pure machine learning? (PyTorch and TensorFlow, the most popular frameworks out there, don’t seem to do this.)
And practically speaking, if we want to use, say, 2d convolution weights from Flux in the same model implemented in PyTorch, it’s sufficient to apply reverse() on dims 1 and 2 of the Conv.weight attribute, right?

PyTorch, TF and others are actually computing a cross-correlation because they don’t flip the kernel; see, e.g., the Conv2d page in the PyTorch 1.12 documentation. Flux has a CrossCor layer that does the same, but because of row- vs column-major layouts I’m not sure which would be a direct equivalent if you loaded the weights over directly.
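For example, something along these lines should show the difference (a sketch; it assumes the CrossCor(weight, bias, activation) constructor, which mirrors Conv’s):

using Flux

c  = Conv((1, 5), 1 => 1, Flux.sigmoid)
cc = CrossCor(c.weight, c.bias, Flux.sigmoid)   # same weights and bias, but no kernel flip

inp = rand(1, 5, 1, 1)

only(cc(inp)) ≈ only(Flux.sigmoid.(sum(c.weight .* inp) .+ c.bias))           # the "direct" evaluation from the original post
only(c(inp))  ≈ only(Flux.sigmoid.(sum(reverse(c.weight) .* inp) .+ c.bias))  # Conv matches only with the reversed kernel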


@ToucheSir thanks for the reply. I actually attempted porting convolution weights from Flux to PyTorch using PyCall. The PyReverseDims function converts a column-major Julia array to a PyTorch-compatible row-major array and reverses the ordering of the dims. Keeping that in mind, this operation passed all my tests for porting the Conv.weight attribute:

l = Conv((1, 5), 1 => 1, Flux.sigmoid)
w_julia = l.weight  # layout: (spatial dims..., C_in, C_out)
# Un-flip the kernel, swap the two spatial dims, then let PyReverseDims handle the
# column-major -> row-major conversion and the reversal of the dim ordering.
w_torch = PyReverseDims(permutedims(reverse(w_julia, dims=(1, 2)), (2, 1, 3, 4)))

and for the bias:

b_julia = l.bias
b_torch = PyReverseDims(b_julia)  # bias is 1-D, so this is just the array conversion

Cross-Correlating Neural Networks didn’t sound as good :)
Anyway, here’s another take on more formal names for some of these operations:
https://math.stackexchange.com/questions/2203759/mathematical-name-for-the-flipped-matrix-and-the-subsequent-matrix-dot-product-i