Flux. Pooling followed by Dense

I am at a loss about the following. I am building a small 2D network where the last two layers are a MaxPool followed by a Dense. The parameters used to create the MaxPool do not depend on the size of the input, but the Dense layer just after it does.

I have tried many ways to make this work, but without success. (A Flux.flatten is in between.) Hoping for guidance elsewhere, I went through the source code of Flux and NNlib. Every single example either hard-codes the input size of the Dense layer or cheats by keeping the stride of the MaxPool at 1.

I am sure I am missing something stupid.

What exactly is the question?

How do I append a Dense layer after a pooling layer when I don't know the size coming out of the MaxPool, and Dense requires knowing it?

You can do something like this if you don’t want to calculate it manually. Otherwise it shouldn’t be too hard to figure out an expression for the output shape based on the stride and padding of the maxpool in combination with the size of the output from the convolution (which might also need some calculation).

julia> xs = rand(Float32, 100, 100, 3, 50);

julia> layer1 = Conv((5,5), 3 => 7, relu; bias = false)
Conv((5, 5), 3 => 7, relu, bias=false)  # 525 parameters

julia> layer2 = MaxPool((5, 5), pad=SamePad())
MaxPool((5, 5), pad=2)

julia> layer3 = flatten
flatten (generic function with 1 method)

julia> tmp = Chain(layer1, layer2, layer3)
Chain(
  Conv((5, 5), 3 => 7, relu, bias=false),  # 525 parameters
  MaxPool((5, 5), pad=2),
  flatten,
)

julia> size(tmp(xs))
(2800, 50)

julia> layer4 = Dense(size(tmp(xs), 1), 3)
Dense(2800, 3)      # 8_403 parameters

julia> model = Chain(layer1, layer2, layer3, layer4)
Chain(
  Conv((5, 5), 3 => 7, relu, bias=false),  # 525 parameters
  MaxPool((5, 5), pad=2),
  flatten,
  Dense(2800, 3),                       # 8_403 parameters
)                   # Total: 3 arrays, 8_928 parameters, 35.414 KiB.
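
If you would rather compute the size by hand, the expression mentioned above is the standard floor((in + 2·pad − window) / stride) + 1 rule, applied once per layer. A minimal sketch (the helper name `conv_out` is mine, not a Flux function):

```julia
# Hypothetical helper: spatial output size of one conv/pool dimension.
conv_out(n, window; stride=1, pad=0) = fld(n + 2pad - window, stride) + 1

h = conv_out(100, 5)                  # Conv((5,5)), no pad: 96
h = conv_out(h, 5; stride=5, pad=2)   # MaxPool((5,5), pad=2, stride 5): 20
# Flattened length feeding Dense: 20 * 20 * 7 channels = 2800
```

Note that a pooling layer's stride defaults to its window size, which is why the MaxPool step above uses stride 5.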

Thanks for putting the time into this answer.

I have tried all of that.
I defined a generic struct where you stop in the constructor to extract the size… It doesn't work because there is no information about the input at that point of the definition.
I then did this within the "forwarding" function. It creates a model fine, but applying it to an actual input always bombs out (including Zygote trying to differentiate the size).
The forward function is where it has to happen, but this is where I cannot find a way to extract the size of the output from MaxPool.

I have a helper function to generate the sizes and iterators of convolutions. It works everywhere in the code except there; it never gives me a consistent, correct result.

I might end up writing a replica of MaxPool or getting rid of it altogether, but that is not really a satisfying answer.

P.S.: Don't get me started on Zygote complaining about mutating arrays. That has been biting me over and over.

Hmm, I’m not sure I understand.

But what do you know about your input? Do you know it will be RGB images of the same dimensions?

So you create a function that does the forward pass, and halfway through you try to check what the current size of the data is and generate a dense layer corresponding to that size? That seems like it would be problematic in many ways.

This is what I was thinking would be the nicer solution. What is the problem, can you share the function and a case where it gives the wrong answer?

@Emmanuel-R8 maybe you are looking for Flux.outputsize? It is still not clear to me what you are trying to achieve; a code example would help.
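
In case it helps, here is a minimal sketch of that approach, reusing the layer sizes from the example above (the 100×100×3 input size is an assumption):

```julia
using Flux

front = Chain(Conv((5, 5), 3 => 7, relu), MaxPool((5, 5), pad=SamePad()), Flux.flatten)

# Flux.outputsize propagates shapes without running a real forward pass,
# so we can read off the flattened feature length for a given input size.
n = Flux.outputsize(front, (100, 100, 3, 1))[1]   # (features, batch) -> features
model = Chain(front, Dense(n, 3))
```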


Let me flip the question around and ask: how would you do this with Python libraries? Because if you can articulate that, then the Flux solution will be a pretty direct translation.

For example, with TF/PyTorch your options are:

  1. Fix the input size and derive the pre-dense output size from that. If you want help from something like tf.Keras’ shape inference, check out Flux.outputsize as @CarloLucibello mentioned.
  2. Use adaptive or global pooling. Unlike normal pooling layers, these generate a fixed size output for variable-sized inputs. Global pooling in particular is bread-and-butter for most vision models, but Flux has layers for both types.
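
For option 2, a minimal sketch with Flux's GlobalMeanPool (the channel count of 7 is carried over from the earlier example): global pooling collapses the spatial dimensions to 1×1, so the Dense input size depends only on the number of channels, never on the input height or width.

```julia
using Flux

model = Chain(
    Conv((5, 5), 3 => 7, relu),
    GlobalMeanPool(),   # 1×1×7×batch regardless of input height/width
    Flux.flatten,       # -> 7×batch
    Dense(7, 3),
)

size(model(rand(Float32, 100, 100, 3, 2)))   # (3, 2)
size(model(rand(Float32, 64, 64, 3, 2)))     # also (3, 2)
```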

Thanks. I went with the adaptive pooling option.

I uploaded the code to https://github.com/Alba-Intelligence/SharpenedCorrelation.

The idea is to implement a variant of the Sharpened Cosine Similarity. See https://www.rpisoni.dev/posts/cossim-convolution/ for a description, and https://e2eml.school/scs.html for various implementations.

Now, I reached the dreaded Zygote complaints about mutating arrays.

As always, this is where at a bare minimum a full stacktrace would be necessary, if not an MWE as well.

Of course. I wrote that with a lot of negative feelings, dreading the incoming hair-pulling.

(any questions about it would warrant a separate thread anyway)
