Flailing on evaluating batches as matrices of features

This is my first significant Flux program, be gentle :wink:

I have a model m. It takes two vectors as input (via a Parallel(vcat, ...) layer), the first of which is a vector of floats and the second a one-hot vector. I can get it to work if I loop through the features one by one, but I'm trying to do a batch evaluation. So I generate two matrices of features, m1 = [f1 f2 f3 ...] where f1, etc. are the individual real-valued vectors, and m2 = [b1 b2 b3 ...] where b1, etc. are the one-hot vectors.
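Roughly, the batching looks like this (the names and sizes below are just a stand-in for my actual data, which is built elsewhere):

julia> using Flux

julia> fs = [rand(Float32, 20) for _ in 1:14];   # individual feature vectors f1, f2, ...

julia> m1 = reduce(hcat, fs);                    # 20×14 Matrix{Float32}

julia> m2 = Flux.onehotbatch(rand([:a, :b], 14), [:a, :b]);  # 2×14 OneHotMatrix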

When I do the following:

m((f1, b1))

I get what I expect, a vector output of the right length.

When I do:

m((m1, m2))

I get… a vector output of the same length. That is, I don't get a separate output for each column of the batch; I just get a single vector, and it doesn't correspond to any of the outputs I would expect for the individual inputs.

I’m doing something wildly wrong, yes? Because in my training loop, the loss function gets called with m1 and m2, which leads to the m((m1, m2)) call, and then computes a loss from that result, which is nonsensical. And, indeed, it doesn’t converge, unless I train with exactly one example in the training set.

But I don’t have enough experience to see what I’m doing wrong; this appears to be what the Flux documentation recommends.

Hi Glen, can you print out the size/type of the inputs just to be sure? I tried a simple example and didn’t get a vector output:

julia> p = Parallel(vcat, identity, identity)
Parallel(vcat, identity, identity)

julia> p(zeros(1), ones(1))
2-element Vector{Float64}:
 0.0
 1.0

julia> p(zeros(1, 1), ones(1, 1))
2×1 Matrix{Float64}:
 0.0
 1.0

julia> p((zeros(1), ones(1)))
2-element Vector{Float64}:
 0.0
 1.0

julia> p((zeros(1, 1), ones(1, 1)))
2×1 Matrix{Float64}:
 0.0
 1.0

Sure. My model looks like this:

m = Chain(
    Parallel(vcat,
        Dense(delay*2, 20, tanh),           # real-valued branch (input size 20, since delay*2 == 20)
        Dense(length(select), 10, tanh)),   # one-hot branch (input size 2)
    Dense(30, 10),                          # vcat of the two branches gives 20 + 10 = 30
    DelayKoopman(10, sequence_len, delay),  # custom layer
    Dense(10, 10, tanh),
    Dense(10, 2)
    ) |> gpu

Note that DelayKoopman is a custom layer, but I don’t think it has anything to do with this issue. It takes in a vector of size 10 and outputs a vector of size 50 (right now, this is an external parameter).

My input vectors are of size (20,) and (2,) for the Float and one-hot vectors, respectively. My batched matrices are of size (20,14) and (2,14), respectively.

So when I call

 m((f1, b1))

I get an output of size (2, 50), which is what I expect. But when I call

 m((m1, m2))

I also get a single output of size (2, 50), rather than one per sample in the batch.

Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the vcat with a wrapper function that first displays its inputs and then calls vcat. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.
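Something along these lines should do for the wrapper (just a sketch):

# Drop-in replacement for vcat that prints the sizes it receives,
# then concatenates as before.
function debug_vcat(xs...)
    foreach(x -> @show(size(x)), xs)
    return vcat(xs...)
end

# Then use Parallel(debug_vcat, ...) in place of Parallel(vcat, ...).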

I will do that shortly.

So, to be clear here: I want a matrix output, e.g. of size 28×50, with one output for each pair of input vectors. Is that a reasonable expectation?


So it turns out that the problem was, in fact, my custom layer (because of course it was). I appreciate this pointer:

Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the vcat with a wrapper function that first displays its inputs and then calls vcat. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.

Once I started doing this, I realized that all the Flux layers are doing is straightforward linear algebra. I had assumed there was more magic under the hood than there actually is.

Which is awesome. The entire thing makes a lot more sense to me now.
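To make it concrete for anyone else reading: as far as I can tell, a Dense layer is essentially just tanh.(W * x .+ b) (or whatever the activation is), so it handles a matrix of column-vector samples automatically, because W * X already carries the batch dimension through. A rough sketch:

julia> using Flux

julia> d = Dense(20, 30, tanh);       # W is 30×20, b has length 30

julia> size(d(rand(Float32, 20)))     # one sample: tanh.(W * x .+ b)
(30,)

julia> size(d(rand(Float32, 20, 14))) # 14 samples as columns: each column batched through
(30, 14)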
