# Flailing on evaluating batches as matrices of features

This is my first significant Flux program, so be gentle. I have a model `m`. It takes two vectors as input (via a `Parallel(vcat, ...)` layer): the first is a vector of floats and the second is a one-hot vector. I can get it to work if I loop through the features one by one, but I’m trying to do a batch evaluation. So I generate two matrices of features, `m1 = [f1 f2 f3 ...]`, where `f1` etc. are the individual vectors of reals, and `m2 = [b1 b2 b3 ...]`, where `b1` etc. are the one-hot vectors.
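
For concreteness, the batching is assembled roughly like this (a sketch; the variable names and sizes are stand-ins for my real data):

```julia
using Flux

# Stand-ins for f1, f2, ...: stack them into a (features, batch) matrix,
# one column per example.
fs = [rand(Float32, 20) for _ in 1:14]
m1 = reduce(hcat, fs)               # size (20, 14)

# The one-hot side, built with Flux.onehotbatch from stand-in labels.
labels = rand(1:2, 14)
m2 = Flux.onehotbatch(labels, 1:2)  # size (2, 14)
```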

When I do the following:

```julia
m((f1, b1))
```

I get what I expect, a vector output of the right length.

When I do:

```julia
m((m1, m2))
```

I get… a vector output of the same length. Which is to say, I don’t get individual output vectors corresponding to each column of the input; I just get a single vector, and it doesn’t correspond to any of the outputs I would expect for the individual inputs.

I’m doing something wildly wrong, yes? Because in my training loop, the loss function gets called with `m1` and `m2`, which propagates to the `m((m1, m2))` call and then calculates a loss from that result, which is nonsensical. And, indeed, it doesn’t converge (unless I train with exactly one example in the training set).
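
The training step is shaped roughly like this (a sketch with placeholder names like `ys` and `Adam()`, not my exact code):

```julia
# `ys` is the matrix of targets, one column per example.
loss(model, x, y) = Flux.mse(model(x), y)

opt_state = Flux.setup(Adam(), m)
grads = Flux.gradient(mod -> loss(mod, (m1, m2), ys), m)
Flux.update!(opt_state, m, grads[1])
```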

But I don’t have enough experience to see what I’m doing wrong; this appears to be exactly what the Flux documentation recommends.

Hi Glen, can you print out the size/type of the inputs just to be sure? I tried a simple example and didn’t get a vector output:

```julia
julia> p = Parallel(vcat, identity, identity)
Parallel(vcat, identity, identity)

julia> p(zeros(1), ones(1))
2-element Vector{Float64}:
 0.0
 1.0

julia> p(zeros(1, 1), ones(1, 1))
2×1 Matrix{Float64}:
 0.0
 1.0

julia> p((zeros(1), ones(1)))
2-element Vector{Float64}:
 0.0
 1.0

julia> p((zeros(1, 1), ones(1, 1)))
2×1 Matrix{Float64}:
 0.0
 1.0
```

Sure. My model looks like this:

```julia
m = Chain(
    Parallel(vcat,
        Dense(delay*2, 20, tanh),
        Dense(length(select), 10, tanh)),
    Dense(30, 10),
    DelayKoopman(10, sequence_len, delay),
    Dense(10, 10, tanh),
    Dense(10, 2)
) |> gpu
```

Note that `DelayKoopman` is a custom layer, but I don’t think it has anything to do with this issue. It takes a vector of size 10 and outputs a vector of size 50 (the output size is an external parameter for now).

My input vectors are of size `(20,)` and `(2,)` for the float and one-hot inputs, respectively. My batched matrices are of size `(20, 14)` and `(2, 14)`, respectively.

So when I call

```julia
m((f1, b1))
```

I get an output of size `(2, 50)`, which is what I expect. But when I call

```julia
m((m1, m2))
```

I also get a single output of size `(2, 50)`.

Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the `vcat` with a wrapper function that first displays its inputs and then calls `vcat`. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.
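
Something like this for the wrapper, for instance (a sketch; the layer sizes match the ones you reported):

```julia
# Debugging wrapper: print the sizes of the incoming arrays, then vcat them.
function debug_vcat(xs...)
    @show size.(xs)
    return vcat(xs...)
end

# Swap it in for vcat in the Parallel layer:
p = Parallel(debug_vcat,
    Dense(20, 20, tanh),
    Dense(2, 10, tanh))
```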

I will do that shortly.

So, to be clear here: I want a matrix output, e.g. 28×50, with one output for each input pair. Is that a reasonable expectation?
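
For what it’s worth, a stock `Dense` in isolation does seem to batch that way, one output column per input column:

```julia
julia> d = Dense(10 => 2);

julia> size(d(rand(Float32, 10)))
(2,)

julia> size(d(rand(Float32, 10, 14)))
(2, 14)
```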



So it turns out that the problem was, in fact, my custom layer (because of course it was). I appreciate this pointer:

> Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the `vcat` with a wrapper function that first displays its inputs and then calls `vcat`. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.

Once I started doing this, I realized that all the Flux layers are doing is straightforward linear algebra. I had assumed that there was more magic under the hood than there actually is.

Which is awesome. The entire thing makes a lot more sense to me now.
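
For anyone landing here later: a `Dense` layer is essentially just `σ.(W * x .+ b)`, so a batch of columns goes through for free, and a custom layer batches only if its own math does. A hypothetical sketch of the shape my fix takes (not my actual `DelayKoopman` code):

```julia
using Flux

# Hypothetical custom layer whose forward pass is plain matrix algebra,
# so a (10, batch) input yields a (50, batch) output with no extra logic.
struct Lift
    W::Matrix{Float32}
end
Flux.@functor Lift                 # register W as a trainable parameter

Lift(in::Int, out::Int) = Lift(randn(Float32, out, in))

(l::Lift)(x) = tanh.(l.W * x)      # works for vectors and matrices alike
```

With that, `Lift(10, 50)(rand(Float32, 10, 14))` comes out as `(50, 14)` without any special batching code.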
