Flailing on evaluating batches as matrices of features

This is my first significant Flux program, be gentle :wink:

I have a model m. It takes two vectors as input (via a Parallel(vcat, ...) layer), the first of which is a vector of floats and the second a one-hot vector. I can get it to work if I loop through the features one by one, but I'm trying to do a batch evaluation. So I generate two matrices of features, m1 = [f1 f2 f3 ...] where f1, etc. are the individual real-valued vectors, and m2 = [b1 b2 b3 ...] where b1, etc. are the one-hot vectors.
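Roughly, the batching looks like this (the names and sizes below are just a stand-in for my actual data, which is built elsewhere):

julia> using Flux

julia> fs = [rand(Float32, 20) for _ in 1:14];   # individual feature vectors f1, f2, ...

julia> m1 = reduce(hcat, fs);                    # 20×14 Matrix{Float32}

julia> m2 = Flux.onehotbatch(rand([:a, :b], 14), [:a, :b]);  # 2×14 OneHotMatrix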

When I do the following:

m((f1, b1))

I get what I expect, a vector output of the right length.

When I do:

m((m1, m2))

I get… a vector output of the same length. That is, I don't get a separate output for each column of the batch; I just get a single vector, and it doesn't correspond to any of the outputs I would expect for the individual inputs.

I’m doing something wildly wrong, yes? Because in my training loop, the loss function gets called with m1 and m2, which leads to the m((m1, m2)) call, and then computes a loss from that result, which is nonsensical. And, indeed, it doesn’t converge, unless I train with exactly one example in the training set.

But I don’t have enough experience to see what I’m doing wrong; this appears to be what the Flux documentation recommends.

Hi Glen, can you print out the size/type of the inputs just to be sure? I tried a simple example and didn’t get a vector output:

julia> p = Parallel(vcat, identity, identity)
Parallel(vcat, identity, identity)

julia> p(zeros(1), ones(1))
2-element Vector{Float64}:
 0.0
 1.0

julia> p(zeros(1, 1), ones(1, 1))
2×1 Matrix{Float64}:
 0.0
 1.0

julia> p((zeros(1), ones(1)))
2-element Vector{Float64}:
 0.0
 1.0

julia> p((zeros(1, 1), ones(1, 1)))
2×1 Matrix{Float64}:
 0.0
 1.0

Sure. My model looks like this:

m = Chain(
    Parallel(vcat,
        Dense(delay*2, 20, tanh),           # real-valued branch (input size 20, since delay*2 == 20)
        Dense(length(select), 10, tanh)),   # one-hot branch (input size 2)
    Dense(30, 10),                          # vcat of the two branches gives 20 + 10 = 30
    DelayKoopman(10, sequence_len, delay),  # custom layer
    Dense(10, 10, tanh),
    Dense(10, 2)
    ) |> gpu

Note that DelayKoopman is a custom layer, but I don’t think it has anything to do with this issue. It takes in a vector of size 10 and outputs a vector of size 50 (right now, this is an external parameter).

My input vectors are of size (20,) and (2,) for the Float and one-hot vectors, respectively. My batched matrices are of size (20,14) and (2,14), respectively.

So when I call

 m((f1, b1))

I get an output of size (2, 50), which is what I expect. But when I call

 m((m1, m2))

I also get a single output of size (2, 50), rather than one per sample in the batch.

Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the vcat with a wrapper function that first displays its inputs and then calls vcat. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.
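Something along these lines should do for the wrapper (just a sketch):

# Drop-in replacement for vcat that prints the sizes it receives,
# then concatenates as before.
function debug_vcat(xs...)
    foreach(x -> @show(size(x)), xs)
    return vcat(xs...)
end

# Then use Parallel(debug_vcat, ...) in place of Parallel(vcat, ...).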

I will do that shortly.

So, to be clear here: I want a matrix output, e.g. of size 28×50, with one output for each pair of input vectors. Is that a reasonable expectation?


So it turns out that the problem was, in fact, my custom layer (because of course it was). I appreciate this pointer:

Can you try removing all layers after the Parallel first and seeing what the output looks like? If the shape is still incorrect, try replacing the vcat with a wrapper function that first displays its inputs and then calls vcat. If it’s correct, then you can try adding the successive layers back one at a time until you see the incorrect output size again.

Once I started doing this, I realized that all the Flux layers are doing is straightforward linear algebra. I had assumed there was more magic under the hood than there actually is.

Which is awesome. The entire thing makes a lot more sense to me now.
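To make it concrete for anyone else reading: as far as I can tell, a Dense layer is essentially just tanh.(W * x .+ b) (or whatever the activation is), so it handles a matrix of column-vector samples automatically, because W * X already carries the batch dimension through. A rough sketch:

julia> using Flux

julia> d = Dense(20, 30, tanh);       # W is 30×20, b has length 30

julia> size(d(rand(Float32, 20)))     # one sample: tanh.(W * x .+ b)
(30,)

julia> size(d(rand(Float32, 20, 14))) # 14 samples as columns: each column batched through
(30, 14)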
