This is my first significant Flux program, so please be gentle.

I have a model `m`. It takes two vectors as input (via a `Parallel(vcat...)` layer): the first is a vector of floats and the second is a one-hot vector. I can get it to work if I loop through the features one by one, but I’m trying to do a batch evaluation. So I generate two matrices of features: `m1 = [f1 f2 f3 ...]`, where `f1` etc. are the individual vectors of reals, and `m2 = [b1 b2 b3 ...]`, where `b1` etc. are the one-hot vectors.

When I do the following:

```
m((f1, b1))
```

I get what I expect, a vector output of the right length.

When I do:

```
m((m1, m2))
```

I get… a vector output, of the same length. Which is to say, I don’t get individual vectors corresponding to each of the features. I just get a single vector. And it doesn’t correspond to any of the vectors I would expect for the individual inputs.
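To make the expectation concrete, here is a minimal sketch using a hypothetical stand-in model (the name `model`, the layer sizes, and the random inputs are all made up for illustration and are not my real ones). With plain `Dense` layers inside `Parallel(vcat, ...)`, I would expect column `i` of the batched output to match the output for sample `i` evaluated on its own:

```julia
using Flux

# Hypothetical stand-in model; layer sizes are invented for illustration.
model = Chain(
    Parallel(vcat, Dense(3 => 4), Dense(5 => 4)),  # one branch per input
    Dense(8 => 2),
)

# Three samples stored as columns (features × batch).
x1 = rand(Float32, 3, 3)                 # float features, one column per sample
x2 = Flux.onehotbatch([1, 3, 2], 1:5)    # one-hot features, one column per sample

y_batch = model((x1, x2))                # one output column per sample

# Evaluate sample by sample and compare with the batched columns.
for i in 1:3
    y_i = model((x1[:, i], x2[:, i]))
    @assert y_i ≈ y_batch[:, i]
end
```

In this sketch the batched call returns a matrix with one column per sample, which is the behaviour I was expecting from `m((m1, m2))`.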

I’m doing something wildly wrong, yes? Because in my training loop, the loss function gets called with `m1` and `m2`, which propagates to the `m((m1, m2))` call, and then calculates a loss with the result, which is nonsensical. And, indeed, it doesn’t converge, unless I train with exactly one example in the training set.

But I don’t have enough experience to see what I’m doing wrong. As far as I can tell, this is what the Flux documentation recommends.