Dear all,

I want to use `Flux.jl` to build a simple multi-layer perceptron (MLP), as I did in Keras. The input data is a matrix of `nGene` (number of genes) by `nInd` (number of individuals), and the output data is a vector of length `nInd` representing a trait (e.g. height). I also have two hidden layers with 64 and 32 neurons, respectively.

In summary, the number of neurons changes as: `nGene` --> 64 --> 32 --> 1

In Keras, the MLP is:

```
from keras.models import Sequential
from keras.layers import Dense, Activation

# Instantiate the model
model = Sequential()
# First hidden layer: nGene inputs -> 64 neurons, ReLU activation
model.add(Dense(64, input_dim=nGene))
model.add(Activation('relu'))
# Second hidden layer: 64 -> 32 neurons, softplus activation
model.add(Dense(32))
model.add(Activation('softplus'))
# Output layer: 32 -> 1 (the predicted trait)
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=100)
```

With this, the loss (MSE) at each epoch is below one, and the prediction accuracy on the testing data is about 0.6, which is good.

In `Flux.jl`, I built the same MLP as follows:

```
using Flux

# Repeat the full training set 100 times = 100 epochs
data = Iterators.repeated((X_train_t, Y_train), 100)

# nGene -> 64 -> 32 -> 1, matching the Keras model
model = Chain(
    Dense(nGene, 64, relu),
    Dense(64, 32, softplus),
    Dense(32, 1))

loss(x, y) = Flux.mse(model(x), y)
ps = Flux.params(model)
opt = ADAM()
evalcb = () -> @show(loss(X_train_t, Y_train))
Flux.train!(loss, ps, data, opt, cb = evalcb)
```

Here `X_train_t` is a `nGene` by `nInd` matrix, and `Y_train` is a vector of length `nInd`.
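
For completeness, a minimal sketch of the shapes involved (the sizes and random values here are made up purely for illustration):

```
# Toy data with the shapes described above (sizes are made up for illustration)
nGene, nInd = 1000, 200
X_train_t = randn(Float32, nGene, nInd)   # genes in rows, individuals in columns
Y_train   = randn(Float32, nInd)          # one trait value per individual

@show size(model(X_train_t))   # (1, nInd): the model outputs a 1-row matrix
@show size(Y_train)            # (nInd,):  the target is a plain vector
```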

The loss is very high, and the prediction accuracy on the testing data is almost **zero**.
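
(By "prediction accuracy" I mean the correlation between predicted and observed trait values, computed roughly as below; `X_test_t` and `Y_test` are my held-out test data, shaped like the training data:)

```
using Statistics

# "Accuracy" = Pearson correlation between predicted and observed traits
# (X_test_t and Y_test are the held-out test set)
accuracy = cor(vec(model(X_test_t)), Y_test)
```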

BTW, in `Flux.jl`, if I change the optimiser to plain gradient descent, training does not even converge.
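
(For that run I just swapped the optimiser; the step size is an arbitrary guess on my part:)

```
# Plain gradient descent instead of ADAM (the step size is an arbitrary guess)
opt = Descent(0.01)
```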

I really don't know why the training process in `Flux.jl` goes wrong. Could you please give me a hint about what's wrong with my code?

Thank you very much,

-Carol