Why is the result from Flux.jl totally different from tf.Keras (with the same simple MLP)?

Dear all,

I want to use Flux.jl to build a simple multi-layer perceptron (MLP), as I did in Keras. The input data is a matrix of nGene (number of genes) by nInd (number of individuals), and the output data is a vector of length nInd representing a trait (e.g. height). There are also two hidden layers with 64 and 32 neurons, respectively.

In summary, the layer sizes go: nGene → 64 → 32 → 1

In Keras, the MLP is:

# Imports (tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Instantiate
model = Sequential()

# Add first layer
model.add(Dense(64, input_dim=nGene))
model.add(Activation('relu'))
# Add second layer
model.add(Dense(32))
model.add(Activation('softplus'))
# Last, output layer
model.add(Dense(1))


model.compile(loss='mean_squared_error', optimizer='adam') 
model.fit(X_train, y_train, epochs=100)

With this setup, the loss (MSE) at each epoch is less than one, and the prediction accuracy on the test data is about 0.6, which is good.

In Flux.jl, I built the same MLP with:

using Flux

data = Iterators.repeated((X_train_t, Y_train), 100)

model = Chain(
  Dense(nGene, 64, relu),
  Dense(64, 32, softplus),
  Dense(32, 1))

loss(x, y) = Flux.mse(model(x), y)
ps = Flux.params(model)
opt = ADAM() 
evalcb = () -> @show(loss(X_train_t, Y_train))

Flux.train!(loss, ps, data, opt, cb = evalcb)

Here X_train_t is an nGene × nInd matrix and Y_train is a vector of length nInd.

The loss is extremely high, and the prediction accuracy on the test data is almost zero.

By the way, in Flux.jl, if I change the optimiser to plain gradient descent, it doesn't even converge.

[screenshot: training loss values printed by the Flux callback]

I really don't know why the training in Flux.jl goes wrong. Could you please give me a hint about what's wrong with my code?

Thank you very much,

-Carol

Have you verified that you use the same step sizes in the optimizer, and that the mean squared error is calculated in the same way? Have you transposed the data in the appropriate way to account for potentially different conventions in the two libraries?
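For instance, a quick shape check along these lines (just a sketch against your Chain above; the reshape is one possible fix if the orientation turns out to be wrong):

# The Chain maps an nGene × nInd input to a 1 × nInd output, so the targets
# should be a 1 × nInd row matrix as well. If Y_train is a plain length-nInd
# vector, Flux.mse(model(x), y) broadcasts 1 × nInd against nInd × 1 and
# quietly computes a loss over an nInd × nInd matrix.
@show size(model(X_train_t))          # expect (1, nInd)
@show size(Y_train)                   # should match the model output
Y_train_row = reshape(Y_train, 1, :)  # one way to fix the orientation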


Hi,

Thank you very much for your useful suggestions.

1. I'm sure the default step size and other parameters are the same, at least for the Adam optimizer.
2. Even if the mean squared error is calculated in a different way, I don't think that would result in such bad prediction accuracy in Flux.jl.
3. In Flux.jl, the input data is a matrix of #genes by #samples. I followed the MNIST tutorial example, where the input data is a matrix of #pixels by #samples. If I transpose the data the other way, the Flux code doesn't run.

Please let me know if there are other Flux tutorials on prediction/regression problems rather than classification. I can only find classification examples such as MNIST.

Thanks again 🙂

Given the orders-of-magnitude difference in your Flux callback output, I have an idea…

Did you normalize your data before running the model? Range-scale it so it's between 0 and 1, or -1 and 1. I'm not sure whether Keras does that automatically or not. It could also be that you want to use a different weight initialization, e.g. Glorot.
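For example, a rough sketch of min-max scaling the inputs to [0, 1] (assuming X_train_t is the nGene × nInd matrix, so each row is one feature):

# Scale each input feature (row) to [0, 1]; use 2 .* X_scaled .- 1 for [-1, 1].
xmin = minimum(X_train_t, dims=2)
xmax = maximum(X_train_t, dims=2)
X_scaled = (X_train_t .- xmin) ./ max.(xmax .- xmin, eps())  # guard against constant rows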

Hi,

Thank you very much for your reply.

The elements of the input matrix are either 0 or 1, so I didn't normalize it in Flux. And I didn't find any indication that Keras does normalization automatically.

Please let me know if there are other Flux tutorials on prediction/regression problems rather than classification. I can only find classification examples such as MNIST.

Thanks again,
Carol

There was a similar question posted here a little while ago, and in that situation it turned out that Keras was using a batch size of 32 by default and Flux wasn't, which is where the difference in behaviour was coming from. I wonder if you're seeing the same thing.

Here's a link to that thread; the post just below the linked one has a version of the original author's code that matches the Keras/TF behavior.
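For reference, here is a rough sketch (not the linked code itself) of mimicking Keras's default batch size of 32 with the implicit-params Flux.train! API used above, assuming Y_train has already been reshaped to a 1 × nInd row matrix:

using Flux, Random

batchsize = 32

for epoch in 1:100
    perm = randperm(size(X_train_t, 2))              # shuffle sample indices each epoch
    batches = [(X_train_t[:, idx], Y_train[:, idx])  # mini-batches as column slices
               for idx in Iterators.partition(perm, batchsize)]
    Flux.train!(loss, Flux.params(model), batches, opt)
    @show loss(X_train_t, Y_train)                   # full-data loss once per epoch
end

Depending on your Flux version, there is also a DataLoader that handles the shuffling and batching for you.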

Hope that helps! 🙂


Thank you very much. It's a good prediction example, and I found I made a mistake with the data transposition: I didn't transpose Y_train.

I also learnt how to set the batch size in Flux.

Thanks again!!! 🙂
