Why is the result from Flux.jl totally different from tf.Keras (with the same simple MLP)?

Dear all,

I want to use Flux.jl to build a simple multi-layer perceptron (MLP), as I did in Keras. The input data is a matrix of nGene (number of genes) by nInd (number of individuals), and the output data is a vector of length nInd representing a trait (e.g. height). There are also two hidden layers with 64 and 32 neurons, respectively.

In summary, the layer sizes go: nGene → 64 → 32 → 1

In Keras, the MLP is:

# Imports (tf.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Instantiate
model = Sequential()

# Add first layer
model.add(Dense(64, input_dim=nGene))
model.add(Activation('relu'))
# Add second layer
model.add(Dense(32))
model.add(Activation('softplus'))
# Last, output layer
model.add(Dense(1))


model.compile(loss='mean_squared_error', optimizer='adam') 
model.fit(X_train, y_train, epochs=100)

With this setup, the loss (MSE) at each epoch is less than one, and the prediction accuracy on the test data is about 0.6, which is good.

In Flux.jl, I built the same MLP with:

using Flux

data = Iterators.repeated((X_train_t, Y_train), 100)

model = Chain(
  Dense(nGene, 64, relu),
  Dense(64, 32, softplus),
  Dense(32, 1))

loss(x, y) = Flux.mse(model(x), y)
ps = Flux.params(model)
opt = ADAM() 
evalcb = () -> @show(loss(X_train_t, Y_train))

Flux.train!(loss, ps, data, opt, cb = evalcb)

Here X_train_t is an nGene × nInd matrix and Y_train is a vector of length nInd.

The loss is extremely high, and the prediction accuracy on the test data is almost zero.

By the way, in Flux.jl, if I change the optimiser to plain gradient descent, it doesn't even converge.

[screenshot: training loss values printed by the Flux callback]

I really don't know why the training in Flux.jl goes wrong. Could you please give me a hint about what's wrong with my code?

Thank you very much,

-Carol

Have you verified that you use the same step sizes in the optimizer, and that the mean squared error is calculated in the same way? Have you transposed the data in the appropriate way to account for potentially different conventions in the two libraries?
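For instance, a quick shape check along these lines (just a sketch against your Chain above; the reshape is one possible fix if the orientation turns out to be wrong):

# The Chain maps an nGene × nInd input to a 1 × nInd output, so the targets
# should be a 1 × nInd row matrix as well. If Y_train is a plain length-nInd
# vector, Flux.mse(model(x), y) broadcasts 1 × nInd against nInd × 1 and
# quietly computes a loss over an nInd × nInd matrix.
@show size(model(X_train_t))          # expect (1, nInd)
@show size(Y_train)                   # should match the model output
Y_train_row = reshape(Y_train, 1, :)  # one way to fix the orientation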


Hi,

Thank you very much for your useful suggestions.

1. I'm sure the default step size and other parameters are the same, at least for the Adam optimizer.
2. Even if the mean squared error is calculated in a different way, I don't think that would result in such bad prediction accuracy in Flux.jl.
3. In Flux.jl, the input data is a matrix of #genes by #samples. I followed the MNIST tutorial example, where the input data is a matrix of #pixels by #samples. If I transpose the data the other way, the Flux code doesn't run.

Please let me know if there are other Flux tutorials on prediction/regression problems rather than classification. I can only find classification examples such as MNIST.

Thanks again 🙂

Given the orders-of-magnitude difference in your Flux callback output, I have an idea…

Did you normalize your data before running the model? Range-scale it so it's between 0 and 1, or -1 and 1. I'm not sure whether Keras does that automatically or not. It could also be that you want to use a different weight initialization, e.g. Glorot.
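For example, a rough sketch of min-max scaling the inputs to [0, 1] (assuming X_train_t is the nGene × nInd matrix, so each row is one feature):

# Scale each input feature (row) to [0, 1]; use 2 .* X_scaled .- 1 for [-1, 1].
xmin = minimum(X_train_t, dims=2)
xmax = maximum(X_train_t, dims=2)
X_scaled = (X_train_t .- xmin) ./ max.(xmax .- xmin, eps())  # guard against constant rows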

Hi,

Thank you very much for your reply.

The elements of the input matrix are either 0 or 1, so I didn't normalize it in Flux. And I didn't find any indication that Keras does normalization automatically.

Please let me know if there are other Flux tutorials on prediction/regression problems rather than classification. I can only find classification examples such as MNIST.

Thanks again,
Carol

There was a similar question posted here a little while ago, and in that situation it turned out that Keras was using a batch size of 32 by default and Flux wasn't, which is where the difference in behaviour was coming from. I wonder if you're seeing the same thing.

Here's a link to that thread; the post just below the linked one has a version of the original author's code that matches the Keras/TF behavior.
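For reference, here is a rough sketch (not the linked code itself) of mimicking Keras's default batch size of 32 with the implicit-params Flux.train! API used above, assuming Y_train has already been reshaped to a 1 × nInd row matrix:

using Flux, Random

batchsize = 32

for epoch in 1:100
    perm = randperm(size(X_train_t, 2))              # shuffle sample indices each epoch
    batches = [(X_train_t[:, idx], Y_train[:, idx])  # mini-batches as column slices
               for idx in Iterators.partition(perm, batchsize)]
    Flux.train!(loss, Flux.params(model), batches, opt)
    @show loss(X_train_t, Y_train)                   # full-data loss once per epoch
end

Depending on your Flux version, there is also a DataLoader that handles the shuffling and batching for you.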

Hope that helps! 🙂


Thank you very much. It's a good prediction example, and I found I made a mistake with the data transposition: I didn't transpose Y_train.

I also learnt how to set the batch size in Flux.

Thanks again!!! 🙂
