I was trying to move from a Python ML/DL stack to Julia (roughly from sklearn + PyTorch/TensorFlow to MLJ + Flux).
So I decided to rewrite the Kaggle courses in Julia, switching from the Python libraries to their Julia counterparts. In particular, I was implementing the notebook on underfitting and overfitting, which also uses early stopping (you can find it by opening the corresponding exercise here: Learn Intro to Deep Learning Tutorials | Kaggle).
I reached the second neural network, the first “deep” one, trained without early stopping, and stumbled upon a huge difference in predictive performance: in just 50 epochs, TensorFlow starts from a loss of about 0.29 and gets down to 0.1992, while the same architecture, with the same loss function and the same optimizer, in Flux starts from something like 90.000 and ends up anywhere between 3.43 and 0.736 (depending on the RNG, I suppose).
What could be the problem? I followed the same steps as closely as possible and based my implementation on the models in Flux’s model zoo.
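
For concreteness, here is a minimal sketch of the kind of Flux setup I mean. The input dimension, layer widths, batch size, and random data below are illustrative placeholders, not the exact values from the Kaggle exercise:

```julia
using Flux, Random

# Placeholder data: the real exercise uses a tabular dataset, so the
# feature count (11) and sample count here are assumptions.
Random.seed!(42)
X = rand(Float32, 11, 1_000)   # features × observations (Flux batches are column-major)
y = rand(Float32, 1, 1_000)    # regression targets

# A plain fully connected network, mirroring a Keras Sequential model.
model = Chain(
    Dense(11 => 512, relu),
    Dense(512 => 512, relu),
    Dense(512 => 512, relu),
    Dense(512 => 1),
)

loss(m, x, y) = Flux.mae(m(x), y)      # MAE loss, as in the tutorial
opt_state = Flux.setup(Adam(), model)  # Adam with default hyperparameters

loader = Flux.DataLoader((X, y); batchsize = 256, shuffle = true)
for epoch in 1:50
    Flux.train!(loss, model, loader, opt_state)
    @info "epoch $epoch" train_loss = loss(model, X, y)
end
```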