Flux is lagging far behind TensorFlow on a pretty basic use case

I was trying to move from a Python ML/DL stack to Julia (so from something like sklearn+pytorch/tensorflow to MLJ+Flux).
So I decided to rewrite the Kaggle courses in Julia, switching from the Python libraries to the Julia ones as well. In particular, I was implementing the notebook on underfitting and overfitting, which also uses early stopping (you can find it by opening the corresponding exercise here: Learn Intro to Deep Learning Tutorials | Kaggle).
I reached the second neural network, the first “deep” one without early stopping, and stumbled upon a huge difference in predictive performance: TensorFlow starts from a loss of something like 0.29 and reaches 0.1992 in just 50 epochs, while the same architecture, with the same loss and the same optimizer, in Flux starts from something like 90.000 and ends up with a loss ranging from 3.43 to 0.736 (depending on the RNG, I suppose).

What could be the problem? I followed the same steps as closely as possible and used the models in the Flux model zoo as a reference for the specific implementations.

Such a difference in starting loss is highly suspicious; there is likely a mistake in your port. You should post the two scripts for specific help.


I agree - if they fundamentally differ like that (even when trying to use the same methods), there’s likely some discrepancy left between your tensorflow code and your Flux code. I wouldn’t primarily blame the RNG here - such a big difference for the “same architecture” shouldn’t happen based on the RNG alone.

Do you mind posting your two versions (tensorflow/Flux) so we can take a look?

I (partially) uncovered the mystery: it was using MLJ to preprocess the data. If I use a MinMaxScaler I implemented myself for MLJModels together with the OneHotEncoder from MLJ, I get the behavior described above.
When I instead implemented both parts “by hand” (using onehotbatch from Flux and manually scaling the features for the MinMaxScaler), I achieved performance comparable to TensorFlow.
It seems strange to me that MLJ cooperates that badly with Flux, so I’ll investigate further why this actually happens.
Meanwhile, I would call the problem closed, since it’s clear that Flux is not to blame.
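For reference, the manual scaling step amounts to the standard min-max formula x' = (x - min) / (max - min), applied per feature. A minimal NumPy sketch (made-up data; this is just the textbook formula, useful for comparing against whatever the MLJ transformer actually produces):

```python
import numpy as np

def minmax_scale(X):
    """Scale each column of X to [0, 1], per feature,
    following x' = (x - min) / (max - min)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return (X - lo) / (hi - lo)

# Toy features with very different ranges (made-up data):
X = np.array([[1.0, 1000.0],
              [2.0, 5000.0],
              [3.0, 9000.0]])
X_scaled = minmax_scale(X)
# After scaling, every column spans exactly [0, 1], so no single
# feature dominates the initial loss the way a raw, large-range
# feature would.
```

If the “by hand” version and the MLJ output disagree, dumping both arrays and diffing them column by column should show which feature is being transformed differently.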

Many thanks for your replies!

Rather than MLJ and Flux not cooperating (if they didn’t, GitHub - FluxML/MLJFlux.jl: Wrapping deep learning models from the package Flux.jl for use in the MLJ.jl toolbox wouldn’t exist!), it seems your by-hand implementation may differ from what MLJ is doing. It’s worth comparing the two outputs to verify that’s the case.


That’s why I said it looks quite strange. I’m now comparing the output of the preprocessing done with MLJ against the one done by hand to understand what’s different.