Comparison with Flux leads to odd results in Flux

Hello,
I’m developing a toy ML library (BetaML) to learn a bit about ML algorithms (I’m pretty much a newbie).

I tested BetaML with a bike sharing demand forecast example and compared it with Flux:

Runnable Binder notebook: https://mybinder.org/v2/gh/sylvaticus/BetaML.jl/master?filepath=notebooks%NN%20-%20Bike%20sharing%20demand%20forecast%20(daily%20db).ipynb

However, when I use the same model structure, data, training algorithm and hyperparameters, I experience strange behaviour in Flux: e.g. the data predicted by Flux for the training sample seems to be truncated, and the predictions do not vary as much as the BetaML results:

BetaML output:
image
Flux output:
image
BetaML output:
image
Flux output:
image
BetaML output:
image
Flux output:
image

Note that the BetaML results also tend to underestimate the high demand levels observed in the validation period, but in a less pronounced way than Flux.

I am wondering what causes this difference… weight initialisation?

Are you using relu activation functions? It looks like the output has been clamped in a way typical for relu. If this is the case, the problem tends to go away with further training or using some strategy to make it easier to find a better minimum, such as residual connections etc.
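To illustrate the clamping effect mentioned above, a minimal sketch (with made-up values, not data from this thread): relu zeroes every negative pre-activation, so a network whose final pre-activations are mostly negative produces flat, truncated-looking output early in training.

```julia
using Flux

# relu(x) = max(0, x): every negative pre-activation is clamped to zero,
# which can make an undertrained network's predictions look truncated.
preactivations = [-2.0, -0.5, 0.0, 1.5]
outputs = Flux.relu.(preactivations)
println(outputs)   # the three non-positive values all collapse to 0.0
```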

You can also try to normalize your data prior to training so that it has mean zero and variance one; that also helps with this problem.
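A minimal sketch of that standardisation, using a toy observations × features matrix as a stand-in for the real training data: each feature gets mean zero and standard deviation one, and the same training-set statistics must be reused to transform validation data and to invert the scaling on predictions.

```julia
using Statistics

# Toy training matrix: 4 observations × 2 features (stand-in for xtrain).
xtrain = [1.0 1000.0; 2.0 2000.0; 3.0 3000.0; 4.0 4000.0]

# Column-wise standardisation: subtract each feature's mean and divide by
# its standard deviation, both computed on the training set only.
μ = mean(xtrain, dims=1)
σ = std(xtrain, dims=1)
xtrain_s = (xtrain .- μ) ./ σ

# Each column of xtrain_s now has mean ≈ 0 and standard deviation ≈ 1.
# Validation data must be transformed with the same μ and σ, and
# predictions rescaled back with y .* σ .+ μ.
```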

Thank you, I am using sigmoid for the hidden layer and identity for the output one:


using Flux

# Define the net model and load it with data...
Flux_nn = Chain(Dense(23, 12, Flux.sigmoid),
                Dense(12, 1, identity))
loss(x, y) = Flux.mse(Flux_nn(x), y)
ps = Flux.params(Flux_nn)
nndata = Flux.Data.DataLoader(xtrain', ytrain', batchsize=8)

What I found strange is that, using another library with the same parameters - including batch size, optimizer and number of epochs - I don’t get this effect.

I will try increasing the epochs. But it could also be that Flux doesn’t apply certain tricks by default, like Xavier weight initialisation or random sampling of the batches… I did notice that Flux has a philosophy of not providing default arguments/optimisations. I understand it, but for newcomers it could be useful to have, for example, a default optimizer and loss function.

No, Flux does not do any random sampling for you; it’s all up to the iterator you pass it. The weight initialisation should be one of the standard ones, though.
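For reference, Flux’s `Dense` layers default to Glorot (Xavier) uniform initialisation, and the `init` keyword lets you swap in another scheme; a sketch (the layer sizes here just mirror the model above):

```julia
using Flux

# Dense uses glorot_uniform (Xavier) initialisation by default;
# the init keyword overrides it, e.g. with Kaiming initialisation.
default_layer = Dense(23, 12, Flux.sigmoid)
kaiming_layer = Dense(23, 12, Flux.sigmoid, init=Flux.kaiming_uniform)

# Both map a length-23 input to a length-12 output.
y = default_layer(randn(Float32, 23))
```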

I would still normalize the data. Since the data you are predicting is in the 1000s, the initial gradients will drive all the activation functions to saturation before the linear output layer has caught up. By normalizing, you’ll probably have a much faster convergence and to a better minimum.

Yes, actually both X and Y are scaled in the script (for Y they are just divided by 1000, because if I normalise to mean 0, s.d. 1 I may get some negative demand when I rescale them back).

Edit: you were right; to get similar results it was enough to load the data with shuffling: nndata = Flux.Data.DataLoader(xtrain', ytrain', batchsize=8, shuffle=true) (in BetaML I do it by default unless you opt out with sequential=true)