I started from minimal code again and it is stuck training after the first round.
I guess I have to go back to MXNet and TF for now.
I found it: when the DataLoader batchsize is more than 1, it gets like this.
This seems like a misunderstanding of Flux vs TF/MXNet APIs. Could you please post the actual minimal code example you tried (with dummy data generation if need be) and also the Python code you're trying to translate? I ask because the issue most likely lies on the data preprocessing or loading side rather than in the model definition or training.
I have posted the simple code above.
I'm still working on different ways to do this.
Here is the minimal code:
using Flux

# Single linear layer: 10 inputs -> 1 output, no activation.
model = Chain(Dense(10, 1))

# Dummy integer data: 10000 samples with 10 features each, targets drawn from 0:9.
x = rand([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], (10, 10000))
y = rand([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], (1, 10000))

# Mean absolute error between the model output and the targets.
function losss(x, y)
    return Flux.mae(model(x), y)
end

optimiser = Flux.Descent(0.01)
train_loader = Flux.Data.DataLoader((x, y), batchsize=1)

Flux.@epochs 10 Flux.train!(losss, params(model), train_loader, optimiser,
    cb = Flux.throttle(() -> println(losss(x, y)), 10))
The problem arises after the first epoch: all the weights and biases in the model become NaN, so the model's output becomes NaN, as does the loss, and training fails to continue…
This could be due to an overflow: Descent() does not protect you from exploding gradients and such. You could either clip your gradients, e.g. Flux.Optimise.Optimiser(ClipValue(10.0), Descent(0.01)), or use a momentum-based optimiser like ADAM().
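For concreteness, here is a rough sketch of both options, reusing losss, model, and train_loader from your snippet above. It is untested and assumes a Flux version where Flux.Optimise.Optimiser, ClipValue, and ADAM are available (which matches the Flux.Data.DataLoader / @epochs API you are using):

using Flux

# Option 1: clip each gradient component to [-10, 10] before the plain
# gradient-descent step, so a single huge gradient cannot blow up the weights.
clipped_opt = Flux.Optimise.Optimiser(Flux.Optimise.ClipValue(10.0), Flux.Descent(0.01))
Flux.@epochs 10 Flux.train!(losss, Flux.params(model), train_loader, clipped_opt)

# Option 2: switch to an adaptive, momentum-based optimiser instead of plain Descent.
adam_opt = Flux.ADAM()   # default learning rate 0.001
Flux.@epochs 10 Flux.train!(losss, Flux.params(model), train_loader, adam_opt)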
I would also be surprised if this didn't result in NaNs if you wrote something similar in PyTorch, mostly because there's no activation function like sigmoid/tanh/etc. to reduce the size of the outputs. If you reduce the magnitude of your inputs (either by dividing everything by, say, 100, or by using a rand variant that samples from [0, 1)), the network should be less susceptible to spitting out NaNs. You may also want to try a larger batch size, as in the sketch below: it will "smooth out" the gradient updates and thus also reduce the possibility of overflowing into NaNs.
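Something like the following (again untested, with names reused from your snippet; sampling the dummy data from [0, 1) with rand(Float32, ...) is my assumption about what you want):

# Rescaled dummy data: rand without a collection samples uniformly from [0, 1),
# and Float32 matches the default element type of Dense's weights.
x = rand(Float32, 10, 10000)
y = rand(Float32, 1, 10000)

# A larger batch averages the per-sample gradients, smoothing the updates.
train_loader = Flux.Data.DataLoader((x, y), batchsize=64, shuffle=true)

Flux.@epochs 10 Flux.train!(losss, Flux.params(model), train_loader, Flux.ADAM(),
    cb = Flux.throttle(() -> println(losss(x, y)), 10))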