You are right; running the same code with the same version of Flux gives me:
```julia
julia> train_loader = Flux.Data.DataLoader((a,b), batchsize=10)
Flux.Data.DataLoader{Tuple{Array{Int64,2},Array{Int64,2}}}(([9 4 … 4 7; 7 7 … 4 7; … ; 2 4 … 9 6; 2 2 … 3 3], [2 4 … 0 2]), 10, 50000, true, 50000, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999, 50000], false)

julia> trainmode!(model, true)
Chain(Dense(10, 1, σ))

julia> loss(x, y) = Flux.mae(model(x), y)
loss (generic function with 1 method)

julia> opt = ADAM(0.5)
ADAM(0.5, (0.9, 0.999), IdDict{Any,Any}())

julia> ps = params(model)
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])

julia> ps
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])

julia> Flux.train!(loss, ps, train_loader, opt)

julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])

julia> Flux.train!(loss, ps, train_loader, opt)

julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])
```
As you can see, `ps` is updated after the first epoch of training, but it stops updating after that. Does anyone have another idea?
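One possible explanation, just a guess from the numbers above: with weights around 3.5 and non-negative integer inputs, the σ output saturates at 1, so its derivative (and hence the MAE gradient) is effectively zero and ADAM has nothing to apply. A quick sketch to check this, assuming the same session and the same Flux version as in the transcript:

```julia
using Flux  # continuing the session above

x, y = first(train_loader)            # grab one batch
gs = gradient(() -> loss(x, y), ps)   # gradients w.r.t. the tracked params
sum(abs, gs[first(ps)])               # ≈ 0 if σ has saturated
```

If that sum is (numerically) zero, the parameters are frozen because the gradient has vanished, not because `train!` is broken.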
PS: I suggest trying to learn a real function (even a simple one) rather than a random one.
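For example, here is a minimal sketch along those lines, with hypothetical synthetic data and a smaller learning rate, assuming the same Flux version as in the transcript:

```julia
using Flux
using Flux: params, train!

# A learnable target: y is the sum of the inputs
X = rand(Float32, 10, 1000)
Y = sum(X, dims=1)

model = Dense(10, 1)                   # no σ: the target is unbounded
loss(x, y) = Flux.mae(model(x), y)
opt = ADAM(0.01)                       # a step size of 0.5 easily overshoots and saturates σ
ps = params(model)
loader = Flux.Data.DataLoader((X, Y), batchsize=10)

for epoch in 1:10
    train!(loss, ps, loader, opt)
    @show loss(X, Y)                   # should decrease steadily
end
```

With a sensible target and learning rate you can watch the loss go down each epoch, which makes it much easier to tell a genuine bug from a saturated model.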