Params not getting updated during training

I'm very new to Julia and Flux, so I wrote a very simple program and ran into lots of problems, some of which I solved.
But my main problem is that during training the model's weights and biases do not get updated.

model = Flux.Chain(Dense(10,1,σ))
train_loader = Flux.Data.DataLoader((a_train1,b_train),batchsize=10)
test_loader = Flux.Data.DataLoader((a_test,b_test),batchsize=100)
loss(x, y) = Flux.mse(round(model(x)[1]), y)
opt = ADAM(0.5)

ps = params(model)

Flux.train!(loss, ps, train_loader, opt)

I noticed that inside Flux, the gradient call gives a "params not callable" error.

Why is that?

I think it should be:

loss(x, y) = Flux.mse(model(x), y)

The [1] is strange to me.

Also, what is the exact error message you are getting? (I have never had a problem with parameters not updating.)

Because I got this error at first; after a while I noticed I had to separate x and y for the loss function:

MethodError: no method matching round(::Array{Float32,2})

I do not understand that. round removes all decimal digits by default, and it is strange to me that you want that.

Is it a classification or a regression problem? If it is a classification problem (you want to predict a category), the loss function should be crossentropy or similar, not mse.
If it is a regression problem (you want to predict a real number), the round makes no sense to me.
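For instance, a classification setup could look roughly like this (just a sketch, assuming the ten digits are treated as classes and reusing your a_train1 / b_train arrays):

using Flux
using Flux: onehotbatch

model = Chain(Dense(10, 10))                      # one output per class, no final activation
b_onehot = onehotbatch(vec(b_train), 0:9)         # turn the integer labels 0-9 into one-hot columns
loss(x, y) = Flux.logitcrossentropy(model(x), y)  # applies softmax internally
train_loader = Flux.Data.DataLoader((a_train1, b_onehot), batchsize=10)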

It's kind of a mix of those; basically it's a regression problem,
but in this simple case it can be treated like a classification problem.

Anyway, that doesn't seem to be the problem. Last night I got it working with all the same code,
but now it is stuck again with the same biases.
The gradient inside the train! function becomes nothing, and since no error is raised, the training completes without a problem.
I removed all those additional lines and there is still no update on the parameters:

train_loader = Flux.Data.DataLoader((a_train1,b_train),batchsize=10)
test_loader = Flux.Data.DataLoader((a_test,b_test),batchsize=100)
trainmode!(model,true)
loss(x, y) = Flux.mse(model(x), y)

opt = ADAM(0.5)

ps = params(model)

Flux.train!(loss, ps, train_loader, opt)

The code seems good to me.
Could you give a minimal running example that reproduces the problem?
Also, which version of Flux are you using?

Sure, Flux is 0.11.1:

using Flux
using Statistics: mean

model = Flux.Chain(Dense(10,1,σ))
a = rand([0,1,2,3,4,5,6,7,8,9], 10,50000)
b = rand([0,1,2,3,4,5,6,7,8,9], 1,50000)
accuracy(x, y, model) = mean((model(x)) .== y)
train_loader = Flux.Data.DataLoader((a,b),batchsize=10)
trainmode!(model,true)
loss(x, y) = Flux.mae(model(x), y)
opt = ADAM(0.5)
ps = params(model)
Flux.train!(loss, ps, train_loader, opt)

You are right, with the same version of Flux it gives me:

julia> train_loader = Flux.Data.DataLoader((a,b),batchsize=10)
Flux.Data.DataLoader{Tuple{Array{Int64,2},Array{Int64,2}}}(([9 4 … 4 7; 7 7 … 4 7; … ; 2 4 … 9 6; 2 2 … 3 3], [2 4 … 0 2]), 10, 50000, true, 50000, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  49991, 49992, 49993, 49994, 49995, 49996, 49997, 49998, 49999, 50000], false)
julia> trainmode!(model,true)
Chain(Dense(10, 1, σ))
julia> loss(x, y) = Flux.mae(model(x), y)
loss (generic function with 1 method)
julia> opt = ADAM(0.5)
ADAM(0.5, (0.9, 0.999), IdDict{Any,Any}())
julia> ps = params(model)
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])
julia> ps
Params([Float32[0.53760564 0.18841723 … 0.22766067 0.6651586], Float32[0.0]])
julia> Flux.train!(loss, ps, train_loader, opt)
julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])
julia> Flux.train!(loss, ps, train_loader, opt)
julia> ps
Params([Float32[3.540263 3.1911578 … 3.2303967 3.667896], Float32[3.002677]])
julia>

As you can see, ps is updated after the first epoch of training, but it is not updated any further.
Does anyone have another idea?

PS: I suggest you try to learn a real function (even a simple one) rather than a random function.
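For example, something like this (a hypothetical target, only meant to make the data learnable instead of pure noise):

using Flux

a = rand(Float32, 10, 50_000)       # inputs in [0, 1)
b = sum(a, dims=1) ./ 10            # a simple deterministic target, also in [0, 1)

model = Chain(Dense(10, 1, σ))
loss(x, y) = Flux.mse(model(x), y)
train_loader = Flux.Data.DataLoader((a, b), batchsize=10)
opt = ADAM()                        # the default learning rate is usually safer than 0.5
Flux.train!(loss, params(model), train_loader, opt)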

I don’t know about your Flux issue unfortunately, but round works on individual numbers, so if you have an array of floats (as your error message suggests) and want to round them, you need to broadcast: round.(model(x)).
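For example (x being one input batch, so model(x) is a 1×batchsize Float32 matrix):

yhat = model(x)
round.(yhat)    # elementwise rounding works
round(yhat)     # this is the call that raises the MethodError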

It's a very good idea, but I removed all of those and the main problem still exists. I was working late on it last night and suddenly got it working, but today I'm stuck again.

Exactly, maybe I should downgrade or something.

I even went further into the code, copied the train! function from Flux and imported Zygote.
The problem is that when the gradient gets down to the individual parameters it is nothing,
so the loop just continues with no error,
but it never updates the weights and biases.
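This is roughly how the gradient for a single batch can be inspected outside of train! (a small sketch, assuming loss, ps and train_loader are defined as above):

(x, y) = first(train_loader)              # grab one batch from the DataLoader
gs = Flux.gradient(() -> loss(x, y), ps)  # essentially the same call train! makes internally
for p in ps
    println(gs[p])                        # `nothing` here means no gradient reached p
end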

I think it is more a conceptual error than a programming error. I am supposed to know about that, but my brain is lazy today :-).

A neuron can only give results between 0 and 1, and your targets are in [0, 9], so it is not possible to optimize further and the process stops. Try to normalize both the input and the output and the model will work.
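Something like this, reusing your arrays (just dividing by the maximum label so both inputs and targets land in [0, 1]):

a = rand([0,1,2,3,4,5,6,7,8,9], 10, 50000)
b = rand([0,1,2,3,4,5,6,7,8,9], 1, 50000)

a_norm = Float32.(a) ./ 9    # inputs scaled into [0, 1]
b_norm = Float32.(b) ./ 9    # targets scaled into [0, 1], reachable by the sigmoid
train_loader = Flux.Data.DataLoader((a_norm, b_norm), batchsize=10)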

I have done the same experiment in lots of ML frameworks (TF, Torch, MXNet);
a neuron can generate results larger than 1, depending on the activation.

But in Flux you are using the sigmoid function, which prevents results larger than 1. I should have said 'that neuron', not 'a neuron'. In TF, for instance, I think you were using another activation function (by default it is None).
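For example:

σ(0f0)      # 0.5
σ(100f0)    # ≈ 1.0; the output never exceeds 1, so a target like 9 is unreachable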

That’s true in general but in your case you’re using a sigmoid activation function which is constrained between 0 and 1.

Yes, I did not pay attention to the sigmoid. I changed it to relu and also normalized the inputs and outputs;
still no update on the params.

A strange thing: I removed the activation function from the Dense layer and now it's working.
In TF we usually leave the last output layer without an activation and apply a separate activation function to the model output.
How is that done in Flux?

I think it is done as the final element of the Chain:

Chain(Dense(10, 2), here_final_activation)

But I do not have my computer right now to check it.
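Something along these lines should work, though I have not tested it (the anonymous function broadcasts the activation over the layer's output):

model = Chain(Dense(10, 1), x -> σ.(x))   # linear Dense layer, activation as a separate stage

# or keep the model fully linear and apply the activation outside, TF-style:
model = Chain(Dense(10, 1))
predict(x) = σ.(model(x))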
