DomainError with Loss is Inf on data item 1, stopping training:

chadagreene · August 29, 2023, 7:37pm

I’m trying to learn Julia and ML principles simultaneously. I’ve successfully trained a model to fit a simple straight line, following the procedure described in the intro docs. Going a step further, I added a little random noise to the training data and then tried to train the model iteratively:

for epoch in 1:200
   train!(loss, predict, data, opt)
end

which immediately throws this error:

DomainError with Loss is Inf on data item 1, stopping training:

After just one iteration, Flux has already produced a bias of -4.8589914f16, which is enormous. My best interpretation is that the second iteration then sends the bias and weight values to infinity.

How should I troubleshoot this error?

skleinbo · August 30, 2023, 11:22am

A full code example would be helpful. What kind of noise do you add?

When I add some Gaussian noise to the training data by modifying
actual(x) = 4x + 2 + randn()
it still trains well.
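
A minimal sketch of such a modified setup, for anyone who wants to reproduce it. It assumes the data layout and loss from the Flux overview docs (inputs 0:5 stored as a 1×6 matrix, mean squared error); the exact script is not shown in this thread, so treat the details as assumptions.

```julia
using Flux, Statistics

# Target line with Gaussian noise added to every sample
actual(x) = 4x + 2 + randn()

# Training data as 1×6 matrices, one column per sample (layout assumed from the overview docs)
x_train = Float32.(hcat(0:5...))
y_train = Float32.(actual.(x_train))

# Single-input, single-output linear model and a mean-squared-error loss
predict = Dense(1 => 1)
loss(model, x, y) = mean(abs2.(model(x) .- y))

# Explicit optimiser state; Descent() uses a default step size of 0.1
opt = Flux.setup(Descent(), predict)
data = [(x_train, y_train)]

for epoch in 1:200
    Flux.train!(loss, predict, data, opt)
end

@show loss(predict, x_train, y_train)
```

If the final loss is finite and small here but not in your script, the difference most likely lies in how the data or the noise is constructed.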

Take a look at the loss and gradients during training by writing the train! method essentially by hand, e.g.

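using Flux

# loss, x_train and y_train are assumed to be defined as in the intro docs
predict = Dense(1 => 1)

# Explicit optimiser state for the model
opt = Flux.setup(Descent(), predict)
for epoch in 1:200
   # Loss value and its gradient with respect to the model
   l, gs = Flux.withgradient(m -> loss(m, x_train, y_train), predict)
   @show l, gs

   Flux.update!(opt, predict, gs[1])
end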

(l, gs) = (7.3990827f-9, ((weight = Float32[-0.00052022934;;], bias = Float32[-0.00014666717], σ = nothing),))
(l, gs) = (7.128392f-9, ((weight = Float32[0.0005106926;;], bias = Float32[0.00014368694], σ = nothing),))
(l, gs) = (6.862917f-9, ((weight = Float32[-0.0005009969;;], bias = Float32[-0.00014134249], σ = nothing),))
[...]
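
If the printed loss grows from one epoch to the next instead of shrinking, one possible cause (an assumption here, since the original script was not posted) is a step size that is too large for the scale of the inputs. For a mean-squared-error line fit, plain gradient descent is stable only while the step size stays below 2 divided by the largest eigenvalue of the loss Hessian. The sketch below illustrates this in plain Julia rather than Flux, with hand-written gradients; `descend` and `max_stable_step` are hypothetical helper names, and the noise level is made up for illustration.

```julia
using LinearAlgebra, Statistics

# Plain gradient descent on the MSE of the line w*x + b, with hand-written gradients
function descend(x, y; η, steps)
    w, b = 0.0, 0.0
    for _ in 1:steps
        r = w .* x .+ b .- y          # residuals at the current parameters
        w -= η * 2 * mean(r .* x)     # ∂loss/∂w
        b -= η * 2 * mean(r)          # ∂loss/∂b
    end
    return w, b
end

# Stability limit for the step size: 2 / (largest eigenvalue of the loss Hessian)
function max_stable_step(x)
    H = 2 .* [mean(abs2, x) mean(x); mean(x) 1.0]
    return 2 / maximum(eigvals(Symmetric(H)))
end

x = collect(0.0:5.0)
y = 4 .* x .+ 2 .+ 0.1 .* randn(length(x))   # noisy line; 0.1 is an illustrative noise level

η_max = max_stable_step(x)

# Below the limit the fit settles near w ≈ 4, b ≈ 2
w_ok, b_ok = descend(x, y; η = 0.5 * η_max, steps = 2000)

# Above the limit the parameters grow every step until they are no longer finite
w_bad, b_bad = descend(x, y; η = 1.5 * η_max, steps = 2000)

@show η_max (w_ok, b_ok) (w_bad, b_bad)
```

For inputs 0:5 the limit works out to roughly 0.101, so the default step of 0.1 in Descent() sits very close to it. Rescaling the inputs or passing a smaller step, e.g. Descent(0.01), leaves more headroom.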
