DomainError with Loss is Inf on data item 1, stopping training:

I’m trying to learn Julia and ML principles simultaneously. I’ve successfully trained a model to fit a simple straight line, following the procedure described in the intro docs. But when I go a step further and add a little random noise to the training data, then iteratively train the model:

for epoch in 1:200
    train!(loss, predict, data, opt)
end

which immediately throws this error:

DomainError with Loss is Inf on data item 1, stopping training:

After just one iteration, it appears that Flux arrived at a bias of -4.8589914f16, which is a very large number, and my best interpretation is that the second iteration then sends the bias and weight values off to infinity.

How should I troubleshoot this error?

A full code example would be helpful. What kind of noise are you adding?

When I add some Gaussian noise to the training data by modifying
actual(x) = 4x + 2 + randn()
it still trains well.
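For reference, here is a minimal sketch of that setup; the 0.1 noise scale and the x range are my own illustrative choices, not from your post:

```julia
# Clean target line from the Flux intro docs, plus small Gaussian noise.
# The 0.1f0 scale is an assumption; plain randn() (scale 1) also trains fine.
actual(x) = 4x + 2 + 0.1f0 * randn(Float32)

x_train = Float32.(0:5)
y_train = actual.(x_train)

# Sanity check: the noisy targets stay near the clean line 4x + 2.
maximum(abs.(y_train .- (4f0 .* x_train .+ 2f0)))
```

If training only diverges once noise is added, it is worth checking that the noise is actually this small relative to the targets.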

Take a look at the loss and gradients during training by writing the train! method essentially by hand, e.g.

using Flux  # assumes loss, x_train, y_train are defined as in the original post

predict = Dense(1 => 1)

opt = Flux.setup(Descent(), predict)
for epoch in 1:200
    l, gs = Flux.withgradient(m -> loss(m, x_train, y_train), predict)
    @show l, gs
    Flux.update!(opt, predict, gs[1])
end

With a working setup, the printed loss and gradients shrink towards zero, e.g.:

(l, gs) = (7.3990827f-9, ((weight = Float32[-0.00052022934;;], bias = Float32[-0.00014666717], σ = nothing),))
(l, gs) = (7.128392f-9, ((weight = Float32[0.0005106926;;], bias = Float32[0.00014368694], σ = nothing),))
(l, gs) = (6.862917f-9, ((weight = Float32[-0.0005009969;;], bias = Float32[-0.00014134249], σ = nothing),))
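If instead the loss shoots off to Inf as in the original post, the usual culprit is a step size too large for the scale of the data. A Flux-free sketch of the same blow-up, using plain gradient descent on least squares (the learning rates below are illustrative, not from this thread):

```julia
# Fit y ≈ w*x + b by gradient descent on the mean-squared error.
function fit_line(xs, ys; η = 0.5f0, epochs = 20)
    w, b = 0f0, 0f0
    for _ in 1:epochs
        ŷ = w .* xs .+ b
        gw = 2 * sum((ŷ .- ys) .* xs) / length(xs)   # ∂MSE/∂w
        gb = 2 * sum(ŷ .- ys) / length(xs)           # ∂MSE/∂b
        w -= η * gw
        b -= η * gb
    end
    return w, b
end

xs = Float32.(0:5)
ys = 4f0 .* xs .+ 2f0

# Too large a step: each iteration overshoots and the parameters explode,
# producing huge biases like the -4.8589914f16 seen above.
w_big, b_big = fit_line(xs, ys; η = 0.5f0)

# Small enough step: converges to roughly w ≈ 4, b ≈ 2.
w_ok, b_ok = fit_line(xs, ys; η = 0.02f0, epochs = 5000)
```

In the Flux loop above, the equivalent fix is a smaller learning rate, e.g. `Flux.setup(Descent(0.01), predict)`, or normalising the inputs so the default rate is stable.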