Flux training gives NaNs

I’m training a CNN on a vision problem and sometimes the model parameters become NaNs. I don’t know how to create an MWE, and it doesn’t happen every time.

The model itself is

	Chain(
		# input 96x96x1
		Conv((5,5), 1=>32, pad=(2,2), relu),  # now 96x96x32
		MaxPool((3,3),pad=(1,1),stride=2),  # now 48x48x32

		Conv((5,5), 32=>32, pad=(2,2), relu),
		MeanPool((3,3),pad=(1,1),stride=2),  # now 24x24x32

		Conv((5,5), 32=>64, pad=(2,2), relu),
		MeanPool((3,3),pad=(1,1),stride=2),  # now 12x12x64

		Conv((5,5), 64=>64, pad=(0,0), relu),  # now 8x8x64 
		MeanPool((2,2)),  # now 4x4x64

		Conv((4,4), 64=>128, pad=(0,0), relu),  # now 1x1x128 

		# Reshape tensor before dense layer
		x -> reshape(x, :, size(x, 4)),
		Dense(128, 5),

		softmax
	)

I’m using ADAM on cross-entropy loss. This is in Julia 1.3.1 with Flux 0.10.1.

It runs on a CPU with about 70K grayscale images of size 96x96. Again, the loss often decreases fine for a while, but then the weights of the first layer suddenly kick over to NaN. Any ideas on what could be happening?

To answer myself: the problem was that crossentropy does not guard against taking the log of zero. So when a softmax result underflows to zero in the ground-truth category, the loss is NaN and you are out of luck.

My fix is to broadcast a max with machine epsilon over the model output before calling crossentropy.
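A minimal sketch of what that fix looks like, using a stand-in `xent` for Flux.crossentropy (this is illustrative, not Flux's actual code):

```julia
# Stand-in for Flux.crossentropy: -sum(y .* log.(ŷ))
xent(ŷ, y) = -sum(y .* log.(ŷ))

ŷ = [0.0, 1.0]   # a softmax output that underflowed to exactly 0 in one class
y = [0.0, 1.0]   # one-hot label

xent(ŷ, y)                         # NaN: the zero-label term is 0 * log(0) = NaN
xent(max.(ŷ, eps(eltype(ŷ))), y)   # finite: log never sees an exact 0

# In a training loop with model m, the loss would become something like:
# loss(x, y) = Flux.crossentropy(max.(m(x), eps(Float32)), y)
```

The broadcast `max.` clamps every probability to at least machine epsilon, so `log` never receives 0.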


You can also use logitcrossentropy:

julia> Flux.crossentropy([0], [0])
NaN

julia> Flux.logitcrossentropy([0], [0])
-0.0
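The reason this works is that logitcrossentropy fuses the log and the softmax, so no probability is ever materialized that could underflow to 0 before the log. A minimal stand-in sketch (not Flux's actual code) showing the numerically stable form:

```julia
# Stable log-softmax: shift by the maximum before exponentiating
logsoftmax(x) = (x .- maximum(x)) .- log(sum(exp.(x .- maximum(x))))
logitxent(ŷ, y) = -sum(y .* logsoftmax(ŷ))

logits = [-1000.0, 0.0]   # softmax of this underflows to [0, 1]
y = [1.0, 0.0]

logitxent(logits, y)      # 1000.0, finite; crossentropy(softmax(logits), y) would be Inf
```

To use this in the model above, drop the trailing `softmax` from the `Chain` and train against `Flux.logitcrossentropy` on the raw logits instead.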

I am running into almost the same issue here. Are you still using this fix? May I ask you to elaborate on how you implemented it?

I have a CNN that produces no NaNs with Flux.logitbinarycrossentropy(), which, by the way, takes no eps argument.

When the loss function is changed to, say, Flux.focal_loss() or Flux.dice_coeff(), the NaN plague shows up. Unlike the loss function above, both of them do have a default value for the eps argument. For instance:

focal_loss(ŷ, y; dims=1, agg=mean, gamma=2, eps=eps(eltype(ŷ)))

Anyway, I have tried playing with its value, from 10e-7 to 1e0, in the actual model training procedure, with no success. For instance:

julia> Flux.focal_loss([0], [0])
0.0

julia> Flux.focal_loss([0], [0], eps=1e-7)
0.0

julia> Flux.focal_loss([0], [0], eps=1e-7) == 1e-7
false

julia> Flux.focal_loss([0], [0], eps=1e-7) == 0
true

Any help is appreciated. Thanks.
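One note on why those probes return 0 regardless of eps: with y = [0], every term of the focal loss is multiplied by the label, so the whole expression is 0 no matter what eps is. Probing with y = [1] makes eps visible. A minimal stand-in, assuming focal loss follows the usual definition -y * (1 - ŷ)^γ * log(ŷ + eps) (this is a sketch, not Flux's actual code):

```julia
# Stand-in focal loss for a single example, usual definition assumed
focal(ŷ, y; γ = 2, ϵ = eps(eltype(ŷ))) = sum(@. -y * (1 - ŷ)^γ * log(ŷ + ϵ))

focal([0.0], [0.0])             # 0.0 for any ϵ: the zero label kills every term
focal([0.0], [1.0])             # -log(ϵ): now ϵ actually matters
focal([0.0], [1.0], ϵ = 1e-7)   # -log(1e-7) ≈ 16.1
```

So the eps argument does guard the log; it just cannot show up in a probe whose labels are all zero.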