Flux training gives NaNs

I’m training a CNN on a vision problem and sometimes the model parameters become NaNs. I don’t know how to create an MWE, and it doesn’t happen every time.

The model itself is

	Chain(
		# input 96x96x1
		Conv((5,5), 1=>32, pad=(2,2), relu),  # now 96x96x32
		MaxPool((3,3),pad=(1,1),stride=2),  # now 48x48x32

		Conv((5,5), 32=>32, pad=(2,2), relu),
		MeanPool((3,3),pad=(1,1),stride=2),  # now 24x24x32

		Conv((5,5), 32=>64, pad=(2,2), relu),
		MeanPool((3,3),pad=(1,1),stride=2),  # now 12x12x64

		Conv((5,5), 64=>64, pad=(0,0), relu),  # now 8x8x64 
		MeanPool((2,2)),  # now 4x4x64

		Conv((4,4), 64=>128, pad=(0,0), relu),  # now 1x1x128 

		# Reshape tensor before dense layer
		x -> reshape(x, :, size(x, 4)),
		Dense(128, 5),

		softmax
	)

I’m using ADAM on cross-entropy loss. This is in Julia 1.3.1 with Flux 0.10.1.

It runs on a CPU with about 70K grayscale images of size 96x96. Again, the loss often decreases fine for a while, but then the weights of the first layer suddenly kick over to NaN. Any ideas on what could be happening?

To answer myself: the problem was that crossentropy does not guard against taking the log of zero. So when a softmax result underflows to zero in the ground-truth category, the loss is NaN and you are out of luck.

My fix is to broadcast a max with machine epsilon over the model output before calling crossentropy.
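A minimal sketch of what that fix looks like, using a stand-in `xent` for Flux.crossentropy (this is illustrative, not Flux's actual code):

```julia
# Stand-in for Flux.crossentropy: -sum(y .* log.(ŷ))
xent(ŷ, y) = -sum(y .* log.(ŷ))

ŷ = [0.0, 1.0]   # a softmax output that underflowed to exactly 0 in one class
y = [0.0, 1.0]   # one-hot label

xent(ŷ, y)                         # NaN: the zero-label term is 0 * log(0) = NaN
xent(max.(ŷ, eps(eltype(ŷ))), y)   # finite: log never sees an exact 0

# In a training loop with model m, the loss would become something like:
# loss(x, y) = Flux.crossentropy(max.(m(x), eps(Float32)), y)
```

The broadcast `max.` clamps every probability to at least machine epsilon, so `log` never receives 0.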


You can also use logitcrossentropy:

julia> Flux.crossentropy([0], [0])
NaN

julia> Flux.logitcrossentropy([0], [0])
-0.0
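The reason this works is that logitcrossentropy fuses the log and the softmax, so no probability is ever materialized that could underflow to 0 before the log. A minimal stand-in sketch (not Flux's actual code) showing the numerically stable form:

```julia
# Stable log-softmax: shift by the maximum before exponentiating
logsoftmax(x) = (x .- maximum(x)) .- log(sum(exp.(x .- maximum(x))))
logitxent(ŷ, y) = -sum(y .* logsoftmax(ŷ))

logits = [-1000.0, 0.0]   # softmax of this underflows to [0, 1]
y = [1.0, 0.0]

logitxent(logits, y)      # 1000.0, finite; crossentropy(softmax(logits), y) would be Inf
```

To use this in the model above, drop the trailing `softmax` from the `Chain` and train against `Flux.logitcrossentropy` on the raw logits instead.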

I am running into almost the same issue here. Are you still using this fix? May I ask you to elaborate on how you implemented it?

I have a CNN that produces no NaNs with Flux.logitbinarycrossentropy(), which, by the way, takes no eps argument.

When the loss function is changed to, say, Flux.focal_loss() or Flux.dice_coeff(), the NaN plague shows up. Unlike the loss function above, both of them do have a default value for the eps argument. For instance:

focal_loss(ŷ, y; dims=1, agg=mean, gamma=2, eps=eps(eltype(ŷ)))

Anyway, I have tried playing with its value, from 10e-7 to 1e0, in the actual model training procedure, with no success. For instance:

julia> Flux.focal_loss([0], [0])
0.0

julia> Flux.focal_loss([0], [0], eps=1e-7)
0.0

julia> Flux.focal_loss([0], [0], eps=1e-7) == 1e-7
false

julia> Flux.focal_loss([0], [0], eps=1e-7) == 0
true

Any help is appreciated. Thanks.
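One note on why those probes return 0 regardless of eps: with y = [0], every term of the focal loss is multiplied by the label, so the whole expression is 0 no matter what eps is. Probing with y = [1] makes eps visible. A minimal stand-in, assuming focal loss follows the usual definition -y * (1 - ŷ)^γ * log(ŷ + eps) (this is a sketch, not Flux's actual code):

```julia
# Stand-in focal loss for a single example, usual definition assumed
focal(ŷ, y; γ = 2, ϵ = eps(eltype(ŷ))) = sum(@. -y * (1 - ŷ)^γ * log(ŷ + ϵ))

focal([0.0], [0.0])             # 0.0 for any ϵ: the zero label kills every term
focal([0.0], [1.0])             # -log(ϵ): now ϵ actually matters
focal([0.0], [1.0], ϵ = 1e-7)   # -log(1e-7) ≈ 16.1
```

So the eps argument does guard the log; it just cannot show up in a probe whose labels are all zero.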