Flux training gives NaNs

I’m training a CNN on a vision problem, and sometimes the model parameters become NaN. It doesn’t happen every time, so I don’t know how to create an MWE.

The model itself is:

	using Flux

	model = Chain(
		# input 96x96x1
		Conv((5,5), 1=>32, relu, pad=(2,2)),  # now 96x96x32
		MaxPool((3,3), pad=(1,1), stride=2),  # now 48x48x32

		Conv((5,5), 32=>32, relu, pad=(2,2)),  # still 48x48x32
		MeanPool((3,3), pad=(1,1), stride=2),  # now 24x24x32

		Conv((5,5), 32=>64, relu, pad=(2,2)),  # now 24x24x64
		MeanPool((3,3), pad=(1,1), stride=2),  # now 12x12x64

		Conv((5,5), 64=>64, relu, pad=(0,0)),  # now 8x8x64
		MeanPool((2,2)),  # now 4x4x64 (stride defaults to the window size)

		Conv((4,4), 64=>128, relu, pad=(0,0)),  # now 1x1x128

		# Reshape tensor before dense layer
		x -> reshape(x, :, size(x, 4)),  # now 128 x batch
		Dense(128, 5),  # 5 outputs, one per class
	)
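The shape comments in the model can be double-checked with the usual output-size formula for convolution and pooling, out = floor((in + 2*pad - k) / stride) + 1. This is a small plain-Julia sketch that needs no Flux; the helper name `outsize` is hypothetical, and it assumes dilation 1, a stride of 1 for Conv, and a stride equal to the window for pooling when `stride` is not given.

```julia
# Output size of a conv/pool layer along one spatial dimension
# (assumes dilation = 1; fld rounds down, as the layers do).
outsize(n, k; pad = 0, stride = 1) = fld(n + 2pad - k, stride) + 1

n = 96                                   # input 96x96
n = outsize(n, 5, pad = 2)               # Conv 5x5, pad 2       -> 96
n = outsize(n, 3, pad = 1, stride = 2)   # MaxPool 3x3, stride 2 -> 48
n = outsize(n, 5, pad = 2)               # Conv 5x5, pad 2       -> 48
n = outsize(n, 3, pad = 1, stride = 2)   # MeanPool 3x3          -> 24
n = outsize(n, 5, pad = 2)               # Conv 5x5, pad 2       -> 24
n = outsize(n, 3, pad = 1, stride = 2)   # MeanPool 3x3          -> 12
n = outsize(n, 5)                        # Conv 5x5, no pad      -> 8
n = outsize(n, 2, stride = 2)            # MeanPool 2x2          -> 4
n = outsize(n, 4)                        # Conv 4x4, no pad      -> 1
println(n)  # spatial size before the reshape; 1x1x128 flattens to 128 features
```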


I’m using ADAM on cross-entropy loss. This is in Julia 1.3.1 and Flux 0.10.1.

It runs on a CPU with about 70K grayscale images of size 96x96. Again, the loss often decreases fine, but then the weights of the first layer suddenly jump to NaN. Any ideas on what could be happening?

To answer my own question: the problem was that crossentropy does not guard against taking the log of zero. When a softmax output underflows to exactly zero, log returns -Inf, so the loss or its gradient becomes Inf or NaN, and once a NaN gets into the weights there is no recovering.
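A plain-Julia sketch of the failure mode, with no Flux involved. `naive_softmax` and `naive_crossentropy` are hypothetical helpers that mirror an unguarded -sum(y .* log.(ŷ)). With Float32 logits far enough apart, exp underflows to exactly 0 and log(0) is -Inf: the loss comes out as Inf if the zero is in the true class, or NaN if it is in another class, because 0 * -Inf is NaN. In practice the gradient then picks up NaNs too (terms like 0/0 or Inf * 0), and ADAM writes them into the weights.

```julia
# Softmax that subtracts the max (the usual stability trick) but is still
# unguarded: very negative shifted logits underflow to exactly 0 in Float32.
function naive_softmax(x)
    e = exp.(x .- maximum(x))
    return e ./ sum(e)
end

# Unguarded cross-entropy: -sum(y .* log.(p)).
naive_crossentropy(p, y) = -sum(y .* log.(p))

logits = Float32[0, 200]      # class 1 is 200 below class 2
p = naive_softmax(logits)     # exp(-200f0) underflows, so p[1] == 0f0

loss_true  = naive_crossentropy(p, Float32[1, 0])  # zero in the true class: -(-Inf) = Inf
loss_other = naive_crossentropy(p, Float32[0, 1])  # zero in another class: 0 * -Inf = NaN

println(p, " ", loss_true, " ", loss_other)
```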

My fix is to broadcast max of the model output with machine epsilon, so that no probability is below eps, before passing it to crossentropy.
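A minimal sketch of that clamp, again in plain Julia so it runs without Flux; `clamped_crossentropy` is a hypothetical name. In the Flux code this would amount to something like `Flux.crossentropy(max.(ŷ, eps(Float32)), y)` on the softmax output ŷ, assuming Float32 outputs.

```julia
# Cross-entropy with probabilities clamped from below at machine epsilon
# of their element type, so log never sees an exact zero.
clamped_crossentropy(p, y) = -sum(y .* log.(max.(p, eps(eltype(p)))))

p = Float32[0, 1]             # a softmax output where class 1 underflowed to 0

loss_true  = clamped_crossentropy(p, Float32[1, 0])  # -log(eps(Float32)), about 15.9
loss_other = clamped_crossentropy(p, Float32[0, 1])  # 0: log(1) is 0, clamped entry times 0

println(loss_true, " ", loss_other)
```

One side effect to keep in mind: wherever the clamp is active, max passes no gradient back to that probability, so those entries stop contributing to the update instead of blowing it up.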


You can also use logitcrossentropy, which takes the raw model output (logits) instead of softmax probabilities:

julia> Flux.crossentropy([0], [0])

julia> Flux.logitcrossentropy([0], [0])
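Why logitcrossentropy helps: it works on raw scores (so the model should end at Dense(128, 5) with no softmax) and goes through logsoftmax, which computes log(softmax(x)) as x .- (m + log(sum(exp.(x .- m)))) with m = maximum(x). The log is never applied to a probability that has rounded to zero. Below is a plain-Julia sketch of the same idea for a single sample, ignoring Flux's averaging over the batch; `logsoftmax_sketch` and `logit_crossentropy_sketch` are hypothetical names, not Flux's implementation.

```julia
# log(softmax(x)) computed in log space as x .- logsumexp(x).
# log is only applied to sum(exp.(x .- m)), which is at least 1.
function logsoftmax_sketch(x)
    m = maximum(x)
    return x .- (m + log(sum(exp.(x .- m))))
end

# Cross-entropy on raw logits for a single sample (no batch averaging).
logit_crossentropy_sketch(x, y) = -sum(y .* logsoftmax_sketch(x))

logits = Float32[0, 200]      # a plain softmax of these underflows to 0 in Float32

loss_true  = logit_crossentropy_sketch(logits, Float32[1, 0])  # finite: 200
loss_other = logit_crossentropy_sketch(logits, Float32[0, 1])  # finite: 0

println(loss_true, " ", loss_other)
```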