I’m training a CNN on a vision problem, and sometimes the model parameters become NaN during training. It doesn’t happen every run, so I haven’t been able to put together an MWE.
The model itself is:
using Flux

model = Chain(
    # input 96x96x1
    Conv((5,5), 1=>32, pad=(2,2), relu),   # now 96x96x32
    MaxPool((3,3), pad=(1,1), stride=2),   # now 48x48x32
    Conv((5,5), 32=>32, pad=(2,2), relu),
    MeanPool((3,3), pad=(1,1), stride=2),  # now 24x24x32
    Conv((5,5), 32=>64, pad=(2,2), relu),
    MeanPool((3,3), pad=(1,1), stride=2),  # now 12x12x64
    Conv((5,5), 64=>64, pad=(0,0), relu),  # now 8x8x64
    MeanPool((2,2)),                       # now 4x4x64
    Conv((4,4), 64=>128, pad=(0,0), relu), # now 1x1x128
    # reshape the tensor before the dense layer
    x -> reshape(x, :, size(x, 4)),
    Dense(128, 5),
    softmax
)
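A quick forward pass on a random dummy input (just to confirm the shapes noted in the comments above):

size(model(rand(Float32, 96, 96, 1, 1)))  # (5, 1): 5 class probabilities for 1 image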
I’m using ADAM on a cross-entropy loss. This is in Julia 1.3.1 with Flux 0.10.1.
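Roughly, the loss and training call look like this (a simplified sketch; the real code does its own batching and data loading, and the onehotbatch labels here are just illustrative):

loss(x, y) = Flux.crossentropy(model(x), y)   # model already ends in softmax
opt = ADAM()
# train_data is an iterator of (x, y) minibatches, e.g. y = Flux.onehotbatch(labels, 1:5)
Flux.train!(loss, Flux.params(model), train_data, opt)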
It runs on a CPU with about 70K grayscale images of size 96x96. Again, the loss often decreases fine, but then the weights of the first layer suddenly become NaN. Any ideas on what could be happening?
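For reference, I spot the failure with a check roughly like this (the exact code differs, but this is the idea):

# flag whether any parameter array of the model contains a NaN
has_nan(m) = any(p -> any(isnan, p), Flux.params(m))
has_nan(model)   # false right after construction, true once training blows up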