Hi everyone,

I have a question about Flux training: I am trying to understand why my model returns NaN values after a failed training step.

For example, I have the following model, whose input shape is (120, 56, 1):
julia> myModel = Chain(
           Conv((7,), 56 => 32, stride = 3),
           Flux.flatten,
           Dense(38 * 32 => 10, identity),
           BatchNorm(10, relu)
       )
julia> x = rand(Float32, 120, 56, 1)

Here is the output of the model:
julia> myModel(x)
10-element Vector{Float32}:
 -0.0013220371
  0.0051790997
  0.042023115
 -0.031484906
 -0.037755977
  ⋮
Then I have an objective function, but due to a dimension mismatch it throws an error: my input has shape (120, 56, 1) while the model output has shape (10, 1). From my understanding, the model should not get trained at all with such a loss function.
julia> loss(x) = logitbinarycrossentropy(myModel(x), x, agg=sum) # it will raise an error
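To make the mismatch concrete: as far as I can tell, logitbinarycrossentropy broadcasts its two arguments, and broadcasting a (10, 1) array against a (120, 56, 1) array fails because the first dimensions (10 vs. 120) are incompatible. A minimal reproduction without Flux:

```julia
# The loss broadcasts ŷ against the target, and these shapes are
# incompatible: 10 vs. 120 in the first dimension.
ŷ = zeros(Float32, 10, 1)       # model output shape
y = zeros(Float32, 120, 56, 1)  # input shape

try
    ŷ .+ y
catch e
    println(typeof(e))  # prints DimensionMismatch
end
```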
So when I try to train this model, it immediately throws an error, which is expected:
# this will raise an error
julia> Flux.train!(loss, Flux.params(myModel), [x], Adam(0.0001))
However, if I rerun the model afterwards, it outputs all NaN values. This confuses me: the training step raised an error and training could not continue, so why does it affect my original model?
julia> myModel(x)
10-element Vector{Float32}:
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
NaN
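In case it helps with diagnosing this, here is a small helper I use to check which parameter arrays actually contain NaN after the failed step (nan_params is my own sketch, not a Flux function):

```julia
# Hypothetical helper: return the indices of the parameter arrays that
# contain at least one NaN, to see whether the weights were corrupted.
nan_params(ps) = [i for (i, p) in enumerate(ps) if any(isnan, p)]

# Run it over the model's trainable parameters:
# nan_params(Flux.params(myModel))
```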
Any comments are appreciated. Thank you for your attention!