Hello, I’m trying to figure out what’s wrong with my Flux LSTM model, which is a chain of LSTM, Dense, and softmax layers. The problem is that the Dense gradients are fine, but all the LSTM gradients come back as NaN, so gradients are not propagating through the LSTM. The problem goes away if I replace the LSTM with an RNN. I’m new to both LSTMs and Flux.jl, and I find it strange because the LSTM is connected directly to the Dense layer, whose gradients are fine.
Here’s the code to reproduce the issue:
```julia
using Flux
using BSON: @load

@load "lstmdata.bson" X Y

m = Chain(LSTM(15, 30), Dense(30, 2), softmax)

function loss(X, Y)
    l = Flux.crossentropy(m.(X)[end], Y[end])
    Flux.truncate!(m)
    return l
end

l = loss(X, Y)
Flux.back!(l)
W = params(m)
W.grad  # Dense grads: fine
W.grad  # LSTM grads: NaNs
```
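In case the BSON file is unavailable, here is a self-contained sketch that sets up the same model with random data instead. The shapes are my assumptions (a sequence of 15-dimensional inputs with one-hot 2-class targets, inferred from the layer sizes), and it uses the same Tracker-era Flux API (`back!`, `truncate!`) as the snippet above:

```julia
using Flux

# Hypothetical stand-in for the contents of lstmdata.bson:
# a 10-step sequence of 15-dim input vectors and one-hot 2-class targets.
# These shapes are assumed, not taken from the actual file.
X = [rand(Float32, 15) for _ in 1:10]
Y = [Flux.onehot(rand(1:2), 1:2) for _ in 1:10]

m = Chain(LSTM(15, 30), Dense(30, 2), softmax)

function loss(X, Y)
    # Run the model over the whole sequence, score only the last step
    l = Flux.crossentropy(m.(X)[end], Y[end])
    Flux.truncate!(m)  # cut the backprop-through-time history
    return l
end

l = loss(X, Y)
Flux.back!(l)
```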
The data BSON can be downloaded here: lstmdata.bson. Any help would be greatly appreciated. Thanks!