Help with LSTM on Flux.jl

rcnlee · May 18, 2018, 5:36am

Hello, I’m trying to figure out what’s wrong with my Flux LSTM model, which is a chain of LSTM, Dense, and softmax layers. The Dense gradients are fine, but all of the LSTM gradients are NaN, so the gradients are not propagating back through the LSTM. I don’t get this problem if I replace LSTM with RNN. I’m new to both LSTMs and Flux.jl. I find this strange because the LSTM layer feeds directly into the Dense layer, whose gradients are fine.

Here’s the code to reproduce the issue:

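using Flux
using BSON: @load
@load "lstmdata.bson" X Y   # X[1], Y[1]: one input sequence and its targets
m = Chain(LSTM(15, 30), Dense(30, 2), softmax)
function loss(X, Y)
    # run the model over every time step and score only the last output
    l = Flux.crossentropy(m.(X)[end], Y[end])
    # cut the gradient history of the recurrent state (its value is kept)
    Flux.truncate!(m)
    return l
end
l = loss(X[1], Y[1])
Flux.back!(l)   # backpropagate from the loss
W = params(m)
W[6].grad  # Dense parameter: gradient is fine
W[5].grad  # LSTM parameter: gradient contains NaNs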
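
To check every parameter at once instead of indexing W[5] and W[6] by hand, a small helper along these lines could be used. This is a sketch: `nan_indices` is a hypothetical name, not a Flux function, and it only assumes each gradient is an ordinary numeric array.

```julia
# Hypothetical helper (not part of Flux): return the indices of the
# gradient arrays that contain at least one NaN.
function nan_indices(grads)
    return [i for (i, g) in enumerate(grads) if any(isnan, g)]
end

# Toy example: plain arrays standing in for parameter gradients
grads = [rand(3, 3), fill(NaN, 2), [1.0, NaN, 2.0]]
println(nan_indices(grads))  # prints [2, 3]
```

With the Flux version used above, where each parameter exposes a `.grad` field, it could be called after `Flux.back!(l)` as `nan_indices([p.grad for p in params(m)])`.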

The BSON data file can be downloaded here: lstmdata.bson. Any help would be greatly appreciated. Thanks!
