Lux recurrent networks like LSTM - why are hidden state and memory not part of the model state `st`?

schlichtanders · February 28, 2024, 7:31am

Lux.jl is special in that it has an extra state variable st, which it keeps separate from both the output and the parameters ps.

Hence I am currently confused about the implementation of LSTMCell:

github.com

LuxDL/Lux.jl/blob/5a873d49b912e11f602189d322b49a468092a75c/src/layers/recurrent.jl#L392-L405

function initialstates(rng::AbstractRNG, ::LSTMCell)
    # FIXME(@avik-pal): Take PRNGs seriously
    randn(rng, 1)
    return (rng=replicate(rng),)
end

function (lstm::LSTMCell{use_bias, false, false})(
        x::AbstractMatrix, ps, st::NamedTuple) where {use_bias}
    rng = replicate(st.rng)
    @set! st.rng = rng
    hidden_state = _init_hidden_state(rng, lstm, x)
    memory = _init_hidden_state(rng, lstm, x)
    return lstm((x, (hidden_state, memory)), ps, st)
end

Why aren’t hidden_state and memory part of st?

Given that they are not, why is there an st argument at all? Wouldn't it simplify the interface if the state were also passed as an input argument, like (hidden_state, memory) here?

It would be great if someone could explain the design clash I perceive here.

avikpal · February 29, 2024, 12:04am

Placing hidden_state and memory inside st makes the dispatch clunky. Currently, the dispatch is on the type of x, which is easy to understand (and maintain).
The implementation you are pointing to is that of a *Cell, which is distinct from an RNN. For example, an LSTM in this case is Recurrence(AbstractRecurrentCell(....)), and if you look at the implementation of Recurrence, it hides all of the memory and hidden-state handling from the end user.
Gradients do propagate through hidden_state and memory. You can place these in st, but typically st is used for non-trainable values and for things that don't propagate gradients (though the interface doesn't enforce the latter).
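
To make the dispatch on x concrete, here is a minimal sketch of how the carry (hidden_state, memory) travels with the input rather than inside st. It assumes a Lux version in which LSTMCell returns ((output, (hidden_state, memory)), st) and Recurrence accepts a (features, sequence length, batch) array; the sizes and variable names are made up for illustration.

```julia
using Lux, Random

rng = Random.default_rng()

cell = LSTMCell(3 => 5)            # 3 input features, 5 hidden units (arbitrary sizes)
ps, st = Lux.setup(rng, cell)

x1 = rand(rng, Float32, 3, 2)      # (features, batch) for the first time step
x2 = rand(rng, Float32, 3, 2)      # second time step

# First step: only `x` is passed, so the cell initializes hidden state and memory itself.
(y1, carry), st = cell(x1, ps, st)

# Later steps: the carry `(hidden_state, memory)` is part of the input tuple, not of `st`.
(y2, carry), st = cell((x2, carry), ps, st)

# `Recurrence` wraps a cell and does this threading internally over the sequence.
model = Recurrence(LSTMCell(3 => 5))
ps_m, st_m = Lux.setup(rng, model)
xs = rand(rng, Float32, 3, 4, 2)   # (features, sequence length, batch)
y_last, st_m = model(xs, ps_m, st_m)
```

With Recurrence, the user only ever sees the usual (x, ps, st) call; the carry never appears in the public interface.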

schlichtanders · March 4, 2024, 10:54am

Thank you. A follow-up question:
besides the random number generator, what are other typical uses of st?

avikpal · March 4, 2024, 9:54pm

train/test mode flags
statistics tracking – Normalization Layers
Passing around complete solutions – often needed for DEQs, NeuralODEs
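
The first two uses can be seen in a small sketch with BatchNorm. It assumes the BatchNorm state contains the fields running_mean and training, and that Lux.testmode switches the mode flag; the input data is made up for illustration.

```julia
using Lux, Random

rng = Random.default_rng()

bn = BatchNorm(3)                        # normalizes 3 features
ps, st = Lux.setup(rng, bn)

x = randn(rng, Float32, 3, 16) .+ 2.0f0  # (features, batch), shifted away from zero mean

# Training mode (the default after `setup`): the returned state carries updated running statistics.
y, st_train = bn(x, ps, st)

# Switch the mode flag stored in the state; the running statistics are now used, not updated.
st_test = Lux.testmode(st_train)
y_test, st_after = bn(x, ps, st_test)
```

Neither the mode flag nor the running statistics receive gradients, which is why they sit in st rather than in ps or in the input.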
