Issues with recurrent layers in Flux.jl

I am using Julia 1.6 and Flux v0.12.6

I am having issues using the recurrent layers LSTM and RNN in Flux.jl, and I cannot see why this is happening. Hopefully someone can spot where it is going wrong.

I define the model as:

julia> my_model = LSTM(1,1)
Recur(
  LSTMCell(1, 1),                       # 14 parameters
)         # Total: 5 trainable arrays, 14 parameters,
          # plus 2 non-trainable, 2 parameters, summarysize 384 bytes.

I test it with

julia> eval = my_model.([1,2,3])

And it returns the error

ERROR: MethodError: no method matching (::Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}})(::Tuple{Matrix{Float32}, Matrix{Float32}}, ::Int64)
Closest candidates are:
  (::Flux.LSTMCell{A, V, var"#s318"} where var"#s318"<:Tuple{AbstractMatrix{T}, AbstractMatrix{T}})(::Any, ::Union{AbstractVector{T}, AbstractMatrix{T}, Flux.OneHotArray}) where {A, V, T} at /home/shindler/.julia/packages/Flux/Zz9RI/src/layers/recurrent.jl:137
Stacktrace:
 [1] (::Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}})(x::Int64)
   @ Flux ~/.julia/packages/Flux/Zz9RI/src/layers/recurrent.jl:34
 [2] _broadcast_getindex_evalf
   @ ./broadcast.jl:648 [inlined]
 [3] _broadcast_getindex
   @ ./broadcast.jl:621 [inlined]
 [4] getindex
   @ ./broadcast.jl:575 [inlined]
 [5] copy
   @ ./broadcast.jl:922 [inlined]
 [6] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Vector{Int64}}})
   @ Base.Broadcast ./broadcast.jl:883
 [7] top-level scope
   @ REPL[31]:1
 [8] top-level scope
   @ ~/.julia/packages/CUDA/9T5Sq/src/initialization.jl:66

I get the same issue when I call the model without the broadcast dot:

julia> my_model = LSTM(3,1)
Recur(LSTMCell(3, 1))

julia> my_model([3,2,1])
ERROR: MethodError: no method matching (::Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}})(::Tuple{Matrix{Float32}, Matrix{Float32}}, ::Vector{Int64})
Closest candidates are:
  (::Flux.LSTMCell{A, V, var"#s268"} where var"#s268"<:Tuple{AbstractMatrix{T}, AbstractMatrix{T}})(::Any, ::Union{AbstractVector{T}, AbstractMatrix{T}, Flux.OneHotArray}) where {A, V, T} at /home/shindler/.julia/packages/Flux/0c9kI/src/layers/recurrent.jl:137
Stacktrace:
 [1] (::Flux.Recur{Flux.LSTMCell{Matrix{Float32}, Vector{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}}}, Tuple{Matrix{Float32}, Matrix{Float32}}})(x::Vector{Int64})
   @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/recurrent.jl:34
 [2] top-level scope
   @ REPL[40]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/fRSUT/src/initialization.jl:52

I have the same issue with an RNN as well:

julia> my_model = RNN(3,1)
Recur(RNNCell(3, 1, tanh))

julia> my_model([3,2,1])
ERROR: MethodError: no method matching (::Flux.RNNCell{typeof(tanh), Matrix{Float32}, Vector{Float32}, Matrix{Float32}})(::Matrix{Float32}, ::Vector{Int64})
Closest candidates are:
  (::Flux.RNNCell{F, A, V, var"#s269"} where var"#s269"<:AbstractMatrix{T})(::Any, ::Union{AbstractVector{T}, AbstractMatrix{T}, Flux.OneHotArray}) where {F, A, V, T} at /home/shindler/.julia/packages/Flux/0c9kI/src/layers/recurrent.jl:83
Stacktrace:
 [1] (::Flux.Recur{Flux.RNNCell{typeof(tanh), Matrix{Float32}, Vector{Float32}, Matrix{Float32}}, Matrix{Float32}})(x::Vector{Int64})
   @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/recurrent.jl:34
 [2] top-level scope
   @ REPL[42]:1
 [3] top-level scope
   @ ~/.julia/packages/CUDA/fRSUT/src/initialization.jl:52

But a Dense layer seems to work fine:

julia> my_model = Dense(3,1)
Dense(3, 1)

julia> my_model([3,2,1])
1-element Vector{Float32}:
 -1.0215458

[3,2,1] is a Vector{Int64}, and the basic Flux layers are only guaranteed to work with floating point types. That you can pass an int array into Dense is a happy coincidence, and even then not guaranteed to work in perpetuity. Try Float32[3,2,1] or allocating a float array by any other means instead, and the RNNs should work as expected.
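For example, a minimal sketch of the fix (assuming Flux v0.12 is loaded, and using an LSTM(3, 1) like the one above):

```julia
using Flux

my_model = LSTM(3, 1)

# A Float32 vector matches the AbstractVector{T} method on the LSTM cell,
# whereas a Vector{Int64} does not.
x = Float32[3, 2, 1]
y = my_model(x)          # works: returns a 1-element Vector{Float32}

# Converting an existing Int array element-wise works too:
x2 = Float32.([3, 2, 1])
```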

As for my_model.([1,2,3]): this is probably not doing what you want or expect. Broadcasting over an array applies the broadcasted function element-wise, so this is equivalent to [my_model(1), my_model(2), my_model(3)]. See the Recurrence page of the Flux docs for a guide on how to properly use sequences with Flux RNNs.
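For reference, a hedged sketch of the loop-based sequence pattern from the Recurrence docs (assuming Flux v0.12; Flux.reset! clears the hidden state between sequences):

```julia
using Flux

m = LSTM(1, 1)

# A sequence of three 1-feature samples, as Float32 vectors.
seq = [Float32[1], Float32[2], Float32[3]]

Flux.reset!(m)               # start from a fresh hidden state
y = [m(x) for x in seq]      # explicit loop: guaranteed left-to-right order
```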


Thank you,

Using Float32 indeed solves all my problems.

Extra thanks for pointing out the broadcasts issue. I somehow missed that. (Need to be more careful about trusting random internet blogs).


Do you mind posting the blog post in question? It might be worth someone suggesting a change to add a disclaimer or to help the author rewrite the post.


The specific post I referred to is:

Now to actually evaluate the model on a sequence of inputs, you have to call it like this:

julia> simple_rnn.([1, 2, 3])

Notice the dot notation here - since our RNN only takes one input at a time, we need to apply the RNN on the sequence of inputs we provide one at a time. Then, if we take the last element of the output (after it has seen the entire sequence) we expect to see the sum of all the inputs.

But I have seen broadcasts used fairly frequently in the context of time series NNs in Julia.

Most notably: model-zoo/char-rnn.jl at master · FluxML/model-zoo · GitHub

function loss(xs, ys)
    l = sum(logitcrossentropy.(m.(xs), ys))
      return l
end

But I think in both those cases it is actually warranted to use broadcasting.

In the first case they mention that we want to apply the RNN to one input at a time. The RNN only takes one input at a time but updates its internal state on each call, so you do want to call it on each individual input in order. This is what would happen when you broadcast.

In the second case it seems the data are sequences of one-hot encodings of characters, so the RNN should be run on the one-hot vector of each individual character in a batch, which I believe is what they achieve with the broadcast (I haven't tried the code myself, but that is my understanding of it).

Obviously, I have a limited understanding of this :upside_down_face:. But based on the documentation, the issue is not processing one input at a time, but rather using broadcasts to do it.

The documentation for recurrence warns:

Mapping and broadcasting operations with stateful layers such as these are discouraged, since the Julia language doesn’t guarantee a specific execution order. Therefore, avoid

y = m.(x)
# or 
y = map(m, x)

and use explicit loops

y = [m(x) for x in x]

So my understanding is that if the order that the data is processed matters then broadcasting shouldn’t be used. :man_shrugging:
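This point isn't specific to Flux; it applies to any stateful callable. A toy accumulator in plain Julia (hypothetical, purely for illustration) shows why call order matters:

```julia
# A stateful "layer": each call adds its input to a running total,
# much like an RNN updating its hidden state.
mutable struct Accumulator
    state::Float64
end
(a::Accumulator)(x) = (a.state += x; a.state)

a = Accumulator(0.0)
y = [a(x) for x in [1, 2, 3]]   # explicit loop: order is guaranteed
# y == [1.0, 3.0, 6.0], and a.state == 6.0

# a.(xs) or map(a, xs) may happen to give the same result today, but the
# language does not promise left-to-right evaluation, so for a stateful
# callable neither the outputs nor the final state are guaranteed.
```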


You are correct, and it makes sense that a loop guarantees the order while a broadcast does not.

I didn’t read everything properly and thought the mixup was between the RNN model and your Dense example for some reason. I thought your gripe with the examples in question was that they didn’t work for your Dense model. Next time I’ll try to read the text before trying to answer :sweat_smile:


In fairness, many examples (including, as has been pointed out, the model zoo) used to use broadcasting. However, we ran into a number of issues where broadcasting wasn’t strictly running in left-to-right order. The reason the docs have been updated but the model zoo hasn’t is simply that we don’t have enough time on our hands as an almost 100% volunteer crew :sweat_smile:. PRs would be very welcome!