LSTM with batch vs with individual values

Looks like there are two ways of calling models (for forecasting/regression/etc.) in Flux: “batch mode” (model(data_matrix)) and “comprehension mode” ([model(elem)[1] for elem in data_vector], where elem is one input vector).

These two ways of calling the model produce different results:

using Flux

let model = Chain(LSTM(1, 3))
    data = Float32[1, 2, 3, 4, 5, 6]

    model(data') |> display   # data' is a 1×6 row matrix

    Flux.reset!(model)
    [model([d]) for d in data]
end

Output

3×6 Matrix{Float32}:
 -0.0297324  -0.0248715  -0.0134866  -0.00586918  -0.00224418  -0.0007936
  0.115042    0.23459     0.325777    0.387996     0.431214     0.463294
  0.0504465   0.0913864   0.111791    0.112188     0.0996351    0.0816775

6-element Vector{Vector{Float32}}:
 [-0.029732374, 0.11504154, 0.050446533]
 [-0.03805526, 0.32181326, 0.15570784]
 [-0.03267687, 0.50102127, 0.286734]
 [-0.02288559, 0.5983699, 0.4003932]
 [-0.0133526595, 0.64711636, 0.48314008]
 [-0.006731839, 0.6773652, 0.5399319]

The results are different, even though I called Flux.reset!(model) before computing the second output.

Question

Why does this happen? Is it because of the broadcasting issue discussed in the docs?

Mapping and broadcasting operations with stateful layers such as this are discouraged, since the Julia language doesn’t guarantee a specific execution order.

However, model(data') executes much faster and uses all CPU cores, while the comprehension version is really slow (1 epoch per second vs. 10 epochs per second) and uses only one core. Which one should I use?

model(data') is feeding in a single timestep with 1 feature and batch size 6. [model([d]) for d in data] is feeding in 6 timesteps with 1 feature and batch size 1. If you want to feed multiple timesteps to an RNN simultaneously, pass in a 3D array of shape features x batch x timesteps. This is covered in Recurrence · Flux, but since we haven’t had a stable docs build in a while, that may well be falling through the cracks.
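To make the distinction concrete, here is a sketch of the three ways of feeding the same six values. It assumes the Flux 0.12/0.13-era recurrent API, where a Recur-wrapped layer accepts a 3D features x batch x timesteps array; the layer sizes are just the ones from the snippet above:

```julia
using Flux

model = Chain(LSTM(1, 3))
seq = Float32[1, 2, 3, 4, 5, 6]

# (a) One timestep, batch of 6: the six values are independent samples,
#     so the LSTM state never advances past step 1.
y_batch = model(reshape(seq, 1, 6))                 # 3×6 output

# (b) Six timesteps, batch of 1: the hidden state carries over between calls.
Flux.reset!(model)
y_steps = [model(reshape([x], 1, 1)) for x in seq]  # 6 outputs of size 3×1

# (c) The same six timesteps at once, as a 3D array
#     (features × batch × timesteps).
Flux.reset!(model)
y_3d = model(reshape(seq, 1, 1, 6))                 # 3×1×6 output

# (b) and (c) walk the same sequence, so their per-step outputs should
# match up to floating point: y_steps[t] ≈ y_3d[:, :, t].
```

Whether (c) is available depends on your Flux version, since 3D input was only added relatively recently.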


Ah, I didn’t know the 3D features x batch x timesteps layout was also supported. It doesn’t seem to be covered on the recurrence page you linked, even on master. I believe you are referring to this addition: #1686? I just tried it; however, one downside of this shape versus a vector of matrices batch x (feature x timesteps) seems to be that DataLoader expects the batch (observation) dimension to be the last one.
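For what it’s worth, one way around the DataLoader mismatch is to store the data with the observation dimension last, as DataLoader expects, and permute each batch into features x batch x timesteps before it reaches the model. This is only a sketch under the same Flux-version assumptions as above, and the dataset shape (100 sequences of 20 timesteps) is made up for illustration:

```julia
using Flux
using Flux.Data: DataLoader  # plain Flux.DataLoader in newer versions

# Hypothetical data: 100 sequences, 1 feature, 20 timesteps each, stored
# features × timesteps × batch so the observation dimension comes last.
X = rand(Float32, 1, 20, 100)
loader = DataLoader(X; batchsize = 10)

model = Chain(LSTM(1, 3))
for xb in loader                       # xb is 1×20×10
    Flux.reset!(model)
    x = permutedims(xb, (1, 3, 2))     # → 1×10×20, features × batch × timesteps
    y = model(x)                       # 3×10×20
end
```

The permutedims copy per batch is the price of keeping DataLoader happy; alternatively one could write a custom iterator that slices the last dimension directly.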

It’s at the bottom of the page:

Suggestions, or (preferably) PRs, for a better place to put it are very much welcome 🙂

Yeah, I think the recurrence docs are quite clear about the vector of (features, samples) matrices, and pretty good overall. I meant specifically that 3D array support is only briefly noted in the Recur docstring, which I had missed before. Though perhaps it’s too early to advertise this more, given the ongoing work in Recurrent network interface updates/design · Issue #1678 · FluxML/Flux.jl · GitHub.