LSTM with batch vs with individual values

Looks like there are two ways of calling models (for forecasting/regression/etc.) in Flux: “batch mode” (model(data_matrix)) and “comprehension mode” ([model(elem)[1] for elem in data_vector], where elem is one input vector).

These two ways of calling the model produce different results:

using Flux

let model = Chain(LSTM(1, 3))
    data = Float32[1, 2, 3, 4, 5, 6]

    model(data') |> display   # data' is a 1×6 row matrix

    Flux.reset!(model)
    [model([d]) for d in data]
end

Output

3×6 Matrix{Float32}:
 -0.0297324  -0.0248715  -0.0134866  -0.00586918  -0.00224418  -0.0007936
  0.115042    0.23459     0.325777    0.387996     0.431214     0.463294
  0.0504465   0.0913864   0.111791    0.112188     0.0996351    0.0816775

6-element Vector{Vector{Float32}}:
 [-0.029732374, 0.11504154, 0.050446533]
 [-0.03805526, 0.32181326, 0.15570784]
 [-0.03267687, 0.50102127, 0.286734]
 [-0.02288559, 0.5983699, 0.4003932]
 [-0.0133526595, 0.64711636, 0.48314008]
 [-0.006731839, 0.6773652, 0.5399319]

The results are different, even though I called Flux.reset!(model) before computing the second output.

Question

Why does this happen? Is it because of the broadcasting issue discussed in the docs?

Mapping and broadcasting operations with stateful layers such as this are discouraged, since the Julia language doesn’t guarantee a specific execution order.

However, model(data') executes much faster and uses all CPU cores, while the comprehension version is really slow (1 epoch per second vs. 10 epochs per second) and uses only one core. Which one should I use?

model(data') is feeding in a single timestep with 1 feature and batch size 6. [model([d]) for d in data] is feeding in 6 timesteps with 1 feature and batch size 1. If you want to feed multiple timesteps to an RNN simultaneously, pass in a 3D array of shape features x batch x timesteps. This is covered in Recurrence · Flux, but since we haven’t had a stable docs build in a while, that may well be falling through the cracks.
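To make the distinction concrete, here is a sketch of the three ways of feeding the same six values. It assumes the Flux 0.12/0.13-era recurrent API, where a Recur-wrapped layer accepts a 3D features x batch x timesteps array; the layer sizes are just the ones from the snippet above:

```julia
using Flux

model = Chain(LSTM(1, 3))
seq = Float32[1, 2, 3, 4, 5, 6]

# (a) One timestep, batch of 6: the six values are independent samples,
#     so the LSTM state never advances past step 1.
y_batch = model(reshape(seq, 1, 6))                 # 3×6 output

# (b) Six timesteps, batch of 1: the hidden state carries over between calls.
Flux.reset!(model)
y_steps = [model(reshape([x], 1, 1)) for x in seq]  # 6 outputs of size 3×1

# (c) The same six timesteps at once, as a 3D array
#     (features × batch × timesteps).
Flux.reset!(model)
y_3d = model(reshape(seq, 1, 1, 6))                 # 3×1×6 output

# (b) and (c) walk the same sequence, so their per-step outputs should
# match up to floating point: y_steps[t] ≈ y_3d[:, :, t].
```

Whether (c) is available depends on your Flux version, since 3D input was only added relatively recently.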


Ah, I didn’t know the 3D features x batch x timesteps layout was also supported. It doesn’t seem to be covered on the recurrence page you linked, even on master. I believe you are referring to this addition: #1686? I just tried it; however, one downside of this shape versus a vector of matrices batch x (feature x timesteps) seems to be that DataLoader expects the batch (observation) dimension to be the last one.
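For what it’s worth, one way around the DataLoader mismatch is to store the data with the observation dimension last, as DataLoader expects, and permute each batch into features x batch x timesteps before it reaches the model. This is only a sketch under the same Flux-version assumptions as above, and the dataset shape (100 sequences of 20 timesteps) is made up for illustration:

```julia
using Flux
using Flux.Data: DataLoader  # plain Flux.DataLoader in newer versions

# Hypothetical data: 100 sequences, 1 feature, 20 timesteps each, stored
# features × timesteps × batch so the observation dimension comes last.
X = rand(Float32, 1, 20, 100)
loader = DataLoader(X; batchsize = 10)

model = Chain(LSTM(1, 3))
for xb in loader                       # xb is 1×20×10
    Flux.reset!(model)
    x = permutedims(xb, (1, 3, 2))     # → 1×10×20, features × batch × timesteps
    y = model(x)                       # 3×10×20
end
```

The permutedims copy per batch is the price of keeping DataLoader happy; alternatively one could write a custom iterator that slices the last dimension directly.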

It’s at the bottom of the page:

Suggestions, or (preferably) PRs, for a better place to put it are very much welcome 🙂

Yeah, I think the recurrence docs are quite clear about the vector of (features, samples) matrices, and pretty good overall. I meant specifically that 3D array support is only briefly noted in the Recur docstring, which I had missed before. Though perhaps it’s too early to advertise this more, given the ongoing work in Recurrent network interface updates/design · Issue #1678 · FluxML/Flux.jl · GitHub.