It looks like there are two ways of calling a model (for forecasting, regression, etc.) in Flux: "batch mode" (`model(data_matrix)`) and "vector comprehension mode" (`[model(elem)[1] for elem in data_vector]`, where `elem` is one input vector).
These two ways of calling the model produce different results:
```julia
using Flux

let model = Chain(LSTM(1, 3))
    data = Float32[1, 2, 3, 4, 5, 6]
    model(data') |> display          # "batch mode": data' is a 1×6 matrix
    Flux.reset!(model)
    [model([d]) for d in data]       # "comprehension mode": one element at a time
end
```
Output
```
3×6 Matrix{Float32}:
 -0.0297324  -0.0248715  -0.0134866  -0.00586918  -0.00224418  -0.0007936
  0.115042    0.23459     0.325777    0.387996     0.431214     0.463294
  0.0504465   0.0913864   0.111791    0.112188     0.0996351    0.0816775

6-element Vector{Vector{Float32}}:
 [-0.029732374, 0.11504154, 0.050446533]
 [-0.03805526, 0.32181326, 0.15570784]
 [-0.03267687, 0.50102127, 0.286734]
 [-0.02288559, 0.5983699, 0.4003932]
 [-0.0133526595, 0.64711636, 0.48314008]
 [-0.006731839, 0.6773652, 0.5399319]
```
The results differ even though I called `Flux.reset!(model)` before computing the second output.
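For what it's worth, comparing the two outputs above column by column, only the first time step agrees. A minimal check (assuming the same Flux recurrent API as in the snippet above; exact values depend on the random initialization):

```julia
using Flux

model = Chain(LSTM(1, 3))
data = Float32[1, 2, 3, 4, 5, 6]

batch_out = model(data')              # 3×6 matrix, as displayed above
Flux.reset!(model)
seq_out = [model([d]) for d in data]  # vector of six 3-element vectors

batch_out[:, 1] ≈ seq_out[1]          # true in the run shown above
batch_out[:, 2] ≈ seq_out[2]          # false in the run shown above
```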
Question
Why does this happen? Is this the broadcasting issue discussed in the docs?

> Mapping and broadcasting operations with stateful layers such as this one are discouraged, since the Julia language doesn't guarantee a specific execution order.
However, `model(data')` executes much faster and uses all CPU cores, while the vector comprehension version is very slow (about 1 epoch per second vs. 10 epochs per second) and uses only one core. Which one should I use?