Is it possible to use batching in Flux for sequences of different lengths?

I have sequences that are of the dimensions 8 by N, where N is a number fluctuating between 1 and 400. I would like to be able to do batch processing on these sequences, but have not had any luck in doing so inside of Flux. Is it possible to do batch operations in Flux on sequences of varying lengths?

Not sure exactly what you are after, but iirc other frameworks that take 3D inputs to RNNs handle this by padding the input and then masking the output.

I think something like this would be the equivalent in Flux:

julia> rnn = Flux.RNN(4,2);

julia> x0 = randn(Float32, 4, 3);

julia> rnn(x0)
2×3 Array{Float32,2}:
 0.784038   0.424429  0.904399
 0.818972  -0.459738  0.78797

julia> x1 = hcat(randn(Float32, 4, 2), zeros(Float32, 4, 1));

julia> mask = [1 1 0]
1×3 Array{Int64,2}:
 1  1  0

julia> rnn(x1) .* mask
2×3 Array{Float32,2}:
  0.981415   0.165332  0.0
 -0.737131  -0.215283  0.0
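For the 8 by N case in the question, the padding and the mask could be built with something like the sketch below. `pad_and_mask` is a hypothetical helper, not part of Flux, and the features × time × batch layout is just an assumption for illustration.

```julia
# Hypothetical helper (not a Flux function): zero-pad each features×N matrix
# up to the longest sequence and return a matching 0/1 mask.
function pad_and_mask(seqs)
    T      = eltype(first(seqs))
    nfeat  = size(first(seqs), 1)
    maxlen = maximum(size(s, 2) for s in seqs)
    batch  = zeros(T, nfeat, maxlen, length(seqs))  # features × time × batch
    mask   = zeros(T, 1, maxlen, length(seqs))      # 1 = real data, 0 = padding
    for (b, s) in enumerate(seqs)
        n = size(s, 2)
        batch[:, 1:n, b] .= s
        mask[1, 1:n, b] .= 1
    end
    return batch, mask
end

# Three toy sequences with 8 features and lengths 3, 5 and 2.
seqs = [randn(Float32, 8, n) for n in (3, 5, 2)]
batch, mask = pad_and_mask(seqs)
size(batch)  # (8, 5, 3)
```

Slicing `batch[:, t, :]` and `mask[:, t, :]` then gives the same kind of features × batch input and 1 × batch mask as in the example above, one pair per timestep.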

I guess it might also be possible to just remove the missing data from the state:

julia> x2 = randn(Float32, 4, 2);

julia> rnn.state = rnn.state[:,1:2]
2×2 Array{Float32,2}:
  0.981415   0.165332
 -0.737131  -0.215283

julia> rnn(x2)
2×2 Array{Float32,2}:
 -0.773747  0.974077
 -0.849829  0.975408

Depending on what your whole model looks like, this may be a bit cumbersome to do…
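Slicing the state with `1:2` as above only drops the right columns if the finished sequences sit at the end of the batch. One way to arrange that, as a rough sketch with made-up lengths, is to sort the batch by length (longest first) and work out how many columns are still active at each timestep:

```julia
# Made-up sequence lengths for a batch of 4 samples.
lengths = [3, 5, 2, 5]

# Order the samples longest first so finished ones are always the last columns.
order  = sortperm(lengths, rev=true)
sorted = lengths[order]               # [5, 5, 3, 2]

# Number of sequences that still have data at each timestep.
active = [count(n -> n >= t, sorted) for t in 1:maximum(sorted)]
# → [4, 4, 3, 2, 2]
```

At timestep `t` only columns `1:active[t]` of the input and of the state would be kept, and `order` can be used to put the outputs back in the original order afterwards.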

Thanks! After asking the question I actually thought padding might be a way to go, but I hadn’t thought of how to implement it yet - I am still new to Julia and have a lot to learn in terms of optimization, so I really appreciate the example.

Also new to Flux, but from my understanding, variable sequence lengths are handled naturally by feeding the RNN cell a vector whose length is the sequence length and whose elements are of size (features, batch_size). Therefore, handling different sequence lengths shouldn’t require any padding trick; it should be enough to broadcast the RNN over a vector of whatever length the sequence has.

In the example below, the toy data has batch_size = 2, and both samples have identical data.
Then, a sequence of length 4 is built, again by duplicating the input features.

It can be observed that when model m is applied to the single (3,2) input, the RNN produces identical outputs for each of the two observations. That would not be the case if the (3,2) input instead represented a sequence of 2 timesteps of a single observation.

Broadcasting m over the vector that repeats the (3,2) input 4 times, we get as output a vector of length 4 containing (3,2) elements. It can be seen that the first (3,2) element matches m(x1). Also, the second (3,2) element matches the output of m(x1) when m is applied a second time, which is consistent with the Flux recurrent cell taking one sequence element at a time.

The docs also illustrate this here: Model Reference · Flux

using Flux
using Random: seed!
seed!(1234);
m = RNN(3, 3)
# 3 features X 2 samples
seed!(1234);
x1 = rand(3,1)
x1 = cat(x1,x1, dims=2)
m1 = m(x1)
println(m1)

[-0.8223893847165972 -0.8223893847165972; 0.7631164681754794 0.7631164681754794; -0.49397430034214257 -0.49397430034214257]

# Calling m a second time on the same input: the hidden state has advanced, so the output changes
m1 = m(x1)
println(m1)

[-0.9619506549502201 -0.9619506549502201; 0.9836595589647671 0.9836595589647671; 0.09047523145543104 0.09047523145543104]

# Now broadcasting over a sequence
seed!(1234);
m = RNN(3, 3)
# 3 features X 2 samples X 4 timesteps
x2 = [x1 for i in 1:4]
m2 = m.(x2)
println(m2[1])

[-0.8223893847165972 -0.8223893847165972; 0.7631164681754794 0.7631164681754794; -0.49397430034214257 -0.49397430034214257]

println(m2[2])

[-0.9619506549502201 -0.9619506549502201; 0.9836595589647671 0.9836595589647671; 0.09047523145543104 0.09047523145543104]
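To connect this with the padding idea from earlier in the thread: as far as I can tell, each element of the vector is a (features, batch_size) matrix, so all samples in one batch still need the same number of timesteps. A padded batch can be split into this vector-of-timesteps form as sketched below. The lengths are made up, and `model` is only a stateless stand-in so the snippet runs without Flux; it is not a real recurrent layer.

```julia
# Toy padded batch: 8 features × 5 timesteps × 3 samples with lengths 3, 5, 2.
lengths = [3, 5, 2]
maxlen  = maximum(lengths)
batch   = zeros(Float32, 8, maxlen, length(lengths))
for (b, n) in enumerate(lengths)
    batch[:, 1:n, b] .= randn(Float32, 8, n)
end

# One (features × batch) matrix per timestep, the shape described above.
xs = [batch[:, t, :] for t in 1:maxlen]

# One 1×batch mask per timestep: 1 while the sample still has data at step t.
masks = [reshape(Float32.(t .<= lengths), 1, :) for t in 1:maxlen]

# Stand-in for a model (assumption: any function mapping 8×batch to 2×batch).
W = randn(Float32, 2, 8)
model(x) = tanh.(W * x)

# Apply the model step by step and zero the outputs of padded positions.
ys = [model(x) .* m for (x, m) in zip(xs, masks)]
```

With a real recurrent layer in place of `model`, the padded steps would still update that sample's hidden state, so the mask only cleans up the outputs.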

CUDNN supports a fused LSTM where all the timesteps are treated as a block with a 3D array as input, but I don’t know whether this approach is accessible in Flux.
