How to correctly handle batches in this model

I’m trying to write a model similar to the one in Image to markup with coarse to fine attention, where a fully convolutional network is first applied to an image (resulting in a smaller image with a greater number of channels, e.g., shape (80,80,256)), and then a bidirectional LSTM is applied to each row of the resulting image.

I don’t know how to handle batches correctly, since this seems to require explicit iteration that indexes the rows in a shape-dependent way:

function (m::RowEncoder)(x)
    # x is a single example with shape (rows, columns, features)
    out = zeros(m.rows, m.columns, m.features)
    for i in 1:m.rows
        out[i, :, :] = m.bidir_lstm(x[i, :, :])
    end
    return out
end

but this errors when called on a batch with shape (rows, columns, features, examples). Can I write a separate method of the function for each shape, and do I need to? Or is there a way to write a shape-generic function that also handles BiLSTM resets?
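
To make the shape-generic part of the question concrete, here is a minimal sketch in plain Julia with no Flux dependency. The names `row_fn` and `encode_rows` are hypothetical: `row_fn` is only a stand-in for the BiLSTM and is assumed to return an array of the same shape as its input, which a real recurrent layer may not do. The sketch shows only that `selectdim` from Base can pick out row `i` regardless of how many trailing dimensions the array has.

```julia
# Hypothetical stand-in for the BiLSTM: returns an array with the same
# shape as the slice it receives. A real recurrent layer will differ.
row_fn(slice) = 2 .* slice

# Apply `f` to every slice along the first dimension of `x`,
# whatever the number of trailing dimensions.
function encode_rows(f, x::AbstractArray)
    out = similar(x)
    for i in axes(x, 1)
        # selectdim returns a view with dimension 1 fixed to index i
        selectdim(out, 1, i) .= f(selectdim(x, 1, i))
    end
    return out
end

x3 = rand(Float32, 4, 5, 3)      # (rows, columns, features)
x4 = rand(Float32, 4, 5, 3, 2)   # (rows, columns, features, examples)

y3 = encode_rows(row_fn, x3)
y4 = encode_rows(row_fn, x4)
```

The same function accepts both the single-example and the batched array, so no per-shape method is needed for the indexing itself. It does not address resetting recurrent state between rows, and the in-place writes into `out` are an assumption of this sketch rather than something known to work inside a trained model.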