I’m trying to write a model similar to the one in "Image-to-Markup Generation with Coarse-to-Fine Attention", where a fully convolutional network is first applied to an image (producing a smaller feature map with more channels, e.g. shape (80, 80, 256)), and a bidirectional LSTM is then applied to each row of that feature map.
I don’t know how to handle batches correctly, since this seems to require writing an explicit loop that accesses the rows in a shape-dependent way:
```julia
function (m::RowEncoder)(x)
    out = zeros(m.rows, m.columns, m.features)
    for i in 1:m.rows
        # Encode one row, then reset the recurrent state before the next row.
        out[i, :, :] = m.bidir_lstm(x[i, :, :])
        Flux.reset!(m.bidir_lstm)
    end
    return out
end
```
but this errors when called on a batch of shape (rows, columns, features, examples). Do I need to write a separate method for each input shape? Or is there a way to write a shape-generic function that also handles the BiLSTM resets?
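For concreteness, here is a sketch of the kind of shape-generic row iteration I have in mind, using `selectdim` so the same code works for both 3-D and 4-D inputs. It uses a plain `double` function as a stand-in for the BiLSTM (the reset call would go inside the encoder), since my question is really about the indexing, not Flux specifics:

```julia
# Sketch: shape-generic row iteration with selectdim, so the same code handles
# (rows, cols, features) and (rows, cols, features, batch) arrays.
# `encoder` stands in for the per-row BiLSTM application (plus reset).
function encode_rows(encoder, x::AbstractArray)
    slices = [encoder(selectdim(x, 1, i)) for i in axes(x, 1)]
    # Re-stack the per-row outputs along the first (row) dimension.
    return cat((reshape(s, 1, size(s)...) for s in slices)...; dims=1)
end

# Dummy "encoder" that just doubles its input, for illustration only.
double(x) = 2 .* x

y3 = encode_rows(double, ones(80, 80, 256))     # 3-D: size (80, 80, 256)
y4 = encode_rows(double, ones(80, 80, 256, 4))  # 4-D: size (80, 80, 256, 4)
```

But I’m not sure whether something like this is idiomatic, or whether it plays well with Flux’s state resets and gradients.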