I’m trying to write a model similar to the one in "Image-to-Markup Generation with Coarse-to-Fine Attention", where a fully convolutional network is first applied to an image (producing a smaller feature map with more channels, e.g. shape (80, 80, 256)), and a bidirectional LSTM is then applied to each row of that feature map.
I don’t know how to handle batches correctly, since this seems to require writing an explicit loop that accesses the rows in a shape-dependent way:
```julia
function (m::RowEncoder)(x)
    out = zeros(m.rows, m.columns, m.features)
    for i in 1:m.rows
        # Encode one row, then reset the recurrent state before the next row.
        out[i, :, :] = m.bidir_lstm(x[i, :, :])
        Flux.reset!(m.bidir_lstm)
    end
    return out
end
```
but this errors when called on a batch of shape (rows, columns, features, examples). Do I need to write a separate method for each input shape? Or is there a way to write a shape-generic function that also handles the BiLSTM resets?
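For concreteness, here is a sketch of the kind of shape-generic row iteration I have in mind, using `selectdim` so the same code works for both 3-D and 4-D inputs. It uses a plain `double` function as a stand-in for the BiLSTM (the reset call would go inside the encoder), since my question is really about the indexing, not Flux specifics:

```julia
# Sketch: shape-generic row iteration with selectdim, so the same code handles
# (rows, cols, features) and (rows, cols, features, batch) arrays.
# `encoder` stands in for the per-row BiLSTM application (plus reset).
function encode_rows(encoder, x::AbstractArray)
    slices = [encoder(selectdim(x, 1, i)) for i in axes(x, 1)]
    # Re-stack the per-row outputs along the first (row) dimension.
    return cat((reshape(s, 1, size(s)...) for s in slices)...; dims=1)
end

# Dummy "encoder" that just doubles its input, for illustration only.
double(x) = 2 .* x

y3 = encode_rows(double, ones(80, 80, 256))     # 3-D: size (80, 80, 256)
y4 = encode_rows(double, ones(80, 80, 256, 4))  # 4-D: size (80, 80, 256, 4)
```

But I’m not sure whether something like this is idiomatic, or whether it plays well with Flux’s state resets and gradients.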