# Need help proofreading my BLSTM implementation

Hi, I’m implementing a BLSTM for a personal project (nearly finished), and I’m currently quite confused about how BLSTMs are implemented; I’ve found various kinds of implementations in papers online. Mine should concatenate the outputs of the two passes into a single array, one per word. My current code reads:

```julia
using Flux
using Flux: Recur, LSTMCell, @functor

struct BLSTM{A,B,C}
    forward  :: Recur{LSTMCell{A,B,C},C}
    backward :: Recur{LSTMCell{A,B,C},C}
    outdim   :: Int
end

function BLSTM(in::Int, out::Int)
    forward  = LSTM(in, out)
    backward = LSTM(in, out)
    return BLSTM(forward, backward, out * 2)
end

function (m::BLSTM)(x::AbstractArray)
    forward_out  = m.forward(x)
    backward_out = reverse(m.backward(reverse(x, dims=2)), dims=2)
    return cat(forward_out, backward_out, dims=1)
end

Flux.trainable(m::BLSTM) = (m.forward, m.backward)
@functor BLSTM
```

What confuses me specifically is the double `reverse`. My thought process is: I reverse the input and feed it to the LSTM, whose output comes back in reverse order, so I `reverse` it again to restore the original ordering.
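
As a plain-Julia sanity check of that reasoning (no Flux involved), the reverse-process-reverse pattern does realign outputs with their original timesteps:

```julia
seq = [1, 2, 3, 4]               # a toy "sequence"
rev_out = reverse(seq) .* 10     # stand-in for the backward pass: [40, 30, 20, 10]
aligned = reverse(rev_out)       # [10, 20, 30, 40]; entry i matches timestep i again
```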

Looks mostly good. The biggest tweak you’ll need is to change `reverse(..., dims=2)` to `reverse(..., dims=3)`. Per the Model Reference section of the Flux docs, the time dimension is 3rd/last when using a dense input.
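
Concretely, the forward pass would become (a sketch against your struct above, assuming a `features×batch×time` input as discussed further down):

```julia
function (m::BLSTM)(x::AbstractArray)
    forward_out  = m.forward(x)
    # reverse along the time (3rd/last) dimension, not the batch dimension
    backward_out = reverse(m.backward(reverse(x, dims=3)), dims=3)
    return cat(forward_out, backward_out, dims=1)
end
```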

On to more minor/stylistic feedback: you can change the struct definition to

```julia
struct Bidirectional{A<:Recur,B<:Recur}
    forward  :: A
    backward :: B
end

@functor Bidirectional
```

This allows it to be used with any RNN type. Note that overriding `trainable` is not necessary most of the time, because it falls back to calling `functor` which `@functor` implements for you.
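
For completeness, a minimal sketch of the matching forward pass (the same double-`reverse` trick, now over `dims=3`):

```julia
function (m::Bidirectional)(x::AbstractArray)
    fwd = m.forward(x)
    bwd = reverse(m.backward(reverse(x, dims=3)), dims=3)
    return cat(fwd, bwd, dims=1)   # concatenate along the feature dimension
end
```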

> Per the Model Reference section of the Flux docs, the time dimension is 3rd/last when using a dense input.

However, aren’t I dealing directly with the stateful cells? `LSTM` gives a `Recur{LSTMCell{...}}`.

My input is of the form `300×N`, with 300 being “time” (the length of a vector). Running `LSTM(30,10)` on the `Embedding(37,30)` of an input array gives me an array of size `10×300×64`. “Time” is second, and the third dimension is the batch.

That docs section is also dealing with stateful cells (that’s what `Recur` is: a wrapper that makes cells and other functions stateful). Note that `LSTM(in, out)` returns a `Recur{<:LSTMCell}`, which is a `Recur`.
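
You can check this at the REPL (with the Flux version this thread assumes, where `LSTM` still returns a `Recur`):

```julia
using Flux

m = LSTM(3, 5)
m isa Flux.Recur           # true: the LSTM constructor wraps an LSTMCell in Recur
m.cell isa Flux.LSTMCell   # true
```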

Unfortunately we have no way of detecting intent and warning about this, but those input dimensions are backwards. Calling an RNN with an input of shape `features×time×batch` won’t error, but it will compute the wrong results. The only accepted format is `features×batch×time` [1], so having the last two dimensions swapped means the network outputs will be completely wrong.

Thankfully, there’s an easy fix for this. If you transpose your input from `300×N` to `N×300` before feeding it to the `Embedding`, everything will work correctly.
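
For instance (a sketch, assuming integer token matrices going into the `Embedding`, with the vocab/embedding sizes from your example):

```julia
x = rand(1:37, 300, 64)    # tokens: 300 time steps × 64 batch, the wrong way around
x = permutedims(x)         # 64×300, i.e. batch × time
h = Embedding(37, 30)(x)   # 30×64×300: features × batch × time
y = LSTM(30, 10)(h)        # 10×64×300, as expected
```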

1. There is a good (but unfortunately not often talked about) reason for this. Since RNNs operate on one timestep at a time, we want to preserve memory locality within each timestep for the best performance. That means putting the time dimension first (for row-major arrays, as in Python land) or last (for column-major arrays, as in Julia). It is possible to put the time dimension in the middle, and we may support that in the future, but there’s a good chance doing so would perform noticeably worse.
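
A quick way to see that locality in plain Julia (`Base.iscontiguous` is unexported but handy here):

```julia
x = rand(Float32, 4, 8, 10)            # features × batch × time (column-major)
Base.iscontiguous(view(x, :, :, 1))    # true:  one timestep is one contiguous block
Base.iscontiguous(view(x, 1, :, :))    # false: slicing the first dim is strided
```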


But then how am I supposed to feed it to `DataLoader`? Maybe transpose the inputs given by `DataLoader`?

```julia
for (x, y) in train_loader
    x = x'
    y = y'
    ...
end
```

Like this?
