# Need help proofreading my BLSTM implementation

Hi, I’m implementing a BLSTM for a personal project (nearly finished), and I’m currently quite confused about how BLSTMs are implemented; I’ve found various kinds of implementations in papers online. Mine should concatenate the outputs of the two passes into a single array, one per word. My current code reads:

```julia
using Flux
using Flux: Recur, LSTMCell, @functor

struct BLSTM{A,B,C}
    forward  :: Recur{LSTMCell{A,B,C},C}
    backward :: Recur{LSTMCell{A,B,C},C}
    outdim   :: Int
end

function BLSTM(in::Int, out::Int)
    forward  = LSTM(in, out)
    backward = LSTM(in, out)
    return BLSTM(forward, backward, out * 2)
end

function (m::BLSTM)(x::AbstractArray)
    forward_out  = m.forward(x)
    backward_out = reverse(m.backward(reverse(x, dims=2)), dims=2)
    return cat(forward_out, backward_out, dims=1)
end

Flux.trainable(m::BLSTM) = (m.forward, m.backward)
@functor BLSTM
```

What confuses me specifically is the double `reverse`. My thought process is: I reverse the input and feed it to the LSTM, whose output comes back in reverse order, so I `reverse` it again to restore the original ordering.
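
As a plain-Julia sanity check of that reasoning (no Flux involved), the reverse-process-reverse pattern does realign outputs with their original timesteps:

```julia
seq = [1, 2, 3, 4]               # a toy "sequence"
rev_out = reverse(seq) .* 10     # stand-in for the backward pass: [40, 30, 20, 10]
aligned = reverse(rev_out)       # [10, 20, 30, 40]; entry i matches timestep i again
```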

Looks mostly good. The biggest tweak you’ll need is to change `reverse(..., dims=2)` to `reverse(..., dims=3)`. Per the Model Reference section of the Flux docs, the time dimension is 3rd/last when using a dense input.
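
Concretely, the forward pass would become (a sketch against your struct above, assuming a `features×batch×time` input as discussed further down):

```julia
function (m::BLSTM)(x::AbstractArray)
    forward_out  = m.forward(x)
    # reverse along the time (3rd/last) dimension, not the batch dimension
    backward_out = reverse(m.backward(reverse(x, dims=3)), dims=3)
    return cat(forward_out, backward_out, dims=1)
end
```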

On to more minor/stylistic feedback: you can change the struct definition to

```julia
struct Bidirectional{A<:Recur,B<:Recur}
    forward  :: A
    backward :: B
end

@functor Bidirectional
```

This allows it to be used with any RNN type. Note that overriding `trainable` is not necessary most of the time, because it falls back to calling `functor` which `@functor` implements for you.
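
For completeness, a minimal sketch of the matching forward pass (the same double-`reverse` trick, now over `dims=3`):

```julia
function (m::Bidirectional)(x::AbstractArray)
    fwd = m.forward(x)
    bwd = reverse(m.backward(reverse(x, dims=3)), dims=3)
    return cat(fwd, bwd, dims=1)   # concatenate along the feature dimension
end
```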

> Per the Model Reference section of the Flux docs, the time dimension is 3rd/last when using a dense input.

However, aren’t I dealing directly with the stateful cells? `LSTM` gives a `Recur{LSTMCell{...}}`.

My input is of the form `300×N`, with 300 being “time” (the length of a vector). Running `LSTM(30,10)` on the `Embedding(37,30)` of an input array gives me an array of size `10×300×64`. “Time” is second, and the third dimension is the batch.

That docs section is also dealing with stateful cells (that’s what `Recur` is: a wrapper that makes cells and other functions stateful). Note that `LSTM(in, out)` returns a `Recur{<:LSTMCell}`, which is a `Recur`.
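
You can check this at the REPL (with the Flux version this thread assumes, where `LSTM` still returns a `Recur`):

```julia
using Flux

m = LSTM(3, 5)
m isa Flux.Recur           # true: the LSTM constructor wraps an LSTMCell in Recur
m.cell isa Flux.LSTMCell   # true
```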

Unfortunately we have no way of detecting intent and warning about this, but those input dimensions are backwards. Calling an RNN with an input of shape `features×time×batch` won’t error, but it will compute the wrong results. The only accepted format is `features×batch×time` [1], so having the last two dimensions swapped means the network outputs will be completely wrong.

Thankfully, there’s an easy fix for this. If you transpose your input from `300×N` to `N×300` before feeding it to the `Embedding`, everything will work correctly.
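
For instance (a sketch, assuming integer token matrices going into the `Embedding`, with the vocab/embedding sizes from your example):

```julia
x = rand(1:37, 300, 64)    # tokens: 300 time steps × 64 batch, the wrong way around
x = permutedims(x)         # 64×300, i.e. batch × time
h = Embedding(37, 30)(x)   # 30×64×300: features × batch × time
y = LSTM(30, 10)(h)        # 10×64×300, as expected
```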

1. There is a good (but unfortunately not often talked about) reason for this. Since RNNs operate on one timestep at a time, we want to preserve memory locality within each timestep for the best performance. That means putting the time dimension first (for row-major arrays, as in Python land) or last (for column-major arrays, as in Julia). It is possible to put the time dimension in the middle, and we may support that in the future, but there’s a good chance doing so would perform noticeably worse.
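
A quick way to see that locality in plain Julia (`Base.iscontiguous` is unexported but handy here):

```julia
x = rand(Float32, 4, 8, 10)            # features × batch × time (column-major)
Base.iscontiguous(view(x, :, :, 1))    # true:  one timestep is one contiguous block
Base.iscontiguous(view(x, 1, :, :))    # false: slicing the first dim is strided
```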


But then how am I supposed to feed it to `DataLoader`? Maybe transpose the inputs given by `DataLoader`?

```julia
for (x, y) in train_loader
    x = x'
    y = y'
    ...
end
```

Like this?
