Specifically, I’m interested in the attention mechanism described in the Listen, Attend and Spell paper (where it’s referred to as AttentionContext), but more generally it would be useful to see an implementation of any other kind of attention as well.
In terms of attention, the Transformer is probably worth a look, though there may well be simpler attention models implemented in Flux somewhere.
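For context, here's a minimal numpy sketch of how I understand the AttentionContext step: energies between the decoder state and each encoder state are computed via learned projections (the paper uses MLPs `phi` and `psi`; I'm simplifying to single linear maps here, and all the names are mine), then softmaxed into weights that average the encoder states into a context vector.

```python
import numpy as np

def attention_context(s_i, h, W_s, W_h):
    """Sketch of a LAS-style attention context (names are illustrative).

    s_i : decoder state at step i, shape (d_s,)
    h   : encoder states, shape (T, d_h)
    W_s : projection of the decoder state, shape (d, d_s)
    W_h : projection of the encoder states, shape (d, d_h)
    """
    q = W_s @ s_i              # phi(s_i): project decoder state into shared space
    k = h @ W_h.T              # psi(h_u): project each encoder state, shape (T, d)
    e = k @ q                  # scalar energies e_{i,u} = <phi(s_i), psi(h_u)>
    a = np.exp(e - e.max())    # numerically stable softmax over timesteps
    alpha = a / a.sum()
    return alpha @ h           # context c_i = sum_u alpha_{i,u} * h_u, shape (d_h,)
```

This is just dot-product attention with separate query/key projections; in the actual paper `phi` and `psi` are small MLPs, but the shape of the computation is the same.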
I’ve got a regular old seq2seq implementation here.
The problem is that it doesn’t work very well, so it might be risky to use it as a reference.
If you do happen to spot a mistake I’d be happy to hear about it!