Is there an implementation of the attention mechanism in Flux.jl?

Specifically, I’m interested in the attention mechanism described in the Listen, Attend and Spell paper (where it’s referred to as AttentionContext), but more generally an implementation of any other kind of attention would also be useful to look at.
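
To make it concrete, here is roughly what I have in mind, sketched against a recent Flux API. The struct name, the `phi`/`psi` projections, and the dimensions are just my own reading of the paper, not an existing Flux layer:

```julia
# Hypothetical sketch of the LAS "AttentionContext":
# e_u = ⟨phi(s), psi(h_u)⟩,  α = softmax(e),  c = Σ_u α_u h_u
using Flux

struct AttentionContext
    phi   # projects the decoder state s into the attention space
    psi   # projects each encoder output h_u into the attention space
end

AttentionContext(state_dim, enc_dim, attn_dim) =
    AttentionContext(Dense(state_dim => attn_dim), Dense(enc_dim => attn_dim))

Flux.@functor AttentionContext

# s :: state_dim vector, H :: enc_dim × T matrix (one encoder output per column)
function (a::AttentionContext)(s, H)
    e = vec(a.phi(s)' * a.psi(H))   # one score per time step
    α = softmax(e)                  # attention weights over time
    return H * α                    # context vector (enc_dim)
end
```

With that, something like `AttentionContext(256, 256, 128)(s, H)` would give a single context vector for one decoder step; batching would need an extra dimension. Does something along these lines already exist?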

As far as attention goes, the Transformer is probably worth checking out, though there are probably simpler attention models around in Flux.

https://github.com/chengchingwen/Transformers.jl
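
Transformers.jl has full multi-head attention, but if you just want the core idea, a bare scaled dot-product attention (the Transformer's building block) is only a few lines of plain Julia. This is my own sketch, not code from that package:

```julia
using Flux: softmax   # softmax lives in NNlib and is re-exported by Flux

# Q :: d × n_queries, K :: d × n_keys, V :: dv × n_keys
function scaled_dot_attention(Q, K, V)
    scores = (K' * Q) ./ sqrt(Float32(size(K, 1)))  # n_keys × n_queries
    α = softmax(scores; dims = 1)                   # normalize over the keys
    return V * α                                    # dv × n_queries
end

Q = randn(Float32, 16, 3); K = randn(Float32, 16, 7); V = randn(Float32, 32, 7)
scaled_dot_attention(Q, K, V)   # 32 × 3 matrix, one context vector per query
```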


I’ve got a regular old seq2seq implementation here.
The problem is that it doesn’t work very well… So it might be risky to use it as a reference.
If you do happen to spot a mistake I’d be happy to hear about it!


I’m digging through your code right now. Thank you for the work you put into this.

The only problem I’ve found so far is a typo on the ipynb: “Esentially, the encoder outputs and the hidden state of the decoder are used to a context vector”

Some word is missing between “used to” and “a” and I don’t know what it is.

Hey @MacKenzieHnC
Thanks for taking a look!
The sentence should have been this, I think:

Essentially, the encoder outputs and the hidden state of the decoder are used to create a context vector
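
In code, that sentence corresponds to something like the following additive-attention sketch. The names, dimensions, and exact scoring function here are made up for illustration and may differ from what the notebook actually does:

```julia
using Flux

enc_dim, dec_dim, attn_dim, T = 8, 8, 16, 5

Wenc = Dense(enc_dim => attn_dim)   # projects each encoder output
Wdec = Dense(dec_dim => attn_dim)   # projects the decoder hidden state
v    = Dense(attn_dim => 1)         # collapses to one score per time step

H = randn(Float32, enc_dim, T)      # encoder outputs, one column per step
s = randn(Float32, dec_dim)         # current decoder hidden state

scores  = vec(v(tanh.(Wenc(H) .+ Wdec(s))))  # one score per time step
weights = softmax(scores)                    # attention weights
context = H * weights                        # the context vector
```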

It’s been some time since I’ve worked on this, and the Flux API has changed quite a lot (most notably the AD engine changed to Zygote).
It definitely would be interesting to see this model implemented in the up-to-date version of Flux.
I might do this when I get the time… but I’m notoriously slow.

Maybe these links are interesting for you:
