Flux seq2seq



I don’t know whether this is the right place to ask, but I’m trying to code a seq2seq model in Flux and I have a couple of questions.

The encoder creates the hidden state for the decoder. How can I pass this state to the GRU in the decoder? Is this possible without custom-built layers?

Secondly, how can I use this model on the GPU? It seems like `|> gpu` doesn’t work on line 9, maybe because of the indexing?
Also, `result` has no fixed size; does that mean this can’t be run on my GPU?


```julia
model = function(seq, voc_size, max_length)
    seq = onehotbatch(seq, dictionary_fr[:, 1]) # dictionary_fr, emb_layer_fr, emb_layer_nl are defined elsewhere
    seq = emb_layer_fr * seq
    # split seq into its columns (one vector per time step):
    seq = [seq[:, i] for i in 1:size(seq, 2)]
    x = GRU(300, 256).(seq)[end]
    x = Dense(256, 300)(x)
    result = Vector{Any}(undef, 0)
    input = onehot(1, 1:voc_size) # <BOS>
    for i in 1:max_length
        input = emb_layer_nl * input
        output = Chain(GRU(300, 300), Dense(300, voc_size))(input)
        append!(result, output)
        input = onehot(argmax(output), 1:voc_size)
        input.ix == 3 && break # stop at token 3 (assumed <EOS>)
    end
    return result
end
```
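The first lines of this function turn a sentence into one vector per time step. Here is a shape-checked sketch of those three steps in plain Julia, without Flux: `onehot_columns` is a hypothetical stand-in for `onehotbatch`, and the tiny sizes are made up. Multiplying an embedding matrix by a one-hot matrix just selects columns, so `emb * X` equals `emb[:, ids]`, and each per-step vector has the embedding dimension, which must match the encoder's input size (300 in `GRU(300, 256)` above).

```julia
# Hypothetical helper: one column per token, with a single 1 in row `id`
# (the same layout a one-hot batch uses: vocab_size × seq_len).
function onehot_columns(ids::AbstractVector{Int}, vocab_size::Int)
    M = zeros(Float32, vocab_size, length(ids))
    for (t, id) in enumerate(ids)
        M[id, t] = 1
    end
    return M
end

vocab_size, emb_dim = 6, 3
# Toy embedding matrix: emb_dim × vocab_size, one column per word.
emb = reshape(Float32.(1:emb_dim * vocab_size), emb_dim, vocab_size)
ids = [1, 4, 2, 3]                      # token ids of one sentence
X = onehot_columns(ids, vocab_size)     # vocab_size × seq_len
embedded = emb * X                      # emb_dim × seq_len
# One input vector per time step, in sentence order.
steps = [embedded[:, t] for t in 1:size(embedded, 2)]
```

If the encoder complains about dimensions, checking `size` at each of these steps is usually the quickest way to find where the layout goes wrong.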


I suggest taking a look at https://github.com/FluxML/model-zoo/blob/master/text/phonemes/1-model.jl for an example of how an encoder/decoder model can be implemented in Flux. With the code you’re showing here, it looks like you’re creating a new copy of every layer each time you call this function (and even throwing away and recreating the decoder layers on every pass through the loop), and I’m guessing that’s not what you have in mind for your model’s behavior.
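To make the encoder-to-decoder hand-off concrete, here is a minimal sketch in plain Julia (no Flux), so the hidden state is passed explicitly rather than kept inside a layer. `MiniGRU`, `encode` and `greedy_decode` are hypothetical names, not Flux API; the GRU step follows one common formulation of the gate equations, and treating token 1 as `<BOS>` and token 3 as `<EOS>` is an assumption mirroring the code in the question. All layers are built once and reused, and the decoder's first hidden state is simply the encoder's last one.

```julia
using Random

# Hypothetical hand-written GRU cell (not Flux's GRU). Each call takes the
# previous hidden state `h` explicitly.
struct MiniGRU
    Wz::Matrix{Float64}; Uz::Matrix{Float64}; bz::Vector{Float64}  # update gate
    Wr::Matrix{Float64}; Ur::Matrix{Float64}; br::Vector{Float64}  # reset gate
    Wh::Matrix{Float64}; Uh::Matrix{Float64}; bh::Vector{Float64}  # candidate state
end

function MiniGRU(nin::Int, nout::Int; rng = Random.default_rng())
    w(r, c) = 0.1 .* randn(rng, r, c)
    return MiniGRU(w(nout, nin), w(nout, nout), zeros(nout),
                   w(nout, nin), w(nout, nout), zeros(nout),
                   w(nout, nin), w(nout, nout), zeros(nout))
end

sigmoid(x) = 1 / (1 + exp(-x))

# One GRU step: new hidden state from previous state `h` and input `x`.
function (g::MiniGRU)(h::AbstractVector, x::AbstractVector)
    z = sigmoid.(g.Wz * x .+ g.Uz * h .+ g.bz)
    r = sigmoid.(g.Wr * x .+ g.Ur * h .+ g.br)
    hcand = tanh.(g.Wh * x .+ g.Uh * (r .* h) .+ g.bh)
    return (1 .- z) .* h .+ z .* hcand
end

# Encoder: run over the source sequence and keep only the final hidden state.
function encode(enc::MiniGRU, xs)
    h = zeros(length(enc.bz))
    for x in xs
        h = enc(h, x)
    end
    return h
end

# Decoder: start from the encoder's state instead of zeros, feed back its own
# greedy prediction, and stop at `eos` or after `max_length` tokens.
function greedy_decode(dec::MiniGRU, E, Wo, h0; bos = 1, eos = 3, max_length = 10)
    h, token, out = h0, bos, Int[]
    for _ in 1:max_length
        h = dec(h, E[:, token])   # embedding of the previous token
        token = argmax(Wo * h)    # index of the highest-scoring word
        push!(out, token)
        token == eos && break
    end
    return out
end

# Build every layer once, outside the functions that use them.
rng = MersenneTwister(0)
vocab, emb, hid = 5, 4, 6
enc = MiniGRU(emb, hid; rng = rng)
dec = MiniGRU(emb, hid; rng = rng)
E  = randn(rng, emb, vocab)    # embedding matrix, one column per word
Wo = randn(rng, vocab, hid)    # output projection (plays the role of Dense(hid, vocab))
src = [E[:, i] for i in (2, 4, 3)]
h = encode(enc, src)
ys = greedy_decode(dec, E, Wo, h; max_length = 7)
```

Here both hidden sizes are equal; if the encoder and decoder sizes differ, a bridge layer like the `Dense(256, 300)` in the question maps one to the other. Collecting predictions in an `Int[]` also gives `result` a concrete element type even though its length varies. In Flux the same structure applies, but how a recurrent layer's initial state is set depends on the Flux version, so check the recurrent-layer documentation for yours.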

Regarding the GPU question, I haven’t actually used this functionality myself, but the documentation for it is here. My understanding is that you need to call Flux’s `gpu` function on both your model’s weights and its inputs so that everything has the appropriate GPU array types.


I’ve changed my code quite a bit, but now my model doesn’t return correct translations.

When I overfit my model with just one sentence pair, it gives the correct output, but when I try to overfit it with 2 (or more) sentence pairs, it returns a mix of the two sentences. It looks like the model just picks the most prevalent word from one of the sentences.

This is the link to a Nextjournal notebook with my code and data (I can’t get it running there, though).

I would highly appreciate it if anyone could take a quick look and tell me what I’m doing wrong, or link a Julia implementation of machine translation. Chances are I’m making a simple mistake, since I’m a complete beginner.



I’m not sure there’s a comprehensive Julia implementation of seq2seq for MT anywhere, although I’ve talked about working on one, and I may be able to look at your code soon if I get some time.


I finally found the problem: my input didn’t fit what the encoder network expected. I’m going to upload my code as soon as possible.