Flux seq2seq



I don’t know whether this is the right place to ask but I’m trying to code a seq2seq model in Flux and I’ve got a couple of questions.

The encoder creates the hidden state for the decoder; how can I pass this state to the GRU in the decoder? Is this possible without custom-built layers?

Secondly, how can I use this model on the GPU? It seems like |> gpu doesn’t work on line 9 because of the indexing?
Also, result is of undefined size, so this can’t be run on my GPU?


model = function(seq, voc_size, max_length)
    seq = onehotbatch(seq, dictionary_fr[:, 1])
    seq = emb_layer_fr * seq
    # split seq into its columns:
    seq = [seq[:, i] for i in 1:size(seq, 2)]
    x = GRU(300, 256).(seq)[end]
    x = Dense(256, 300)(x)
    result = Vector{Any}(undef, 0)
    input = onehot(1, 1:voc_size) # <BOS>
    for i in 1:max_length
        input = emb_layer_nl * input
        output = Chain(GRU(300, 300), Dense(300, voc_size))(input)
        append!(result, output)
        input = onehot(argmax(output), 1:voc_size)
        input.ix == 3 && break
    end
    return result
end


I suggest that you take a look at https://github.com/FluxML/model-zoo/blob/master/text/phonemes/1-model.jl for an example of how an encoder/decoder model can be implemented in Flux. With the code you’re showing here, it looks like you’re creating a new copy of every single layer every time you call this function (and even throwing away and recreating the decoder layers each trip through the loop), and I’m guessing that this probably is not what you have in mind for your model’s behavior.
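To illustrate the point about layer reuse (a minimal sketch with placeholder sizes and names, not the full model): define the layers once at top level, and pass the encoder’s final hidden state to the decoder by assigning to the decoder GRU’s `.state` field, which old Flux’s `Recur` wrapper exposes:

```julia
using Flux

voc_size = 10_000                 # placeholder vocabulary size

# Layers are created ONCE, so their weights persist across calls
# and can actually be trained.
encoder_rnn = GRU(300, 256)
decoder_rnn = GRU(300, 256)
out_layer   = Dense(256, voc_size)

function encode!(seq)
    Flux.reset!(encoder_rnn)
    foreach(encoder_rnn, seq)              # step through the source tokens
    decoder_rnn.state = encoder_rnn.state  # hand the state to the decoder GRU
end

# Each decoding step reuses the same layers instead of rebuilding them.
decode_step(x) = softmax(out_layer(decoder_rnn(x)))
```

This way `params(...)` can collect all trainable weights once, and nothing is thrown away between calls.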

Regarding the GPU question, I haven’t actually used this functionality myself, but the documentation for it is here. My understanding is that you’ll need to make sure to call Flux’s gpu function on both your model’s weights and its inputs to make sure that everything has the appropriate GPU types.
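Per those docs, something along these lines should work (a sketch, assuming a CUDA-capable setup; both the weights and the inputs get moved):

```julia
using Flux

model = Chain(Dense(300, 256, relu), Dense(256, 10)) |> gpu  # weights to the GPU
x = rand(Float32, 300) |> gpu                                # inputs to the GPU too
y = model(x)                                                 # runs on the GPU
```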


I’ve changed my code quite a bit, but now, my model doesn’t return correct translations.

When I overfit my model with just one sentence pair, it gives the correct output, but when I try to overfit it with 2 (or more) sentence pairs, it returns a mix of the two sentences. It looks like the model just picks the most prevalent word from one of the sentences.

This is the link to a nextjournal notebook with my code and data (I can’t get it running there though).

I would highly appreciate it if anyone could take a quick look and tell me what I’m doing wrong, or link to a Julia implementation of machine translation. Chances are I’m making an easy mistake, since I’m a complete beginner.



I’m not sure there’s a comprehensive Julia implementation of seq2seq for MT anywhere, although I’ve talked about working on one and I may be able to look at your code soon if I get some time.


I finally found the problem: my input didn’t suit the encoder network. I’m going to upload my code as soon as possible.


@merckxiaan, I spoke to the nextjournal team to figure out why I couldn’t access your notebook (I am interested in seq2seq in Julia). It turns out that you did not publish the journal so as a result no one else can see it. I’d be greatly obliged if you would!


Hello @Nakul_Tiruviluamala,
After I posted my last message, I abandoned this project since I’m a beginner and I was making mistake after mistake.
Since then, however, I’ve been trying to implement PyTorch’s tutorial on seq2seq machine translation, following it as closely as possible. Now I believe I’m stuck due to a bug in Flux which prevents me from concatenating a transposed array (https://github.com/FluxML/Flux.jl/issues/378). I’ve probably also made a lot of mistakes here and there. It would be great if you could have a look at the code and let me know your thoughts/questions.
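Until that issue is fixed, one possible workaround (just a general Julia idiom, untested against Flux’s tracker) is to materialize the lazy transpose with `collect` before concatenating:

```julia
A = rand(2, 3)
v = rand(3)

# v' is a lazy transpose wrapper; collect turns it into a plain 1×3 Matrix,
# so vcat never has to dispatch on the wrapper type.
B = vcat(A, collect(v'))
size(B)  # (3, 3)
```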



You need to click publish, to get a shareable link! That’s just your internal edit link :wink:




Hi @merckxiaan, I’ll definitely be looking at it. I am a beginner as well!


Nice @Nakul_Tiruviluamala ,
I’ve just uploaded my code to a GitHub gist because I’m running into an error with Julia. Issue
Even though my code crashes, the loss does decline for a few steps…
Perhaps you could try to reproduce this error?



Never mind, I’ve started over once more. Does anyone spot something wrong with my encoder, decoder or attention layer? For some reason, when I train my model the loss gets stuck and the model predicts some frequent words.

struct Encoder
    embedding
    rnn
end
Encoder(voc_size::Int, h_size::Int) = Encoder(
    param(Flux.glorot_uniform(h_size, voc_size)),
    GRU(h_size, h_size))
function (e::Encoder)(x; dropout=0)
    x = e.embedding*x
    x = Dropout(dropout)(x)
    x = e.rnn(x)
end
Flux.@treelike Encoder

struct Decoder
    embedding
    rnn
    output
    attention
end
Decoder(h_size, voc_size) = Decoder(
    param(Flux.glorot_uniform(h_size, voc_size)),
    GRU(h_size*2, h_size),
    Dense(h_size, voc_size, relu),
    Attention(h_size))
function (d::Decoder)(x, encoder_outputs; dropout=0)
    x = d.embedding * x
    x = Dropout(dropout)(x)
    decoder_state = d.rnn.state
    context = d.attention(encoder_outputs, decoder_state)
    x = d.rnn([x; context])
    x = softmax(d.output(x))
end
Flux.@treelike Decoder

struct Attention
    linear
end
Attention(h_size::Int) = Attention(Dense(2*h_size, 1, tanh))
function (a::Attention)(encoder_outputs, decoder_state)
    weights = []
    for word in encoder_outputs
        weight = a.linear([word; decoder_state])
        push!(weights, weight)
    end
    weights = softmax(vcat(weights...))
    return sum([encoder_outputs[i].*weights[i, :]' for i in 1:size(weights, 1)])
end
Flux.@treelike Attention
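For anyone following along, a rough sketch of how these three pieces might be wired together for a single decoding step (hypothetical names: `voc_size_fr`, `voc_size_nl`, `source_seq` and `bos_onehot` are placeholders, and this assumes old Flux where a GRU’s hidden state lives in `rnn.state`):

```julia
# Hypothetical wiring of the three pieces above for one decoding step.
encoder = Encoder(voc_size_fr, 256)
decoder = Decoder(256, voc_size_nl)

Flux.reset!(encoder.rnn)
encoder_outputs = [encoder(x) for x in source_seq]  # one vector per source token

decoder.rnn.state = encoder.rnn.state               # hand over the final state
prediction = decoder(bos_onehot, encoder_outputs)   # first decoder output
```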


I’ve made some progress and put all my code, with some explanations, in a notebook. The model does seem to learn something… more often than not, the subject of the sentence is correct, but the remaining words are gibberish.

Also, I notice a big difference in performance with different hyperparameters, but I’m not sure how to choose the optimal ones.
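One simple, library-free approach to that is a small random search over a validation score (a generic sketch, not specific to this model; `evaluate` is a placeholder for training the model with the given settings and returning a validation loss):

```julia
# Random search over a hyperparameter grid. `evaluate` is a placeholder:
# it should train the model with the given settings and return a
# validation loss (lower is better).
function random_search(evaluate; trials=20)
    best_loss, best_params = Inf, nothing
    for _ in 1:trials
        params = (h_size  = rand([128, 256, 512]),
                  lr      = 10.0^rand(-4:-2),
                  dropout = rand([0.0, 0.1, 0.3]))
        loss = evaluate(params)
        if loss < best_loss
            best_loss, best_params = loss, params
        end
    end
    return best_params, best_loss
end

# Toy stand-in for a real training run, just to show the interface:
toy_eval(p) = abs(p.h_size - 256)/256 + abs(p.lr - 0.01) + p.dropout
best, loss = random_search(toy_eval; trials=50)
```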

I’d really appreciate someone providing me with some feedback.