Flux seq2seq

merckxiaan · August 20, 2018, 7:36pm

I don’t know whether this is the right place to ask, but I’m trying to code a seq2seq model in Flux and I’ve got a couple of questions.

The encoder creates the hidden state for the decoder. How can I pass this state to the GRU in the decoder? Is this possible without custom-built layers?

Secondly, how can I use this model on the GPU? It seems like |> gpu doesn’t work on line 9 because of the indexing?
Also, result is of undefined size, so this can’t be run on my GPU?

Thanks,
Jules

model = function(seq, voc_size, max_length)
    seq = onehotbatch(seq, dictionary_fr[:, 1])
    seq = emb_layer_fr*seq
    
    #split seq into its columns:
    seq = [seq[:, i] for i in 1:size(seq)[2]]
    
    #encoder
    x = GRU(300, 256).(seq)[end]
    x = Dense(256, 300)(x)
    
    #decoder
    result = Vector{Any}(undef, 0)
    input = onehot(1, 1:voc_size) #<BOS>
    for i in 1:max_length
        input = emb_layer_nl*input
        output = Chain(GRU(300, 300), Dense(300, voc_size))(input)
        append!(result, output)
        input = onehot(argmax(output), 1:voc_size)
        input.ix == 3 ? break : continue
    end
    return(result)
end

dellison · August 22, 2018, 10:37pm

I suggest that you take a look at model-zoo/1-model.jl at master · FluxML/model-zoo · GitHub for an example of how an encoder/decoder model can be implemented in Flux. With the code you’re showing here, it looks like you’re creating a new copy of every single layer every time you call this function (and even throwing away and recreating the decoder layers each trip through the loop), and I’m guessing that this probably is not what you have in mind for your model’s behavior.
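To illustrate the point about layers being recreated on every call, here is a minimal sketch in plain Julia, without Flux. `TinyLayer`, `fresh_each_call`, and `reuse` are hypothetical names made up for this example; `TinyLayer` is only a stand-in for a trainable layer such as a GRU or Dense, not Flux's actual implementation.

```julia
# Hypothetical stand-in for a trainable layer: it only holds a weight matrix.
struct TinyLayer
    W::Matrix{Float64}
end

# Randomly initialise the weights, as a real layer constructor would.
TinyLayer(nin::Int, nout::Int) = TinyLayer(randn(nout, nin))

# Make the layer callable, like Flux layers are.
(l::TinyLayer)(x) = l.W * x

# Anti-pattern: the layer is constructed inside the function, so every call
# starts from new random weights and nothing learned can persist.
fresh_each_call(x) = TinyLayer(3, 2)(x)

# Alternative: construct the layer once and reuse it on every call.
shared_layer = TinyLayer(3, 2)
reuse(x) = shared_layer(x)

x = ones(3)
same_when_reused = reuse(x) == reuse(x)                        # same weights both times
same_when_recreated = fresh_each_call(x) == fresh_each_call(x) # different weights each time
```

With real Flux layers the same idea would apply: build the encoder and decoder layers once, outside the function and outside the decoding loop, and only call them inside.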

Regarding the GPU question, I haven’t actually used this functionality myself, but the documentation for it is here. My understanding is that you’ll need to make sure to call Flux’s gpu function on both your model’s weights and its inputs to make sure that everything has the appropriate GPU types.

merckxiaan · August 28, 2018, 3:43pm

Thanks,
I’ve changed my code quite a bit, but now, my model doesn’t return correct translations.

When I overfit my model with just one sentence pair, it gives the correct output, but when I try to overfit it with 2 (or more) sentence pairs, it returns a mix of the two sentences. It looks like the model just picks the most prevalent word from one of the sentences.

This is the link to a nextjournal notebook with my code and data (I can’t get it running there though).

I would highly appreciate it if anyone could take a quick look and tell me what I’m doing wrong, or link a Julia implementation of machine translation. Chances are I’m making an easy mistake since I’m a complete beginner.

Thanks,
Jules

jekbradbury · August 28, 2018, 10:04pm

I’m not sure there’s a comprehensive Julia implementation of seq2seq for MT anywhere, although I’ve talked about working on one and I may be able to look at your code soon if I get some time.

merckxiaan · September 1, 2018, 2:09pm

I finally found the problem: my input didn’t suit the encoder network. I’m going to upload my code as soon as possible.

Nakul_Tiruviluamala · October 22, 2018, 2:47pm

@merckxiaan, I spoke to the nextjournal team to figure out why I couldn’t access your notebook (I am interested in seq2seq in Julia). It turns out that you did not publish the journal, so no one else can see it. I’d be greatly obliged if you would!

merckxiaan · October 22, 2018, 3:14pm

Hello @Nakul_Tiruviluamala,
After I posted my last message, I abandoned this project since I’m a beginner and I was making mistake after mistake.
Since then, however, I’ve been trying to implement PyTorch’s tutorial on seq2seq machine translation, following it as closely as possible. I now believe I’m stuck due to a bug in Flux that prevents me from concatenating a transposed array (https://github.com/FluxML/Flux.jl/issues/378). I probably also made a lot of mistakes here and there. It would be great if you could have a look at the code and let me know your thoughts/questions.
Nextjournal

sdanisch · October 22, 2018, 3:18pm

Nextjournal

You need to click publish to get a shareable link! That’s just your internal edit link.

merckxiaan · October 22, 2018, 3:19pm

whoops!

Nakul_Tiruviluamala · October 23, 2018, 4:00pm

Hi @merckxiaan, I’ll definitely be looking at it. I am a beginner as well!

merckxiaan · October 26, 2018, 8:34pm

Nice @Nakul_Tiruviluamala ,
I’ve just uploaded my code to a GitHub gist because I’m running into an error with Julia. Issue
Even though my code crashes, the loss does decline for a few steps…
Perhaps you could try to reproduce this error?

Jules

merckxiaan · October 31, 2018, 8:21pm

Never mind, I’ve started over once more. Does anyone spot something wrong with my encoder, decoder or attention layer? For some reason, when I train my model the loss gets stuck and the model only predicts a few frequent words.

struct Encoder
    embedding
    rnn
end
Encoder(voc_size::Int, h_size::Int) = Encoder(
    param(Flux.glorot_uniform(h_size, voc_size)),
    GRU(h_size, h_size))
function (e::Encoder)(x; dropout=0)
    x = e.embedding*x
    x = Dropout(dropout)(x)
    x = e.rnn(x)
    return(x)
end
Flux.@treelike Encoder

struct Decoder
    embedding
    attention
    rnn
    output
end
Decoder(h_size, voc_size) = Decoder(
    param(Flux.glorot_uniform(h_size, voc_size)),
    Attention(h_size),
    GRU(h_size*2, h_size),
    Dense(h_size, voc_size, relu))
function (d::Decoder)(x, encoder_outputs; dropout=0)
    x = d.embedding * x
    x = Dropout(dropout)(x)
    decoder_state = d.rnn.state
    context = d.attention(encoder_outputs, decoder_state)
    x = d.rnn([x; context])
    x = softmax(d.output(x))
end
Flux.@treelike Decoder

struct Attention
    linear
end
Attention(h_size::Int) = Attention(Dense(2*h_size, 1, tanh))
function (a::Attention)(encoder_outputs, decoder_state)
    weights = []
    results = []
    for word in encoder_outputs
        weight = a.linear([word; decoder_state])
        push!(weights, weight)
    end
    weights = softmax(vcat(weights...))
    return sum([encoder_outputs[i].*weights[i, :]' for i in 1:size(weights, 1)])
end
Flux.@treelike Attention
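
For readers following along, the attention step above (score each encoder output against the decoder state, softmax the scores across source positions, return the weighted sum) can be sketched with plain arrays and no Flux. This is only an illustration under stated assumptions: `softmax_vec` and `attend` are hypothetical helper names, and dot-product scoring is swapped in for the learned `Dense(2*h_size, 1, tanh)` layer used in the post.

```julia
# Numerically stable softmax over a vector of scores.
function softmax_vec(s::AbstractVector)
    e = exp.(s .- maximum(s))
    return e ./ sum(e)
end

# encoder_outputs: vector of hidden-state vectors, one per source position.
# decoder_state:   current decoder hidden state (same length as each output).
# Assumption: dot-product scoring replaces the learned scoring layer.
function attend(encoder_outputs, decoder_state)
    # One scalar score per source position.
    scores = [sum(h .* decoder_state) for h in encoder_outputs]
    # Normalise across source positions so the weights sum to 1.
    weights = softmax_vec(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = sum(w .* h for (w, h) in zip(weights, encoder_outputs))
    return context, weights
end

encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 1.0]
context, weights = attend(encoder_outputs, decoder_state)
```

Comparing a trained layer's attention weights against a small hand-checkable case like this one may help confirm that the softmax runs over source positions rather than over the batch dimension.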

merckxiaan · January 5, 2019, 12:52pm

I’ve made some progress and put all my code, with some explanations, in a notebook. The model does seem to learn something… more often than not, the subject of the sentence is correct, but the remaining words are gibberish.

Also, I notice a big difference in performance with different hyperparameters, but I’m not sure how to choose the optimal ones.

I’d really appreciate someone providing me with some feedback.

Thanks,
Jules
