Loss function for sequence modeling w/ RNN

I’d like advice on the best way to define the loss function in Flux for an RNN that models a sequence of words in labeled sentences. (This is for a regression task, but a sentiment analysis task would look similar.)

I have sentences of varying lengths, which I split into words encoded as one-hot vectors. Each sentence is associated with a single label for the entire sentence.

using Flux

# Extremely simplified example data:
words = ["This long sentence is about the number six <EOS>",
         "This short one is about 2.3 <EOS>"]

actual_labels = [6.0, 2.3]

allwords = unique(reduce(vcat, split.(words)))
nwords = length(allwords)
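
Each word can then be encoded as a one-hot vector over this vocabulary; for example:

Flux.onehot("six", allwords)    # one-hot (Bool) vector of length nwords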

Because each sequence is associated with a single label, I don’t know the best way to define the loss function in Flux. I only want to measure the loss on the RNN representation of the entire sequence. So far I have treated the labels for earlier words as missing and defined the loss over the non-missing labels only:

revised_labels = [
    [missing, missing, missing, missing, missing, missing, missing, missing, 6.0],
    [missing, missing, missing, missing, missing, missing, 2.3]]
    # treating all labels except the one at <EOS> as missing
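
For longer data these padded label vectors can also be built programmatically; a sketch equivalent to the literal arrays above:

revised_labels = [[fill(missing, length(split(s)) - 1); lab]
                  for (s, lab) in zip(words, actual_labels)]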

# An example model, but I need to get the loss function right first
model_m = Chain(RNN(nwords, 16, tanh), Dense(16, 1, identity))
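
Note that model_m applied to a single one-hot word returns a 1-element vector rather than a scalar, which is one of the shape problems the loss below runs into:

model_m(Flux.onehot("This", allwords))    # 1-element Vector, not a scalar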

function loss(x, y)
    if !ismissing(y)             # falls through and returns `nothing` when y is missing
        return (model_m(x) - y)^2
    end
end
 
td = [map(v -> Flux.onehot(v, allwords), split(s)) for s in words]
train_data = Flux.Data.DataLoader(td, revised_labels);


opt = ADAM(1e-2)
Flux.train!(loss, params(model_m), train_data, opt)

# Throws a DimensionMismatch error.

For a few different implementations of a loss like this one I get a DimensionMismatch error, but I think the real problem is that I’m thinking about the loss function incorrectly. I have read the recurrence section of the docs, but so far haven’t found a solution for this case. If anyone can point me in the right direction, that would be much appreciated.

I’m going to answer my own question in case anyone runs into this problem later.

Following the example here worked, though for a regression task the model and loss function need to change somewhat:

function build_model(args)
    scanner = Chain(Dense(args.inpt_dim, args.N, σ), LSTM(args.N, args.N))
    encoder = Dense(args.N, 1, identity)    # weighted sum of the hidden state; identity activation for regression
    return scanner, encoder
end
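
Here args just needs inpt_dim and N fields; a NamedTuple works, for instance (the values below are only an illustration):

args = (inpt_dim = nwords, N = 16)
scanner, encoder = build_model(args)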

function model(x, scanner, encoder)
    state = scanner.(x.data)[end]    # run the scanner over every word; keep only the last hidden state
    Flux.reset!(scanner)             # reset the recurrent state between sentences
    encoder(state)[1]                # encoder returns a 1-element vector, so take that element
end

The loss function itself is

loss(x, y) = (model(x, scanner, encoder) - y)^2
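
For completeness, a minimal end-to-end training sketch under the same assumptions (the old-style Flux API used above; each sentence fed as a plain vector of one-hot vectors, so no .data wrapper, and seq_loss is just an illustrative name):

function seq_loss(x, y)
    Flux.reset!(scanner)         # start each sentence from a fresh hidden state
    state = scanner.(x)[end]     # hidden state after reading the whole sentence
    (encoder(state)[1] - y)^2
end

opt = ADAM(1e-2)
Flux.train!(seq_loss, params(scanner, encoder), zip(td, actual_labels), opt)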