I’d like advice on the best way to define the loss function in Flux for an RNN that models a sequence of words in labeled sentences. (This is for a regression task, but a sentiment analysis task would look similar.)
I have sentences of varying lengths, which I split into words encoded as one-hot vectors. Each sentence is associated with a single label for the entire sentence.
# Extremely Simplified Example data:
words = [ "This long sentence is about the number six <EOS>",
"This short one is about 2.3 <EOS>"]
actual_labels = [6.0, 2.3]
allwords = unique( reduce(vcat, split.(words)) )
nwords = size(allwords,1)
Because each sequence is associated with a single label, I don’t know the best way to define the loss function in Flux. I only want to measure the loss for the full RNN representation of the entire sequence. So far I have treated the labels for earlier words as missing and defined the loss over non-missing labels only:
revised_labels = [
[missing, missing, missing, missing, missing, missing, missing, missing, 6.0],
[ missing, missing, missing, missing, missing, missing, 2.3] ]
# treating all labels except <EOS> as missing
# An example model, but I need to get the loss function right first
model_m = Chain(RNN(nwords, 16, tanh), Dense(16, 1, identity))
function loss(x,y)
    if !ismissing(y)
        loss = (model_m(x) - y)^2
    end
end
td = vcat([map( v-> Flux.onehot(v, allwords), split(words[1]))], [map( v-> Flux.onehot(v, allwords), split(words[2]))])
train_data = Flux.Data.DataLoader( td, revised_labels);
opt = ADAM(1e-2)
Flux.train!(loss, params(model_m), train_data, opt)
# DimensionMismatch.
For a few different implementations of loss like this one I get a DimensionMismatch error, but I think the real problem is that I’m thinking about the loss function incorrectly. I have read the recurrence section of the docs, but so far have not found a solution for this case. If anyone can point me in the right direction, that would be much appreciated.
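
To make the goal concrete (score only the representation of the full sequence, with no per-word labels), here is a minimal sketch in plain Julia. It is not Flux code: `TinyRNN`, `onehot`, `predict`, and `seq_loss` are hypothetical names, and the hand-written cell is only a stand-in for `Chain(RNN(nwords, 16, tanh), Dense(16, 1, identity))` so that the example runs without any packages. The point is the shape of the loss: start from a fresh hidden state, feed every word of one sentence, and compare only the final output with the sentence label.

```julia
# Hypothetical stand-in for the Flux model: an Elman-style cell plus a linear readout.
struct TinyRNN
    Wx::Matrix{Float64}  # input-to-hidden weights
    Wh::Matrix{Float64}  # hidden-to-hidden weights
    b::Vector{Float64}   # hidden bias
    Wo::Matrix{Float64}  # hidden-to-output weights (1 x nhidden)
    bo::Vector{Float64}  # output bias
end

# Small random initialisation (an arbitrary choice for illustration).
TinyRNN(nin::Integer, nhidden::Integer) = TinyRNN(
    0.1 .* randn(nhidden, nin),
    0.1 .* randn(nhidden, nhidden),
    zeros(nhidden),
    0.1 .* randn(1, nhidden),
    zeros(1),
)

# One-hot encode a word against the vocabulary.
onehot(w, vocab) = Float64.(vocab .== w)

# Run the whole sentence through the cell; predict from the final hidden state only.
function predict(m::TinyRNN, xs)
    h = zeros(length(m.b))                       # fresh state for every sentence
    for x in xs
        h = tanh.(m.Wx * x .+ m.Wh * h .+ m.b)   # one recurrence step per word
    end
    return (m.Wo * h .+ m.bo)[1]                 # scalar prediction
end

# Squared error between the sentence-level prediction and the sentence label.
seq_loss(m, xs, y) = (predict(m, xs) - y)^2

words = ["This long sentence is about the number six <EOS>",
         "This short one is about 2.3 <EOS>"]
actual_labels = [6.0, 2.3]
allwords = unique(reduce(vcat, split.(words)))

m = TinyRNN(length(allwords), 16)
sentences = [[onehot(w, allwords) for w in split(s)] for s in words]

# Total loss over the data set: one term per sentence, none per word.
total = sum(seq_loss(m, xs, y) for (xs, y) in zip(sentences, actual_labels))
```

With this formulation the `missing` placeholders are unnecessary, since each sentence contributes exactly one loss term. In Flux versions with stateful recurrent layers (the API used in this post), the analogous steps would presumably be calling `Flux.reset!(model_m)` before each sentence and keeping only the model's last output; that mapping is an assumption and has not been checked against a specific Flux release.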