I’d like advice on the best way to define the loss function in Flux for an RNN that models a sequence of words in labeled sentences. (This is for a regression task, but a sentiment analysis task would look similar.)
I have sentences of varying lengths, which I split into words encoded as one-hot vectors. Each sentence is associated with a single label for the entire sentence.
```julia
# Extremely simplified example data:
words = [
    "This long sentence is about the number six <EOS>",
    "This short one is about 2.3 <EOS>"
]
actual_labels = [6.0, 2.3]
allwords = unique(reduce(vcat, split.(words)))
nwords = size(allwords, 1)
```
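For reference, this is roughly the shape of the encoded input I have in mind. It's a sketch in plain Julia that doesn't need Flux: `onehot_plain` is a hypothetical stand-in for `Flux.onehot`, and `encoded` is just an illustrative name.

```julia
words = [
    "This long sentence is about the number six <EOS>",
    "This short one is about 2.3 <EOS>"
]
allwords = unique(reduce(vcat, split.(words)))

# Hypothetical helper (not part of Flux): 1.0 at the word's vocabulary position, 0.0 elsewhere
onehot_plain(w, vocab) = Float32.(vocab .== w)

# One vector of one-hot vectors per sentence, so the sequences have different lengths
encoded = [[onehot_plain(w, allwords) for w in split(s)] for s in words]
```

Each sentence ends up as a sequence of one-hot vectors of length `length(allwords)`, with as many steps as the sentence has words.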
Because each sequence is associated with a single label, I don’t know the best way to define the loss function in Flux. I only want to measure the loss for the RNN's representation of the entire sequence, i.e. its output after the last word. So far I have treated the labels for earlier words as missing and defined the loss over non-missing labels only:
```julia
# treating all labels except <EOS> as missing
revised_labels = [
    [missing, missing, missing, missing, missing, missing, missing, missing, 6.0],
    [missing, missing, missing, missing, missing, missing, 2.3]
]

# An example model, but I need to get the loss function right first
model_m = Chain(RNN(nwords, 16, tanh), Dense(16, 1, identity))

function loss(x, y)
    if !ismissing(y)
        loss = (model_m(x) - y)^2
    end
end

td = vcat(
    [map(v -> Flux.onehot(v, allwords), split(words))],
    [map(v -> Flux.onehot(v, allwords), split(words))]
)

train_data = Flux.Data.DataLoader(td, revised_labels);

opt = ADAM(1e-2)
Flux.train!(loss, params(model_m), train_data, opt)
# DimensionMismatch.
```
For a few different implementations of `loss` like this one I get a `DimensionMismatch` error, but I think the real error is that I’m thinking about the loss function incorrectly. I have read the recurrence section of the docs, but so far have not found a way to a solution for this case. If anyone can point me in the right direction, that would be much appreciated.
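To make concrete what I mean by measuring the loss only on the full-sequence representation, here is a minimal sketch in plain Julia with a hand-rolled recurrence. It is not Flux's API and may not be how this should be done in Flux. The weights `Wx`, `Wh`, `b`, `wout`, the sizes, and the helpers `final_state`, `predict`, `sentence_loss`, and `onehot_plain` are all made up for illustration.

```julia
using Random
Random.seed!(0)

# Illustrative sizes and randomly initialized weights (assumptions, not from my real model)
nin, nhidden = 5, 4
Wx = 0.1 .* randn(nhidden, nin)
Wh = 0.1 .* randn(nhidden, nhidden)
b = zeros(nhidden)
wout = 0.1 .* randn(nhidden)

# Step through every word, but keep only the hidden state after the last one
function final_state(xs)
    h = zeros(nhidden)
    for x in xs
        h = tanh.(Wx * x .+ Wh * h .+ b)
    end
    return h
end

# One scalar prediction per sentence, compared with that sentence's single label
predict(xs) = sum(wout .* final_state(xs))
sentence_loss(xs, y) = (predict(xs) - y)^2

# Hypothetical one-hot helper: 1.0 at position i, 0.0 elsewhere
onehot_plain(i, n) = Float64.((1:n) .== i)

# Two toy sentences of different lengths, each with one label
long_sentence = [onehot_plain(i, nin) for i in [1, 2, 3, 4, 5, 1, 2]]
short_sentence = [onehot_plain(i, nin) for i in [3, 5]]
losses = [sentence_loss(long_sentence, 6.0), sentence_loss(short_sentence, 2.3)]
```

Here the intermediate steps never receive a label at all, so no `missing` placeholders are needed. What I don't know is how to express the same thing idiomatically with Flux's `RNN` layer.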