I’m trying to learn how to use embedding layers by working with 10 string documents labelled as positive or negative.
StringDocs = ["well done", "good work", "great effort", "nice work", "excellent", "weak", "poor effort", "not good", "poor work", "could have done better"]
y = [1,1,1,1,1,0,0,0,0,0]
pad_size=4
N = 10
Each word in the vocabulary is represented by an integer, and after some code to prepare the training data, x becomes the matrix below.
(Note it is transposed into an ‘input size’ x N matrix for Flux.)
x = [ 0  0  0  0  0  0  0  0  0  2 ;
      0  0  0  0  0  0  0  0  0  8 ;
     13  6  7  9  0  0 11 10 11  3 ;
      3 14  4 14  5 12  4  6 14  1 ]
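For reference, the preparation step is roughly along these lines (a sketch; the integer assigned to each word depends on the vocabulary ordering, so the codes won't match the matrix above exactly):

# Sketch of the data preparation: build a word -> integer vocabulary,
# encode each document, left-pad with zeros to pad_size words,
# and stack the documents as columns so x is pad_size x N.
words = unique(vcat(split.(StringDocs)...))
vocab = Dict(w => i for (i, w) in enumerate(words))   # 0 is reserved for padding
encode(doc) = [vocab[w] for w in split(doc)]
pad(v) = vcat(zeros(Int, pad_size - length(v)), v)
x = hcat(pad.(encode.(StringDocs))...)                # 4x10 matrix of Ints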
The rest of the code is below. I was hoping the embedding layer (W) would change and learn after every epoch, but it isn’t changing. I’ve been stuck on this for a while and would be grateful for any pointers or examples that might help.
using Flux
using Statistics: mean

data = [(x, y)]

# Embedding matrix: 8 features per word, vocabulary of 51 integer codes.
W = param(Flux.glorot_normal(8, 51))
max_features, vocab_size = size(W)

# One-hot encode every word in x (codes 0:50), giving a 51 x (pad_size*N) matrix.
one_hot_matrix = Flux.onehotbatch(reshape(x, pad_size * N), 0:vocab_size-1)

m = Chain(x -> W * one_hot_matrix,                     # embedding lookup
          x -> reshape(x, max_features, pad_size, N),  # group words back into documents
          x -> mean(x, dims=2),                        # average the word embeddings per document
          x -> reshape(x, 8, 10),
          Dense(8, 1))
# if I add softmax above, the loss doesn't change

loss(x, y) = Flux.mse(m(x), y)
optimizer = Flux.Descent(0.001)

for epoch in 1:10
    Flux.train!(loss, Flux.params(m), data, optimizer)
    println("loss=", loss(x, y).data)
end

show(Flux.params(m))
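To be clear about what I’m trying to do: the first anonymous layer is meant to act as an embedding lookup, since multiplying W by a one-hot column just picks out the corresponding column of W. A tiny standalone illustration of that idea (the sizes here are made up and unrelated to the model above):

# Embedding lookup as a matrix multiply with a one-hot matrix (toy sizes).
E = rand(Float32, 3, 5)                  # 3 features, vocabulary codes 0:4
oh = Flux.onehotbatch([2, 0, 4], 0:4)    # one-hot encode three word codes
emb = E * oh                             # 3x3 matrix: columns 3, 1 and 5 of E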
The output of the training loop is:
loss=3.7767549
loss=3.7263222
loss=3.6780336
loss=3.6317985
loss=3.5875306
loss=3.5451438
loss=3.5045598
loss=3.4657013
loss=3.4284942
loss=3.39287
Params([Float32[0.419828 -0.139223 -0.225595 0.0708142 0.232704 0.0907047 0.707192 -0.167613] (tracked), Float32[0.144213] (tracked)])
I was expecting to also see 8x51 params for the embedding layer.
I could be close or I could be way off, but I can’t find examples close enough to help me progress.
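In case it helps, this is how I’m inspecting which arrays end up in the parameter collection that Flux.train! updates; only the Dense layer’s weight and bias show up, not W:

# Inspect which arrays Flux.train! will actually update.
for p in Flux.params(m)
    println(size(p))    # prints (1, 8) and (1,) -- the Dense layer only, no 8x51 array
end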