I originally asked this question on the Julia slack, but it occurred to me that this could be a better place to ask:
I’ve used Flux successfully for regression and multi-label classification in the past, but today I got stuck on a very simple binary classification problem. I naively assumed that this would be a simple matter of starting with a multi-label classification (using https://github.com/FluxML/model-zoo/blob/master/vision/mnist/mlp.jl as a starting point), removing one-hot encoding of labels, reducing the number of outputs from the final layer to 1, and replacing logitcrossentropy with Flux.Losses.logitbinarycrossentropy as my loss function. This turned out not to work. My model refused to learn at all.
I made this gist where I have tried to modify the above example from the Flux model zoo to do a binary classification (guessing whether a MNIST image is a 0 or not):
It is still not working. What am I doing wrong here?
I’m not a Deep Learning expert.
I think the problem is a vanishing gradient (I tried a learning rate of 10^7 and the model works).
Moreover, I think the data are not balanced; maybe you could try splitting the digits into even and odd instead.
Sorry for my english.
Thank you for your reply! Your English is at least as good as mine (also not a native speaker).
Unbalanced data is not the problem. When I one-hot-encode the two labels (zero and non-zero) and revert to using two outputs from the final layer and Flux.Losses.logitcrossentropy, everything works well.
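For reference, a two-output softmax with cross-entropy and a one-output sigmoid with binary cross-entropy should in principle describe the same model: the single logit plays the role of the difference between the two logits. Below is a minimal plain-Julia sketch of that identity. It does not use Flux, and every name in it (`sigmoid`, `softmax_vec`, `crossentropy_from_logits`, `bce_from_logit`) is an illustrative helper, not a Flux function.

```julia
# Plain-Julia sketch (no Flux needed); all helper names are illustrative.
sigmoid(z) = 1 / (1 + exp(-z))

# Softmax over a vector of logits, shifted by the maximum for numerical stability.
function softmax_vec(z)
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Cross-entropy of a one-hot target against softmax(logits).
crossentropy_from_logits(z, onehot) = -sum(onehot .* log.(softmax_vec(z)))

# Binary cross-entropy of a target y in {0, 1} against sigmoid(logit).
bce_from_logit(z, y) = -(y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z)))

z = [0.3, -1.2]   # two logits: [class "zero", class "non-zero"]
d = z[1] - z[2]   # equivalent single logit for class "zero"

p_two = softmax_vec(z)[1]                            # probability from two outputs
p_one = sigmoid(d)                                   # probability from one output
loss_two = crossentropy_from_logits(z, [1.0, 0.0])   # target: class "zero"
loss_one = bce_from_logit(d, 1.0)                    # same target as a 0/1 label
```

Since both the probabilities and the losses agree, a one-output model that fails to train where the two-output model succeeds points to a bug in how the single output is wired up, rather than to a limitation of the binary formulation itself.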
Can you post a MWE? The OP didn’t test different learning rates and many other configurations. For example, using one output to accommodate logitbinarycrossentropy halves the number of parameters in the second layer and may prevent the model from learning effectively.
At epoch 0 the code below outputs:
loss,accuracy = (0.04393502f0, 0.498842289209262)
At epoch 1:
loss,accuracy = (0.03497529f0, 0.7889908256880734)
At epoch N:
loss,accuracy = (0.03497529f0, 0.7889908256880734)
The loss never changes after the first epoch.
I realized that the model is outputting 0 every time, and that 0.788… corresponds to the percentage of non-matches in the training set. Could the problem be that the classes are too imbalanced?
function loss_and_accuracy(data_loader, model, device)
    acc = 0
    ls = 0.0f0
    num = 0
    for (x, y) in data_loader # calculate for each batch
        x, y = device(x), device(y)
        ŷ = model(x)
        ls += Flux.Losses.logitbinarycrossentropy(ŷ, y)
        guesses = [v[1] > 0.5 ? 1 : 0 for v ∈ ŷ]
        acc += sum(guesses .== y) * 1 / size(x, 2)
        num += size(x, 2)
    end
    return ls / num, acc / num
end

numfeatures = size(train_data.data[1],1)
model = Chain(
    Dense(numfeatures, numfeatures,relu),
    Dense(numfeatures, 1,σ),
)
ps = Flux.params(model)
opt = ADAM(3e-4)
loss(ŷ, y) = Flux.Losses.logitbinarycrossentropy(ŷ, y)

for epoch in 1:5
    for (x, y) in train_data
        gs = Flux.gradient(ps) do
            loss(model(x), y)
        end
        Flux.Optimise.update!(opt, ps, gs)
        loss_and_accuracy(train_loader, model, device)
    end
end
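An accuracy that sits exactly at the share of the larger class is what a constant prediction would give, so it can help to compute that baseline explicitly and to count correct predictions only once per sample. The following is a plain-Julia sketch with made-up logits and labels (not the data from this thread); `count_correct` and `majority_baseline` are hypothetical helpers, and it assumes the model returns raw logits, for which the decision boundary is 0 rather than 0.5.

```julia
# Plain-Julia sketch; the batches below are made-up, not the thread's dataset.
sigmoid(z) = 1 / (1 + exp(-z))

# Count correct predictions in one batch. With raw logits the decision
# boundary is 0, which is equivalent to sigmoid(logit) > 0.5.
# vec() flattens a 1×N output so it compares elementwise with a label vector.
count_correct(logits, labels) = sum((vec(logits) .> 0) .== (vec(labels) .== 1))

# Accuracy obtained by always predicting the more frequent class.
function majority_baseline(labels)
    p = sum(labels .== 1) / length(labels)
    return max(p, 1 - p)
end

batches = [
    ([2.0 -1.0 0.5 -3.0], [1, 0, 0, 0]),   # (1×N logits, N labels)
    ([-0.2 -0.7 1.5],     [0, 0, 1]),
]

correct = sum(count_correct(l, y) for (l, y) in batches)
total = sum(length(y) for (_, y) in batches)
accuracy = correct / total                 # divide once, by the sample count

all_labels = reduce(vcat, [y for (_, y) in batches])
baseline = majority_baseline(all_labels)   # accuracy of a constant guess
```

If the reported accuracy matches `baseline`, the model is most likely predicting a single class for every input.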
logitbinarycrossentropy fuses the sigmoid and loss, so the second layer should not have a sigmoid activation.
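To make the effect concrete, here is a small plain-Julia sketch of what happens when a sigmoid output is passed to a loss that applies the sigmoid again. It does not call Flux; `logit_bce` is a hand-written stand-in that uses the usual numerically stable formula for binary cross-entropy on logits, and all names are illustrative.

```julia
# Plain-Julia sketch; logit_bce is a hand-written stand-in, not the Flux function.
sigmoid(z) = 1 / (1 + exp(-z))

# Numerically stable binary cross-entropy on a raw logit z with target y in {0, 1}:
# max(z, 0) - z*y + log(1 + exp(-|z|)).
logit_bce(z, y) = max(z, 0) - z * y + log1p(exp(-abs(z)))

# Plain binary cross-entropy on a probability p.
bce(p, y) = -(y * log(p) + (1 - y) * log(1 - p))

z = 4.0  # a raw pre-activation (logit) that strongly favours class 1

loss_correct = logit_bce(z, 1)           # logit passed straight to the fused loss
loss_reference = bce(sigmoid(z), 1)      # same value, computed in two steps
loss_double = logit_bce(sigmoid(z), 0)   # sigmoid output fed to the fused loss, target 0

# With the extra sigmoid the loss only ever sees inputs in (0, 1), so the
# probability it implies, sigmoid(sigmoid(z)), stays between 0.5 and sigmoid(1).
implied = [sigmoid(sigmoid(t)) for t in -20.0:5.0:20.0]
```

With the extra activation the implied probability of the positive class cannot drop below 0.5, so the loss for a negative sample can never go below log(2), however the weights are set. Dropping the activation from the last layer, i.e. `Dense(numfeatures, 1)`, lets the loss see the raw logits.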
Without train_data itself, it’s difficult to tell what other issues could be present or to run this locally. Can you provide that as well (i.e. a full executable example)?
Please have a read through “Please read: make it easier to help you”. A single sample of the data doesn’t go far towards an MWE that others can run locally and debug. Also, have you tried any of the suggestions mentioned upthread?