Binary classification with Flux

I originally asked this question on the Julia Slack, but it occurred to me that this could be a better place to ask:

I’ve used Flux successfully for regression and multi-label classification in the past, but today I got stuck on a very simple binary classification problem. I naively assumed that this would be a simple matter of starting from a multi-class classification (using https://github.com/FluxML/model-zoo/blob/master/vision/mnist/mlp.jl as a starting point), removing the one-hot encoding of the labels, reducing the number of outputs from the final layer to 1, and replacing logitcrossentropy with Flux.Losses.logitbinarycrossentropy as my loss function. This turned out not to work: my model refused to learn at all.
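Concretely, the change I describe amounts to something like this (a rough sketch; the layer sizes follow the model-zoo MLP):

using Flux

# Single-logit head instead of the 10-way head from the MLP example.
model = Chain(
    Dense(28 * 28, 32, relu),
    Dense(32, 1),    # one raw logit instead of 10 outputs
)

# Labels become plain 0/1 values instead of one-hot columns.
loss(x, y) = Flux.Losses.logitbinarycrossentropy(model(x), y)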

I made this gist where I try to modify the above example from the Flux model zoo to do binary classification (guessing whether an MNIST image is a 0 or not):

It is still not working. What am I doing wrong here?

I’m not a deep learning expert.
I think the problem is a vanishing gradient (I tried with a learning rate of 10^7 and the model works).
Also, I think the data are unbalanced; maybe you could try dividing the digits into even and odd instead.
Sorry for my English.


Thank you for your reply! Your English is at least as good as mine (also not a native speaker).

Unbalanced data is not the problem. When I one-hot-encode the two labels (zero and non-zero) and revert to using two outputs from the final layer and Flux.Losses.logitcrossentropy, everything works well.
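For comparison, the variant that does learn looks roughly like this (a sketch; labels is assumed to be a Bool vector marking which images are zeros):

using Flux

# Two-logit head with one-hot targets and the fused softmax + cross-entropy loss.
model = Chain(
    Dense(28 * 28, 32, relu),
    Dense(32, 2),    # two logits: "zero" vs "non-zero"
)

y = Flux.onehotbatch(labels, [false, true])
loss(x, y) = Flux.Losses.logitcrossentropy(model(x), y)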

Were you ever able to find an answer to this problem? I’m having the same one.

Can you post an MWE? The OP didn’t test different learning rates and many other configurations. For example, using one output to accommodate logitbinarycrossentropy halves the number of parameters in the second layer and may prevent the model from learning effectively.
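To see the difference concretely (a quick sketch with a made-up input width of 32):

using Flux

# Dense(n, 2) holds 2n weights + 2 biases; Dense(n, 1) only n weights + 1 bias.
n = 32
sum(length, Flux.params(Dense(n, 2)))  # 66 parameters
sum(length, Flux.params(Dense(n, 1)))  # 33 parameters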


At epoch 0 the code below outputs:
loss,accuracy = (0.04393502f0, 0.498842289209262)
At epoch 1:
loss,accuracy = (0.03497529f0, 0.7889908256880734)
At epoch N:
loss,accuracy = (0.03497529f0, 0.7889908256880734)

The loss never updates after the first epoch.
I realized that the model is outputting 0 every time, and that 0.788… corresponds to the fraction of non-matches in the training set. Could the problem be that the classes are too imbalanced? (A quick balance check is sketched after the code below.)

function loss_and_accuracy(data_loader, model, device)
    acc = 0
    ls = 0.0f0
    num = 0
    for (x, y) in data_loader # accumulate over each batch
        x, y = device(x), device(y)
        ŷ = model(x)
        # agg = sum so that dividing by num below yields the per-sample mean
        ls += Flux.Losses.logitbinarycrossentropy(ŷ, y; agg = sum)
        guesses = vec(ŷ) .> 0.5          # threshold the sigmoid outputs
        acc += sum(guesses .== vec(y))   # count correct predictions in this batch
        num += size(x, 2)
    end
    return ls / num, acc / num
end

numfeatures = size(train_data.data[1], 1)
model = Chain(
    Dense(numfeatures, numfeatures, relu),
    Dense(numfeatures, 1, σ),
)

ps = Flux.params(model)
opt = ADAM(3e-4)
loss(ŷ, y) = Flux.Losses.logitbinarycrossentropy(ŷ, y)

for epoch in 1:5
    for (x, y) in train_data
        x, y = device(x), device(y)   # move the batch to the same device as the model
        gs = Flux.gradient(ps) do
            loss(model(x), y)
        end
        Flux.Optimise.update!(opt, ps, gs)
    end
    # report once per epoch rather than once per batch
    @show loss_and_accuracy(train_loader, model, device)
end
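For reference, this is the balance check I mean (a sketch, assuming each batch’s y holds 0/1 labels):

# Fraction of positive (label == 1) examples across all batches;
# a value far from 0.5 means the classes are imbalanced.
npos = sum(sum(y) for (x, y) in train_data)
ntot = sum(length(y) for (x, y) in train_data)
@show npos / ntot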

logitbinarycrossentropy fuses the sigmoid and loss, so the second layer should not have a sigmoid activation.
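In other words, the head should emit a raw logit, and predictions should threshold that logit at 0 (a sketch based on the model above):

model = Chain(
    Dense(numfeatures, numfeatures, relu),
    Dense(numfeatures, 1),   # no σ: logitbinarycrossentropy applies it internally
)

# σ(z) > 0.5 exactly when z > 0, so threshold the logits at 0:
guesses = vec(model(x) .> 0)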

It’s difficult to tell what other issues could be present, or to run this locally, without train_data itself. Can you provide that as well (i.e. a full executable example)?


Here’s a sample of train_data. All values fall between 1.0f-12 and 1.0.

julia> train_data.data[1]
22815×1526 Matrix{Float32}

julia> train_data.data[1][:,1]
22815-element Vector{Float32}:
 1.0f-12
 0.0017195814
 0.0017195814
 0.023636289
 0.028990168
 0.044405
 0.043595545
 0.06990052
 0.055057727
 0.05537212
 0.09786336
 0.10126311
 0.10529525
 0.09850332
 0.120765366
 0.23463665
 0.19305018
 0.19305018
 ⋮
 0.9924742
 0.9953276
 1.0
 0.87539566
 0.8523039
 0.8453158
 0.83677983
 0.82271576
 0.84766895
 0.8416806
 0.83194155
 0.81610835
 0.7944943
 0.84520817
 0.812333
 0.79442024


Please have a read through “Please read: make it easier to help you”. A single sample of the data doesn’t help much, since it still doesn’t give others an MWE they can run locally and debug. Also, have you tried any of the suggestions mentioned upthread?
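Even synthetic data in the same shape would help, e.g. (a sketch with random values and a made-up label balance and batch size):

using Flux
using Flux.Data: DataLoader

# Random stand-ins matching the shapes above, so others can run the script end to end.
X = rand(Float32, 22815, 1526)                  # features
y = reshape(Float32.(rand(Bool, 1526)), 1, :)   # 0/1 labels as a 1×N row
train_data = DataLoader((X, y), batchsize = 256, shuffle = true)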