Hi there,
I'm still somewhat of a rookie in Julia and Flux, and I'm having trouble understanding what is going on when I switch between the crossentropy and binarycrossentropy loss functions.
I coded the following simple denoising autoencoder:
using Flux, Random

# 2000 features × 100 samples
data = rand(2000, 100)
data_corrupted = copy(data)

# Corrupt the data: zero out a random subset of entries in each sample
rng = MersenneTwister(1234)  # seed once, outside the loop, so each sample gets a different mask
for sample_index in 1:size(data, 2)
    indices = findall(bitrand(rng, 2000))
    data_corrupted[indices, sample_index] .= 0
end

# Partition into batches of 10 samples
data = [data[:, i:min(i+10-1, size(data, 2))] for i in 1:10:size(data, 2)]
data_corrupted = [data_corrupted[:, i:min(i+10-1, size(data_corrupted, 2))] for i in 1:10:size(data_corrupted, 2)]

# Define the model
encoder = Dense(2000, 50, σ)
decoder = Dense(50, 2000, σ)
m = Chain(encoder, decoder)

# Define the loss function
loss(x, y) = Flux.crossentropy(m(x), y)

# Define the optimiser
opt = ADAM()

# Train
Flux.train!(loss, params(m), zip(data_corrupted, data), opt)
This runs fine.
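For reference, crossentropy does seem happy with whole matrices; here is a minimal check I ran (the shapes are made up, just to confirm the behaviour):

```julia
using Flux

# Hypothetical small shapes: 5 "classes" × 3 samples
ŷ = softmax(rand(Float32, 5, 3))
y = softmax(rand(Float32, 5, 3))

# crossentropy accepts matrices and returns a single scalar loss
l = Flux.crossentropy(ŷ, y)
```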
But if I then change the loss function to:
loss(x, y) = Flux.binarycrossentropy(m(x), y)
I get the following error:
ERROR: LoadError: MethodError: no method matching eps(::Array{Float32,2})
Closest candidates are:
eps(!Matched::Dates.Time) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:387
eps(!Matched::Dates.Date) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:386
eps(!Matched::Dates.DateTime) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:385
...
However, if I change the loss to what was suggested here:
loss(x, y) = Flux.binarycrossentropy(m(x)[1], y[1])
The model trains without any problem.
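As far as I can tell, indexing with [1] just grabs the first element of the matrix via linear indexing, not the first batch (quick check with a stand-in array, since m(x) is a 2000×10 matrix here):

```julia
A = rand(Float32, 2000, 10)  # stand-in for m(x)

# Linear indexing: A[1] is the single scalar A[1, 1]
first_elem = A[1]
@assert first_elem == A[1, 1]
@assert first_elem isa Float32
```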
I have a hard time understanding why I need this indexing for binarycrossentropy but not for crossentropy. I understand that the eps function expects a scalar rather than an array, but I am confused: won't the loss now be calculated only on the first element of each batch instead of on all the data?
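My current guess (untested beyond the snippet below, so I may well be wrong) is that binarycrossentropy is defined elementwise on scalars, so perhaps the intended usage is to broadcast it over the arrays and then reduce, something like:

```julia
using Flux

m = Chain(Dense(2000, 50, σ), Dense(50, 2000, σ))
x = rand(Float32, 2000, 10)  # one corrupted batch
y = rand(Float32, 2000, 10)  # the corresponding clean batch

# Broadcast the scalar loss over every element, then average
loss(x, y) = sum(Flux.binarycrossentropy.(m(x), y)) / length(y)
l = loss(x, y)
```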
Any insights are very welcome!
Many thanks,
Sander