Why does binarycrossentropy need an index in a denoising autoencoder?

Hi there,

I am still somewhat of a rookie in Julia and Flux, and I have trouble understanding what is going on when I switch between the crossentropy and binarycrossentropy loss functions.

I coded the following simple denoising autoencoder:

using Flux, Random

data = rand(2000,100)
data_corrupted = copy(data)

# Corrupt data
for sample_index in 1:size(data, 2)
    # Create random indices
    rng = MersenneTwister(1234)
    indices = findall(bitrand(rng, 2000))

    # Change the values at those indices to 0
    for i in indices
        data_corrupted[i, sample_index] = 0
    end
end

# Partition into batches of 10
data = [data[:, i:min(i+10-1,size(data, 2))] for i in 1:10:size(data, 2)]
data_corrupted = [data_corrupted[:, i:min(i+10-1,size(data_corrupted, 2))] for i in 1:10:size(data_corrupted, 2)]

# Define model
encoder = Dense(2000, 50, σ)
decoder = Dense(50, 2000, σ)
m = Chain(encoder, decoder)

# Defining the loss function
loss(x, y) = Flux.crossentropy(m(x), y)

# Defining the optimiser
opt = ADAM()

# Train
Flux.train!(loss, params(m), zip(data_corrupted, data), opt)

This runs fine.

But if I then change the loss function to:

loss(x, y) = Flux.binarycrossentropy(m(x), y)

I get the following error:

ERROR: LoadError: MethodError: no method matching eps(::Array{Float32,2})
Closest candidates are:
  eps(!Matched::Dates.Time) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:387
  eps(!Matched::Dates.Date) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:386
  eps(!Matched::Dates.DateTime) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:385
 ...

However, if I change the loss to what has been suggested here:

loss(x, y) = Flux.binarycrossentropy(m(x)[1], y[1])

The model trains without any problem.

I have a hard time understanding why I need this indexing for binarycrossentropy, while I do not need it for crossentropy. I understand that the eps function requires a scalar rather than an array, but I am confused as to whether the loss is now calculated only on the first element instead of on all the data.

Any insights are very welcome!

Many thanks,
Sander

The loss function needs to return a scalar.

From the definitions of crossentropy and binarycrossentropy, crossentropy already includes a sum over the model output and returns a scalar, while binarycrossentropy is defined on scalar arguments — which is why it hits eps(::Array) when you hand it a matrix.
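As a rough sketch (hand-rolled stand-ins here, not Flux's exact source, and ignoring the eps stabilisation Flux adds), the shapes tell the story: the first loss reduces its array arguments to a scalar itself, while the second is defined on scalars:

```julia
# Hypothetical hand-rolled versions of the two losses, for illustration only
xent(ŷ, y) = -sum(y .* log.(ŷ)) / size(y, 2)        # reduces arrays to one scalar
bce(ŷ, y)  = -y * log(ŷ) - (1 - y) * log(1 - ŷ)     # defined for scalar ŷ and y

ŷ = [0.9 0.2; 0.1 0.8]   # pretend model output: 2 units × 2 samples
y = [1.0 0.0; 0.0 1.0]   # targets

xent(ŷ, y)    # scalar — usable as a loss directly
bce.(ŷ, y)    # 2×2 array of per-element losses — still needs a reduction
```

Broadcasting with the dot applies the scalar definition elementwise; a final sum (or mean) collapses it to the scalar that train! needs.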

Try broadcasting binarycrossentropy over the output and summing like this:

loss(x, y) = sum(Flux.binarycrossentropy.(m(x), y)) 
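As for what the indexed version was computing: m(x)[1] uses linear indexing, so it picks out the single first element of the output matrix — each call to the loss then compares one entry, not the first batch. A small illustration with stand-in matrices:

```julia
ŷ = [0.9 0.2; 0.1 0.8]   # stand-in for m(x)
y = [1.0 0.0; 0.0 1.0]   # stand-in for y

ŷ[1]    # 0.9 — a single Float64, which is why eps() is happy with it

# What the broadcast-and-sum loss computes, over *all* elements:
per_element = @. -y * log(ŷ) - (1 - y) * log(1 - ŷ)
sum(per_element)   # one scalar covering every entry of every sample
```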

That makes a whole lot of sense and works like a charm. Thank you for pointing this out to me!