Specifying loss functions in Flux.jl

I am trying to write loss functions for use in a simple Neural Network training.

As I understand from the documentation, loss functions should have the signature loss(ŷ, y). So to my mind something like

loss(ŷ, y) = mean(abs.(ŷ .- y))

ought to work, but it does not for me (no error is thrown but the fit never improves). The only version I can get to work is one where the loss function refers to global variables (rather than the arguments passed to it by the train! function). This cannot be how it is meant to work, so I was wondering how I can modify my example to get something running properly.

My MWE is:

using Statistics
using Flux

# Making dummy data
obs = 1000
x = rand(Float64, 5, obs)
y = mean(x, dims=1) + sum(x, dims=1)
y[findall(x[4,:] .< 0.3)] .= 17 # Making it slightly harder.

# Making model
m = Chain(
  Dense(5, 5, σ),
  Dense(5, 1))
dataset = zip(x,y)
opt = Descent()

# Attempt 1: Fit does not improve
mae(ŷ, y; agg=mean) = agg(abs.(ŷ .- y)) # Copypasted from here https://github.com/FluxML/Flux.jl/blob/0fa97759367227ced0bde28f39ba5d2abc08e8c7/src/losses/functions.jl#L1-L7
Flux.train!(mae, params(m), dataset, opt)
Flux.train!(mae, params(m), dataset, opt)

# Attempt 2: Fit does not improve
loss2 = Flux.mae
Flux.train!(loss2, params(m), dataset, opt)
Flux.train!(loss2, params(m), dataset, opt)

# Attempt 3: This throws an error.
loss3(A, B) = Flux.mae(m(A),B)
Flux.train!(loss3, params(m), dataset, opt)
Flux.train!(loss3, params(m), dataset, opt)

# Attempt 4: This works but it is terrible (the loss4 function uses global variables rather than anything passed in)
loss4(A, B) = Flux.mae(m(x),y)
Flux.train!(loss4, params(m), dataset, opt)
Flux.train!(loss4, params(m), dataset, opt)

I can see that Flux.train! is passing in Float64s (i.e. ŷ and y), so Attempt 1 should have worked. I checked with:

function loss5(x, y)
    println("x is a ", typeof(x), " and y is a ", typeof(y))
    error("Error to stop training with a pointless loss function")
end
Flux.train!(loss5, params(m), dataset, opt)

This one is what it’s supposed to be. What’s the error?

The error is
ERROR: MethodError: no method matching (::Dense{typeof(σ),Array{Float32,2},Array{Float32,1}})(::Float64)
I figured this was because the A being fed into the loss function was a scalar, while the model takes a vector as input.

zip doesn’t work the way you hope:

julia> iterate(dataset)
((0.8517903f0, 4.1195936f0), (2, 2))

It is iterating the flat array, not the rows. Instead, try DataLoader

dataset = Flux.Data.DataLoader((x, y))

Then your Attempt 3 works correctly.
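For anyone following along, a minimal sketch of the difference, using fresh dummy data shaped like the MWE's (5×1000 x, 1×1000 y):

```julia
using Flux
using Statistics

x = rand(Float64, 5, 1000)
y = mean(x, dims=1) + sum(x, dims=1)

# zip iterates the flat arrays element by element, yielding scalar pairs:
x1, y1 = first(zip(x, y))        # two Float64 scalars

# DataLoader keeps the observation dimension intact:
dataset = Flux.Data.DataLoader((x, y))
xb, yb = first(dataset)
size(xb)                         # (5, 1): one whole observation per batch by default
```

So with zip the Dense layer receives a single Float64, which is exactly the MethodError above.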


Awesome thanks. I was stuck on this for ages.

So the total solution (for posterity) is:

# Training
dataset = Flux.Data.DataLoader((x, y))
loss3(A, B) = Flux.mae(m(A),B)
Flux.train!(loss3, params(m), dataset, opt)
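One detail worth adding for posterity: each call to train! makes a single pass (epoch) over dataset. A sketch of running several epochs, reusing m, x, y, loss3, dataset, and opt from above (a plain loop; Flux also exports an @epochs macro for this):

```julia
for epoch in 1:20
    Flux.train!(loss3, params(m), dataset, opt)
    @show Flux.mae(m(x), y)   # monitor the fit on the full data each epoch
end
```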

There are several errors in the code, more conceptual than API-related.

First, check the zip: the dimensions are not right. The third option is the right one, but the bad zip is why it throws the error.
Second, you are trying to optimize over all 1000 observations at once; you should use a smaller batch size, like 64 or so.

dataset = Flux.Data.DataLoader((x, y), batchsize=64, shuffle=true)

That way, the dataset is better prepared for training.
Also, a single iteration (epoch) is not guaranteed to improve the loss, although it is true that the first one usually should.
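With that batch size, each iteration sees 64 observations at a time. A quick sketch of the resulting shapes (dummy data with the MWE's dimensions):

```julia
using Flux

x = rand(Float64, 5, 1000)
y = rand(Float64, 1, 1000)

dataset = Flux.Data.DataLoader((x, y), batchsize=64, shuffle=true)
xb, yb = first(dataset)
size(xb)          # (5, 64)
size(yb)          # (1, 64)
length(dataset)   # 16 == cld(1000, 64); the last batch has only 40 columns
```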

Anyway, you are testing code without actually knowing what the loss is in the training function: the loss receives the input and the expected output, and it should run the model over the input and compare the result with the expected output (depending on whether it is regression or classification, you should choose one loss function or another).

Attempts 1 and 2 make no sense because you are comparing the input against the output without running the model at all. Attempt 3 is right, and Attempt 4 is not quite right, because it ignores dataset and uses x and y directly (so it ignores the batch size and any other preprocessing).
You should check the documentation to learn the basic API before writing/copying source code; that way you can program more quickly and write more robust code.
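To make that concrete, a small sketch contrasting the two shapes of loss (a fresh model m; bad_loss and good_loss are illustrative names, not Flux API):

```julia
using Flux

m = Chain(Dense(5, 5, σ), Dense(5, 1))

# Attempts 1 and 2: the model never runs, so the loss does not depend on
# params(m) and the gradients with respect to the parameters vanish.
bad_loss(A, B) = Flux.mae(A, B)

# Attempt 3 (the right shape): run the model on the input, then compare
# the prediction against the target.
good_loss(A, B) = Flux.mae(m(A), B)
```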
EDIT: I see @contradict has told you the answer and that you have solved the problem.


I think the hint is: any time you see Flux operating on a scalar (::Float64 here), that is probably a bug. It should be operating on an Array{FloatXX,N}.


Thanks, I think I know deep learning reasonably well (I am not an expert, but hopefully not clueless). It was mainly the API I did not know, and it was hard to find a complete working MWE in the documentation.

It is cool to know that DataLoader can take different batch sizes and can shuffle, though. I will definitely use that. And this whole solution is much better than my Attempt 4, which was definitely pretty rough (but did actually work, somehow).

Well, the documentation is improving (especially in the dev version), but it is true that it assumes a lot of prior knowledge.

Your Attempt 4 worked because it was not using dataset but x and y directly; the problem with Attempt 3 was that zip did not handle these dimensions well, while DataLoader does. The best solution, as you have seen, is Attempt 3 but with DataLoader instead of zip.
