 # Flux custom loss function

#1

I’m trying to run a straightforward regression problem through Flux. I’ve modified the loss function from the example, from a simple MSE to something like a chi-square. The problem concerns the calibration of a calorimeter, where the error term of the chi-square is proportional to the square root of the energy.
The following code exemplifies the problem:

``````
using Flux

model = Chain(
    Dense(250, 64),
    Dense(64, 1)
)

x = rand(250, 1000)
y = rand(1000)

function loss(x, y)
    error = sqrt.(y)
    sqrt(sum(((model(x) .- y).^2) ./ error))
end

@time Flux.Tracker.gradient(() -> loss(x, y), params(model))
``````

This code results in the following output on my machine.

``````
 15.787468 seconds (211.02 M allocations: 5.623 GiB, 4.70% gc time)
``````

Removing the error term from the loss function gives me:

``````
  0.117279 seconds (15.27 k allocations: 59.252 MiB, 14.24% gc time)
``````

In my actual example (where my input vector has 200k entries), I simply can’t train at all with the error term in the loss function; without it, training works but is suboptimal, and I think I need this term.
Does anybody have a suggestion for how to get this loss function to work?

#2

Not a complete answer, but note that in your current loss function, `error` is allocated as a temporary array. If the `sqrt.(y)` is instead inlined into the same broadcast expression:

``````
function loss(x, y)
    sqrt(sum(((model(x) .- y).^2) ./ sqrt.(y)))
end
``````

the allocations are reduced, and for me computation time is cut by more than half:

Before:

``````
 16.608458 seconds (211.00 M allocations: 5.623 GiB, 3.29% gc time)
``````

After:

``````
  6.484304 seconds (135.00 M allocations: 3.269 GiB, 6.02% gc time)
``````

My understanding is that when the broadcast dots (`.`) are all part of the same expression, the elementwise operations are fused into a single loop and avoid allocating temporaries (although it would be nice if the same fusion could be recognized in your original function).
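As a minimal sketch of that fusion behaviour, independent of Flux (the variable names here are made up for illustration):

``````
a = rand(1000)
b = rand(1000)

# Two separate broadcast statements: the intermediate array `t`
# is materialized before the second expression runs.
t = sqrt.(b)
r1 = ((a .- b).^2) ./ t

# One fused broadcast expression: a single loop, no intermediate array.
r2 = ((a .- b).^2) ./ sqrt.(b)

# The @. macro dots every call on the line, fusing it the same way.
r3 = @. ((a - b)^2) / sqrt(b)

r1 ≈ r2 ≈ r3  # true
``````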

This obviously still isn’t as fast as if the error term were not there, and there are still quite a few allocations. Part of the slowdown might simply be that division is a slow operation, but there may be further changes that would help.
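One such change, sketched here under the assumption that `y` is fixed during training: precompute the reciprocal error weights once outside the loss, so neither the `sqrt` nor the division is redone (and differentiated) on every call:

``````
# Hypothetical variant: w is a constant, computed once up front.
w = 1 ./ sqrt.(y)

function loss(x, y)
    sqrt(sum(((model(x) .- y).^2) .* w))
end
``````

Multiplying by a precomputed reciprocal is usually cheaper than dividing inside the broadcast loop.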

Cheers,
Kevin

#3

Actually, I think you’re missing a transpose after `model(x)` in your loss function:

``````
julia> size(model(x))
(1, 1000)

julia> size(y)
(1000,)

julia> size(model(x) .- y)
(1000, 1000)

julia> size(model(x)' .- y)
(1000, 1)
``````
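An alternative to the transpose, if the aim is just to make the shapes line up, is to flatten the 1×N model output with `vec` (a sketch of the same loss):

``````
function loss(x, y)
    ŷ = vec(model(x))                 # (1000,) instead of (1, 1000)
    sqrt(sum(((ŷ .- y).^2) ./ sqrt.(y)))
end
``````

Either way, the elementwise subtraction stays a length-1000 vector instead of silently broadcasting out to a 1000×1000 matrix.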

Just changing `model(x)` to `model(x)'`, I get

``````
julia> @time Flux.Tracker.gradient(()->loss(x, y), params(model))
  4.995629 seconds (17.29 M allocations: 896.440 MiB, 4.87% gc time)
``````

Also inlining `sqrt.(y)` into the broadcast expression, as suggested above, gives:

``````
julia> @time Flux.Tracker.gradient(()->loss(x, y), params(model))
  1.008449 seconds (3.64 M allocations: 194.016 MiB, 5.21% gc time)
``````

Still not 0.1 seconds, but maybe more acceptable than 16 seconds.

Edit: My timings above are all wrong; I was working in a dirty session. After restarting Julia and using this loss function:

``````
function loss(x, y)
    error = sqrt.(y)
    sqrt(sum(((model(x)' .- y).^2) ./ error))
end
``````

I’m getting

``````
julia> @time Flux.Tracker.gradient(()->loss(x, y), params(model))
  0.041938 seconds (2.55 k allocations: 5.336 MiB)
``````

#4

Now I feel stupid. Many thanks!