Implement berHu loss in Flux

Hi, I was trying to build this loss function from this paper:


I wanted to see if it outperforms squared L2 norm in another regression problem I’m working on. Here is my attempt:

using LinearAlgebra  # provides norm

function berhu(x, y)
    x = model(x)
    loss = Tracker.collect(zeros(Float32, size(x)))
    bound = 0.2*maximum(abs.(x-y))
    inbound = abs.(x-y) .<= bound
    loss[inbound] .= norm.((x-y)[inbound], 1)
    loss[.!inbound] .= (((x-y)[.!inbound]).^2 .+ bound^2)./(2*bound)
    return loss
end

It works as intended with dummy variables when I comment out the x = model(x) line. When I run it with the model, the problem is the assignment to loss[inbound], which was brought up in issue #93 on the Flux repo.

Stack trace of error:

ERROR: Can't differentiate `setindex!`
 [1] #setindex!#369(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::TrackedArray{…,SubArray{Float32,1,Array{Float32,1},Tuple{Array{Int64,1}},false}}, ::Float64, ::Int64) at C:\Users\me\.julia\packages\Tracker\JhqMQ\src\lib\array.jl:65
 [2] setindex!(::TrackedArray{…,SubArray{Float32,1,Array{Float32,1},Tuple{Array{Int64,1}},false}}, ::Float64, ::Int64) at C:\Users\me\.julia\packages\Tracker\JhqMQ\src\lib\array.jl:65
 [3] macro expansion at .\broadcast.jl:843 [inlined]
 [4] macro expansion at .\simdloop.jl:73 [inlined]
 [5] copyto! at .\broadcast.jl:842 [inlined]
 [6] copyto! at .\broadcast.jl:797 [inlined]
 [7] materialize!(::TrackedArray{…,SubArray{Float32,1,Array{Float32,1},Tuple{Array{Int64,1}},false}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(norm),Tuple{Array{Float64,1},Int64}}) at .\broadcast.jl:756
 [8] top-level scope at none:0

How can I implement this loss function correctly?

Can you try it on version 0.10 of Flux?

This is not going to work for any version of Flux. You have to rewrite it so that you do not mutate any vector.

Makes sense, but how do I implement it without mutating any vector?

You can use boolean vectors that act like masks to blend the two together. Not sure if this is the best strategy though. Another alternative is to use mutation and define the gradient manually.

Thanks, but I thought the inbound = abs.(x-y) .<= bound line is a boolean mask?

Also I just realized LossFunctions.jl is a thing, so I’ll take a look at its implementation for huber loss tomorrow.

It is, but you use it for indexing rather than multiplication.
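
To make the indexing-versus-multiplication distinction concrete, here is a minimal sketch of the loss with the mask used as a multiplier, so nothing is assigned into an existing array. The names berhu_masked, pred and target are illustrative and not from the original post, the strict .< comparison and the reduction to a mean are assumptions, and it has only been reasoned through on plain arrays, so whether a particular Flux/Tracker version differentiates every operation in it still needs checking.

```julia
# Sketch of berHu (reverse Huber) using mask multiplication instead of
# indexed assignment. `pred` and `target` are illustrative names; `pred`
# would be `model(x)` in the post above.
function berhu_masked(pred, target)
    r = abs.(pred .- target)              # element-wise absolute residuals
    c = 0.2f0 * maximum(r)                # bound: 20% of the largest residual
    inbound = r .< c                      # Bool mask (strict comparison)
    l1 = r                                # L1 branch, used where r < c
    l2 = (r .^ 2 .+ c^2) ./ (2 * c)       # scaled L2 branch, used elsewhere
    # Blend the two branches arithmetically; no array is written to in place.
    loss = inbound .* l1 .+ (.!inbound) .* l2
    return sum(loss) / length(loss)       # mean over elements (assumed reduction)
end

pred   = Float32[1.0, 2.0, 3.0]
target = Float32[1.0, 2.0, 8.0]
berhu_masked(pred, target)                # only the last element is out of bounds
```

Note that if every residual is zero, c is zero and the L2 branch divides by zero, so a real implementation would probably want to guard that case.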

It’s been a while, but any more suggestions? Thanks!

It seems like le_float (i.e., <= for floats) is not differentiable. Perhaps you can use strictly less than (i.e., <) without any noticeable loss in performance, or reverse the test to compute an out-of-bounds mask and set inbound = .!outbound.
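
As a quick sanity check on why swapping <= for < should not change the value of the loss: with the formulas from the original post, both branches give the same result when the residual sits exactly on the bound, so it does not matter which side of the mask that element lands on. The value of c below is arbitrary and only for illustration.

```julia
# Evaluate both berHu branches exactly at the boundary |r| == c.
c = 0.5                                   # illustrative bound, not from the post
r = c                                     # residual sitting exactly on the bound
l1_branch = abs(r)                        # L1 branch: |r|
l2_branch = (r^2 + c^2) / (2 * c)         # L2 branch: (r^2 + c^2) / (2c)
@assert l1_branch == l2_branch            # both branches equal c at the boundary
```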

Can’t believe I didn’t think of just using .< instead of .<=, haha. It appears to be working now, thanks!