Should ForwardDiff.jl issue warnings at non-differentiable points?

Dan · June 12, 2017, 3:33pm

abs is not differentiable at zero: its derivative jumps from -1.0 to +1.0 there. Currently ForwardDiff chooses the +1.0 derivative at zero. Should this choice be made silently, without a warning?

In convex analysis, the concept of a subgradient generalizes the derivative: at zero, every value in the whole interval [-1.0, 1.0], including both -1.0 and +1.0, is a subgradient of abs (the set of all of them is called the subdifferential). There is no inherent justification for choosing just the +1.0 endpoint of that interval.
Significant results follow from knowing about, and working with, the other values of the interval as equally valid representatives of the subdifferential.
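For a convex function f, a number g is a subgradient at x when f(y) >= f(x) + g*(y - x) holds for every y. The interval claim for abs can be sanity-checked numerically. A minimal sketch in plain Julia 1.x (no ForwardDiff needed); `is_subgradient` is a hypothetical helper, and it only tests the inequality on a finite grid of y values with a small rounding tolerance, so it illustrates the definition rather than proving anything:

```julia
# Subgradient inequality for a convex f at x:
#   f(y) >= f(x) + g * (y - x)   for all y.
# Checked only on a finite grid of y values, with a small tolerance
# so that floating-point rounding does not cause false negatives.
is_subgradient(f, g, x; ys = range(-2.0, 2.0; length = 401), tol = 1e-12) =
    all(y -> f(y) >= f(x) + g * (y - x) - tol, ys)

# Every slope in [-1, 1] supports abs at the kink x = 0 ...
@assert all(g -> is_subgradient(abs, g, 0.0), (-1.0, -0.5, 0.0, 0.5, 1.0))

# ... but slopes outside that interval do not.
@assert !is_subgradient(abs, 1.5, 0.0)
@assert !is_subgradient(abs, -1.5, 0.0)

# Away from the kink the subgradient is unique: at x = 1 only g = 1 works.
@assert is_subgradient(abs, 1.0, 1.0)
@assert !is_subgradient(abs, 0.5, 1.0)
```

So ForwardDiff's +1.0 is one valid member of the set at zero, not the only one.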

A possible set of warnings could be implemented with:

julia> using ForwardDiff

julia> @inline Base.:<(d::ForwardDiff.Dual{T,V,N} where T where V<:Real where N,x::AbstractFloat) = ( ForwardDiff.value(d)==x && warn("Subgradient is not a singleton. Forced to pick single value") ; ForwardDiff.value(d)<x )

julia> ForwardDiff.Dual(10.0,-1.0) < 10.0     # This can be both true and false
WARNING: Subgradient is not a singleton. Forced to pick single value
false

julia> @inline Base.abs(d::ForwardDiff.Dual) = ( ForwardDiff.value(d)==zero(typeof(d)) && warn("Subgradient is not a singleton. Forced to pick single value") ; signbit(ForwardDiff.value(d)) ? -d : d )

julia> abs(ForwardDiff.Dual(0.0,1.0))
WARNING: Subgradient is not a singleton. Forced to pick single value
Dual{Void}(0.0,1.0)

julia> @inline Base.:<(d::ForwardDiff.Dual{T,V,N} where T where V<:Real where N,x::W where W<:Integer) = ( ForwardDiff.value(d)==x && warn("Subgradient is not a singleton. Forced to pick single value") ; ForwardDiff.value(d)<x )

julia> ForwardDiff.Dual(0.0,-1.0)<0
WARNING: Subgradient is not a singleton. Forced to pick single value
false

julia> signbit(ForwardDiff.Dual(0.0,-1.0))
WARNING: Subgradient is not a singleton. Forced to pick single value
false

What are the forum’s views on this?

JaredCrean2 · June 12, 2017, 6:26pm

I suspect issuing warnings like this would kill the performance of the code. If benchmarking bears this out (be sure to check a case where the differentiated code vectorizes), I would be opposed to making this change.

Tamas_Papp · June 12, 2017, 7:00pm

AFAIK this (i.e. if you try to differentiate non-differentiable functions, be prepared to face the consequences) is well understood, but hard to deal with.

1. Ideally, one would not AD non-differentiable functions.
2. But sometimes this happens, and then one would hope that the nondifferentiable points form a set of measure zero, so we can quietly ignore them (e.g. with MCMC).
3. The worst case, then, is some iterative procedure where your algorithm just loves hanging out at the nondifferentiable set.

Whether you want warnings depends on the use case. In case 2 you would get the occasional one; in case 3 your screen would be flooded with them, which would indicate that you are using the wrong algorithm.

Dan · June 12, 2017, 7:29pm

@Tamas_Papp, this is a good summary. But there are two points that made me raise this:

1. Deterministically breaking the symmetry in favor of one arbitrary and extreme derivative is bothersome; it feels like missing out on a whole dimension of possible results.
2. Julia is a good language for extending this AD facility to generate subgradients and compute with them (like interval arithmetic). This route could potentially lead to multivariate subgradients, i.e. cones and operations on them, which could make it possible to express optimization problems on polytopes nicely in native Julia (and have them run decently).
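Point 2 can be made concrete with a toy forward-mode number whose tangent is an interval of possible derivatives instead of a single value, in the spirit of interval arithmetic. This is a hypothetical sketch for Julia 1.x: `SubDual` and `seed` are illustrative names, not ForwardDiff types, and only `+`, scaling by a constant, and `abs` are covered. Because the interval rules are applied one operation at a time, the result is an enclosure that can be wider than the true subdifferential (see the last usage line), much as naive interval arithmetic overestimates ranges.

```julia
# Hypothetical forward-mode number: primal value plus an interval
# [lo, hi] of possible (sub)derivatives.
struct SubDual
    val::Float64
    lo::Float64
    hi::Float64
end

# Seed the input variable: d/dx x = 1, a degenerate interval.
seed(x::Real) = SubDual(x, 1.0, 1.0)

# Sum rule: add the interval endpoints.
Base.:+(a::SubDual, b::SubDual) = SubDual(a.val + b.val, a.lo + b.lo, a.hi + b.hi)

# Scaling by a constant c: a negative c swaps the interval endpoints.
Base.:*(c::Real, a::SubDual) =
    c >= 0 ? SubDual(c * a.val, c * a.lo, c * a.hi) :
             SubDual(c * a.val, c * a.hi, c * a.lo)

# abs: away from zero, multiply the tangent by sign(val). At zero the
# multiplier ranges over [-1, 1], so [-1, 1] * [lo, hi] = [-m, m]
# with m = max(|lo|, |hi|).
function Base.abs(a::SubDual)
    if a.val > 0
        return a
    elseif a.val < 0
        return (-1) * a
    else
        m = max(abs(a.lo), abs(a.hi))
        return SubDual(0.0, -m, m)
    end
end

f(x) = abs(x) + 2 * x

f(seed(0.0))   # SubDual(0.0, 1.0, 3.0): the subdifferential [1, 3] at the kink
f(seed(1.0))   # SubDual(3.0, 3.0, 3.0): an ordinary derivative, 3
abs(seed(0.0)) + (-1) * abs(seed(0.0))   # SubDual(0.0, -2.0, 2.0): wider than the true derivative 0
```

The overestimate on the last line is the usual dependency problem of interval arithmetic; a serious implementation would need smarter rules (or set-valued tangents beyond intervals in the multivariate case).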

ExpandingMan · June 12, 2017, 7:39pm

Ideally, the user would be able to specify which direction to take the limit from, or whether to get a warning. (Whether that’s practical here, I have no idea.)

Leaving this warning on by default seems like a recipe for a lot of pain and frustration for someone.
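One way such user-selectable behaviour could be prototyped is to pass a policy object and let dispatch choose the derivative used at the kink. The sketch below is hypothetical throughout (`MiniDual`, `KinkPolicy`, `RightLimit`, `LeftLimit`, `WarnRight` and `kinkabs` are illustrative names, not ForwardDiff API) and is written for Julia 1.x, where the `@warn` macro replaced the older `warn` function used in the REPL session above:

```julia
# Hypothetical minimal dual number for illustration only (not ForwardDiff.Dual).
struct MiniDual
    val::Float64   # primal value
    der::Float64   # tangent (directional derivative)
end

# Hypothetical policies for the derivative of abs at exactly zero.
abstract type KinkPolicy end
struct RightLimit <: KinkPolicy end   # slope +1, the choice described for ForwardDiff above
struct LeftLimit  <: KinkPolicy end   # slope -1
struct WarnRight  <: KinkPolicy end   # slope +1, but log a warning (at most once)

kink_slope(::RightLimit) = 1.0
kink_slope(::LeftLimit) = -1.0
function kink_slope(::WarnRight)
    @warn "abs at 0: subgradient is not a singleton, picking +1" maxlog=1
    return 1.0
end

# Each policy is its own type, so the kink_slope call is resolved by
# dispatch, and kinkabs returns a MiniDual whichever policy is passed.
function kinkabs(d::MiniDual, policy::KinkPolicy = RightLimit())
    slope = iszero(d.val) ? kink_slope(policy) : sign(d.val)
    return MiniDual(abs(d.val), slope * d.der)
end

kinkabs(MiniDual(0.0, 1.0))               # MiniDual(0.0, 1.0)
kinkabs(MiniDual(0.0, 1.0), LeftLimit())  # MiniDual(0.0, -1.0)
kinkabs(MiniDual(-2.0, 3.0))              # MiniDual(2.0, -3.0)
```

The default stays silent, so code that never opts in pays only for the zero check; `maxlog=1` keeps the warning variant from flooding the screen when an iterative algorithm keeps returning to the kink.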

Tamas_Papp · June 12, 2017, 7:49pm

Possibly, but the solutions I can imagine would be very heavyweight and/or sacrifice type stability. If you have a good solution, make a PR or a proof of concept.

cortner · June 12, 2017, 8:27pm

With floating-point arithmetic, how do you know whether you are at 0.0 or at, say, 1e-32? I think it is correct to treat those points as null sets and move on.
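To illustrate: a quantity that is exactly zero in real arithmetic is often a tiny nonzero number in Float64, so a check for exactly 0.0 (as in the warning sketch at the top of the thread) would not fire, and a sign-based rule silently takes the sign of the rounding error. A minimal Julia 1.x sketch; `abs_slope` is a hypothetical helper, not part of any library:

```julia
# In exact arithmetic 0.1 + 0.2 - 0.3 == 0, i.e. exactly at the kink of abs.
x = 0.1 + 0.2 - 0.3

# In Float64 the rounding errors leave a tiny positive residue instead.
println(x)           # on the order of 1e-17, not 0.0
println(iszero(x))   # false, so an exact-zero check would not trigger

# Hypothetical helper: the slope of abs chosen by a sign-based rule,
# returning NaN only when the input is exactly zero.
abs_slope(x::Float64) = iszero(x) ? NaN : sign(x)

println(abs_slope(x))        # 1.0: the sign of the rounding error decides
println(abs_slope(-1e-32))   # -1.0
println(abs_slope(1e-32))    # 1.0
```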

Also, a user who applies AD to a non-differentiable function should just be aware of what they are doing and deal with the consequences.

jrevels · June 12, 2017, 9:43pm

Relevant issues:

There are probably other relevant discussions floating around the various JuliaDiff repos. Deciding what to do at (or near) nondifferentiable points can be tricky, for the reasons everybody has listed.

This is actually something @dpsanders and I have discussed before. He’s made some pretty cool ValidatedNumerics demos by combining AD and interval arithmetic.

dpsanders · June 12, 2017, 9:51pm

What is a good reference for subgradients?

dpo · June 12, 2017, 9:55pm

Any convex analysis book, e.g., Rockafellar’s.

dpsanders · June 12, 2017, 11:34pm

Thanks. Who would like to implement them?

dpsanders · June 12, 2017, 11:40pm

What about a freely-available reference?

dpo · June 13, 2017, 2:15am

There’s https://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf. Strangely enough, they don’t talk much about subgradients in their book.
