Should ForwardDiff.jl issue warnings at non-differentiable points?

question

#1

abs is not differentiable at zero: its derivative jumps from -1.0 to +1.0 there. Currently ForwardDiff picks the +1.0 derivative. Should this choice be made silently, without a warning?

In convex analysis, this situation is handled by the subdifferential: at zero, the subdifferential of abs is the whole interval [-1.0,1.0], and every point in it (including both +1.0 and -1.0) is a valid subgradient. There is no inherent justification for choosing just the +1.0 endpoint of the interval.
Significant math follows from knowing and considering other values as representatives of the subgradient interval.
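For reference, the standard definition for a convex function f of one real variable, and what it gives for abs, can be written as follows (x_0 and g are just the usual textbook symbols, not anything from ForwardDiff):

```latex
\partial f(x_0) = \{\, g \in \mathbb{R} : f(x) \ge f(x_0) + g\,(x - x_0) \ \text{for all } x \,\}

\partial\,|x| =
\begin{cases}
  \{-1\}   & x < 0 \\
  [-1,\,1] & x = 0 \\
  \{+1\}   & x > 0
\end{cases}
```

So the set is a singleton everywhere except at the kink, which is exactly where the warning is meant to fire.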

A possible set of warnings could be implemented with (Julia 0.6 session; `warn` was replaced by `@warn` in Julia 0.7/1.0):

julia> using ForwardDiff

julia> @inline Base.:<(d::ForwardDiff.Dual{T,V,N} where T where V<:Real where N,x::AbstractFloat) = ( ForwardDiff.value(d)==x && warn("Subgradient is not a singleton. Forced to pick single value") ; ForwardDiff.value(d)<x )

julia> ForwardDiff.Dual(10.0,-1.0) < 10.0     # This can be both true and false
WARNING: Subgradient is not a singleton. Forced to pick single value
false

julia> @inline Base.abs(d::ForwardDiff.Dual) = ( ForwardDiff.value(d)==zero(typeof(d)) && warn("Subgradient is not a singleton. Forced to pick single value") ; signbit(ForwardDiff.value(d)) ? -d : d )

julia> abs(ForwardDiff.Dual(0.0,1.0))
WARNING: Subgradient is not a singleton. Forced to pick single value
Dual{Void}(0.0,1.0)

julia> @inline Base.:<(d::ForwardDiff.Dual{T,V,N} where T where V<:Real where N,x::W where W<:Integer) = ( ForwardDiff.value(d)==x && warn("Subgradient is not a singleton. Forced to pick single value") ; ForwardDiff.value(d)<x )

julia> ForwardDiff.Dual(0.0,-1.0)<0
WARNING: Subgradient is not a singleton. Forced to pick single value
false

julia> signbit(ForwardDiff.Dual(0.0,-1.0))
WARNING: Subgradient is not a singleton. Forced to pick single value
false
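On Julia 1.0 and later, the session above will not run as written, because `warn` no longer exists. A rough sketch of the same idea for current Julia follows. It assumes ForwardDiff is installed, and `abs_warn` is a hypothetical helper name, not part of ForwardDiff. Using a separate function avoids redefining Base methods for ForwardDiff's types.

```julia
using ForwardDiff

# Hypothetical helper (not part of ForwardDiff): behaves like abs, but warns
# when a dual number is evaluated exactly at the kink.
abs_warn(x::Real) = abs(x)

function abs_warn(d::ForwardDiff.Dual)
    # maxlog=1 limits the message to one occurrence per call site
    iszero(ForwardDiff.value(d)) &&
        @warn "Subgradient is not a singleton. Forced to pick single value" maxlog=1
    return abs(d)
end

# Emits the warning, then returns whichever value ForwardDiff picks at zero.
ForwardDiff.derivative(abs_warn, 0.0)
```

Away from zero the helper is silent and gives the ordinary derivative.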

What are the forum’s views on this?


#2

I suspect issuing warnings like this would kill the performance of the code. If benchmarking bears this out (be sure to check a case where the differentiated code vectorizes), I would be opposed to making this change.


#3

AFAIK this (= if you are trying to differentiate non-differentiable functions, be prepared to face the consequences) is well understood, but hard to deal with.

  1. Ideally, one would not AD non-differentiable functions.
  2. But sometimes this happens, and then one would hope for an almost zero-measure set of nondifferentiable points, so we can quietly ignore this (e.g., with MCMC).
  3. Then the worst case is some iterative procedure where your algorithm just loves hanging out at the nondifferentiable set.

Whether you want warnings depends on the use case. For 2, you would get the occasional one; for 3, your screen would be flooded with them, which would indicate that you are using the wrong algorithm.


#4

@Tamas_Papp, this is a good summary. But there are two points which made me put this out:

  1. The deterministic symmetry breaking to one arbitrary and extreme derivative is bothersome: it feels like missing out on a whole dimension of the possible results.
  2. Julia is a good language to extend this AD facility to generate subgradients and calculate with them (like interval arithmetic). This route could potentially lead to multivariate subgradients, i.e. cones and operations on them, which could make it possible to express optimization problems on polytopes nicely in native Julia (and have them run decently).
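
To make point 2 a bit more concrete, here is a minimal one-dimensional sketch of a dual number whose derivative part is an interval. Everything in it (`SubDual`, `seed`, `scale`) is a hypothetical name invented for illustration and has nothing to do with ForwardDiff's internals. As with interval arithmetic, the result is an enclosure of the subgradient set and may overestimate it.

```julia
# Hypothetical type: a value plus a closed interval [lo, hi] of admissible derivatives.
struct SubDual
    val::Float64
    lo::Float64
    hi::Float64
end

# Seed the independent variable: its derivative is exactly 1.
seed(x::Real) = SubDual(x, 1.0, 1.0)

# Multiply by a constant, swapping the bounds when the constant is negative.
scale(c::Real, d::SubDual) =
    c >= 0 ? SubDual(c * d.val, c * d.lo, c * d.hi) :
             SubDual(c * d.val, c * d.hi, c * d.lo)

# Sum rule: add the values and add the intervals bound by bound.
Base.:+(a::SubDual, b::SubDual) = SubDual(a.val + b.val, a.lo + b.lo, a.hi + b.hi)

function Base.abs(d::SubDual)
    if d.val > 0
        return d                               # slope +1: interval unchanged
    elseif d.val < 0
        return SubDual(-d.val, -d.hi, -d.lo)   # slope -1: interval negated
    else
        # At the kink the outer slope ranges over [-1, 1], so enclose the
        # result in the symmetric interval [-m, m].
        m = max(abs(d.lo), abs(d.hi))
        return SubDual(0.0, -m, m)
    end
end

abs(seed(0.0))                    # derivative interval [-1.0, 1.0]
abs(seed(0.0)) + abs(seed(0.0))   # derivative interval [-2.0, 2.0]
```

The multivariate version would need sets of vectors instead of intervals, which is where this stops being a few lines.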

#5

Ideally the user would be able to specify which direction to take the limit from or whether to have a warning. (Whether that’s practical here, I have no idea.)

Leaving this warning on by default seems like a recipe for a lot of pain and frustration for someone.
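
As a rough illustration of what a user-selectable policy could look like for the single case of abs, here is a sketch. The function `abs_derivative` and its `at_zero` keyword are made up for this example and are not an existing or proposed ForwardDiff API.

```julia
# Hypothetical helper: derivative of abs with an explicit policy at the kink.
function abs_derivative(x::Real; at_zero::Symbol = :right)
    x > 0 && return 1.0
    x < 0 && return -1.0
    # x is zero here (either 0.0 or -0.0): apply the requested policy.
    at_zero === :right && return 1.0    # limit from the right
    at_zero === :left  && return -1.0   # limit from the left
    at_zero === :error &&
        throw(DomainError(x, "abs is not differentiable at zero"))
    throw(ArgumentError("unknown at_zero policy: $at_zero"))
end

abs_derivative(0.0)                   # right-hand limit by default
abs_derivative(0.0; at_zero = :left)  # left-hand limit
```

The default here stays silent, so nobody gets flooded unless they opt in to `:error`.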


#6

Possibly, but the solutions I can imagine would be very heavyweight and/or sacrifice type stability. If you have a good solution, make a PR or a proof of concept.


#7

With floating-point accuracy, how do you know whether you are at 0.0 or at, say, 1e-32? I think it is correct to treat those points as null sets and move on.
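
A small plain-Julia example of the floating-point issue (the variable names are only for illustration): an expression that is exactly zero on paper does not evaluate to 0.0, so a check for the exact kink would never fire.

```julia
# Mathematically this is zero, but rounding leaves a tiny positive residue.
x = 0.1 + 0.2 - 0.3

at_kink   = (x == 0.0)               # false: an exact-zero test misses it
near_kink = abs(x) <= eps(Float64)   # true: only a tolerance test catches it
```

Any warning based on `==` therefore depends on rounding luck rather than on the math.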

Also a user who ADs a non-differentiable function should just be aware of what they are doing and deal with the consequences.


#8

Relevant issues:


There are probably other relevant discussions floating about the various JuliaDiff repos. Deciding what to do at (or near) nondifferentiable points can be tricky for the reasons everybody has listed.

This is actually something I and @dpsanders have discussed before. He’s made some pretty cool ValidatedNumerics demos by combining AD and interval arithmetic.


#9

What is a good reference for subgradients?


#10

Any convex analysis book, e.g., Rockafellar’s.


#11

Thanks. Who would like to implement them?


#12

What about a freely-available reference?


#13

There’s https://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf. Strangely enough, they don’t talk much about subgradients in their book.