What functions might break ForwardDiff.jl?

Hi, are there some Julia functions that might cause problems for ForwardDiff.jl? I’m using ForwardDiff.jl to take the derivative of a simulation output. For some reason the output differs slightly in some parts of the curve compared to taking the derivative numerically. I have good reason to believe that the numerical differentiation is accurate.

The simulation contains mostly basic matrix operations (+, -, .*), but I’m thinking

rand() .* vector_that_depends_on_input



might be causing the problems. Are there any other functions that I should be suspicious of?
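For reference, here is a toy version of that pattern (purely hypothetical names, just to illustrate the suspicion): unless the RNG is seeded identically for every evaluation, finite differences end up comparing outputs built from different random draws, whereas with a fixed seed the two methods can be compared fairly:

```julia
using ForwardDiff, Random

# Hypothetical toy model: random factors scaling a vector that depends on x.
function noisy_output(x; seed = 42)
    Random.seed!(seed)            # same random draws on every evaluation
    return sum(rand(3) .* [x, x^2, x^3])
end

d_ad = ForwardDiff.derivative(noisy_output, 2.0)
h = 1e-6
d_fd = (noisy_output(2.0 + h) - noisy_output(2.0 - h)) / 2h
# with the seed fixed, d_ad and d_fd agree closely
```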

You will normally get a MethodError, rather than wrong results, if you call some function it cannot handle. Broadcasting ought to be fine.
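A minimal sketch of that failure mode: the usual culprit is an over-tight type signature, which throws rather than silently returning wrong numbers:

```julia
using ForwardDiff

f_strict(x::Float64) = x^2   # too-tight signature: a Dual is a Real, but not a Float64
f_loose(x::Real) = x^2

ForwardDiff.derivative(f_loose, 3.0)      # 6.0
# ForwardDiff.derivative(f_strict, 3.0)   # throws a MethodError
```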

But there can always be bugs, if you can boil it down then it may be possible to figure out.

If “taking the derivative numerically” means finite differences, then you may also want to investigate using something like FiniteDifferences.jl, which should get better accuracy than a naive subtraction.


Okay, I’ll try to figure this out and get a minimum working example to demonstrate the behavior.

As for FiniteDifferences.jl, I have already tried it without great results, but I’ll give it another shot in case I was doing something very wrong.

Thank you!

Do you have any manual branches or anything else where you’ve written code that won’t literally throw an error, but is wrong because a derivative doesn’t actually exist, perhaps at a specific point where you happen to actually call the function? Here’s an obviously contrived example that nobody would ever write, but it does explain the idea:

using ForwardDiff, FiniteDifferences
badabs(x) = abs(x) < 1e-12 ? zero(x) : abs(x)
ForwardDiff.derivative(badabs, 1e-12) # 1.0
central_fdm(3, 1)(badabs, 1e-12) # 0.0002105

You probably aren’t doing exactly that, but once or twice I’ve caught bugs where, for some reason, I wrote functions that had a non-differentiable kink or something; ForwardDiff didn’t literally break, so the code executed and I did get numbers back. In this case ForwardDiff obviously gives the “right” answer, but it’s just an example of when the two methods for computing derivatives might disagree.

No idea if that’s really the problem, but if you’re scrolling around your code looking for potential issues, maybe something like that is worth keeping an eye out for.

Yes. Another source of bugs is if your code, or code you call, switches to a simpler implementation after testing say x[1] == 0, or issymmetric(A). The simpler x[1]==0 branch may be correct for real x, but fail to keep track of derivatives.
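A minimal sketch of that failure mode (hypothetical function names): the shortcut branch returns a constant, so the derivative that still exists at that point is silently lost:

```julia
using ForwardDiff

# Shortcut branch: correct value for real inputs, but it drops the derivative.
function shortcut(x)
    x[1] == 0 && return 0.0      # "simpler implementation" for the special case
    return x[1] * x[2]
end

ForwardDiff.gradient(shortcut, [0.0, 2.0])  # [0.0, 0.0], but the true gradient is [2.0, 0.0]
```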


I’m quite sure that my code should not do any of that but I could be mistaken.

The basic control flow of my code looks like:

function simulation(x, param, steps)

    state = initial_state(param)::Vector # linear algebra, doesn't depend on x

    accumulator = 0.0
    time = 0.0

    for i in 1:steps
        # These are plain linear algebra
        possible_new_states = new_states(x, state)::Matrix # columns correspond to possible new states
        event_times = rand() .* times_to_new_event(x, state)::Vector

        new_event_index = argmin(event_times)

        dt = event_times[new_event_index]
        state = possible_new_states[:, new_event_index]

        # updating time and accumulator
        time = time + dt
        accumulator = accumulator + new_value(state)
    end

    return accumulator / time
end

I would tend to recommend using Richardson extrapolation instead. That way it will automatically (adaptively) extrapolate down to zero step size with as high an order as possible while avoiding catastrophic cancellation, and it should be a much more reliable way of getting a high-accuracy finite-difference rule. See FiniteDifferences.jl: Richardson extrapolation
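A quick sketch using `extrapolate_fdm` from FiniteDifferences.jl (assuming the API shown in its Richardson-extrapolation docs):

```julia
using FiniteDifferences

# Start from a low-order central rule and let Richardson extrapolation
# adaptively shrink the step; returns (estimate, error_estimate).
est, err = extrapolate_fdm(central_fdm(2, 1), sin, 1.0)
# est ≈ cos(1) to high precision
```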

I am not sure whether this is relevant in your case, but I recently ran into the following problem: When the input is an integer, and somewhere down the line there is a conversion to float, it will be a Float16 instead of a Float64 (reported here: `ForwardDiff.derivative(float, 1)` returns `Float16(1.0)` · Issue #362 · JuliaDiff/ForwardDiff.jl · GitHub).


julia> using ForwardDiff

julia> ForwardDiff.derivative(float, 1)
Float16(1.0)

julia> ForwardDiff.derivative(float, 1.0)
1.0

I ran into this when trying to write my own convenience wrapper for the derivative of a complex function via the Jacobian, based on a suggestion on this Discourse, and then wondering why seemingly simpler test inputs like 1 + 1im were giving much less accurate results than those involving e.g. π.

The simplest “real-world” case I found where this becomes a problem was when using the function sincos (notice the problem won’t appear if using the normal sin function):

using ForwardDiff

julia> mysin(x) = sincos(x)[1]
mysin (generic function with 1 method)

julia> ForwardDiff.derivative(mysin, 1)   # only Float16 accuracy, due to the issue above

julia> ForwardDiff.derivative(mysin, 1.0) # ≈ cos(1) at full Float64 precision

My workaround for the time being was to simply convert every input to Float64 beforehand.
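Concretely, the workaround looks like this (sketch):

```julia
using ForwardDiff

mysin(x) = sincos(x)[1]

# Promote the evaluation point to Float64 before differentiating,
# sidestepping the Float16 promotion described above.
d = ForwardDiff.derivative(mysin, Float64(1))
# d ≈ cos(1) at full Float64 precision
```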


Thank you for the suggestions!

I’ll try using Richardson extrapolation, I hope it can deal better with somewhat noisy data.

I do indeed pass a few parameters as Vector{Int64}. I will fix those and see if it helps.

Many of the matrices are also “special” types such as Tridiagonal or SparseArrays types, so maybe they could cause problems. The dimensions aren’t that large, so the performance hit should not be big. Also, as a last resort, many of the broadcasts could be converted to loops, but that shouldn’t make a difference.

I’m now fairly certain that the promotion to Float16 is not the issue. I also tested multiple implementations of the minimum search (argmin() in the code posted above) and changed the differentiation rules in DiffRules.jl to handle min(), max(), and abs() differently, or to return NaN in non-differentiable cases. None of this changed the program output, although a simpler test program showed that the modified diff rules did have the expected effects.

It might still be that I have missed some differentiation rules that could cause the problem. I will keep on searching, but at some point I will have to give up on using AD here.

I have figured out my problem, but it might not be solvable at all. The accumulator in my code is a function of a part that depends directly on the input to be differentiated, and a part that depends on the simulation state. The simulation starts from some initial state and evolves discretely (there is only a finite number of states that a state can evolve to). As far as I understand, this makes the output technically discontinuous in the input, and thus not differentiable analytically.
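A toy illustration of why the two methods disagree in that situation (hypothetical, but the mechanism matches): at a jump, ForwardDiff differentiates only the branch that was taken, while a finite difference straddling the jump blows up:

```julia
using ForwardDiff

# Discrete-event style output: the value jumps when x crosses a threshold.
jumpy(x) = x < 1.0 ? 1.0 : 2.0

ForwardDiff.derivative(jumpy, 0.999)            # 0.0: AD sees only the flat branch
h = 1e-2
(jumpy(0.999 + h) - jumpy(0.999 - h)) / 2h      # ≈ 50: the stencil straddles the jump
```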