State of automatic differentiation in Julia

I agree. I’m all about AD, especially in Julia. Maybe you missed what I said at the beginning of the post: complex step can be a useful option for checking derivatives (and the effort isn’t bad, about the same as ensuring AD compatibility).

I should add that I rarely use complex step anymore, particularly because the AD support in Julia is just so good. I was just noting that we sometimes run into scenarios where complex step has been an effective choice and that some of the apparent limitations aren’t too bad.

Edit: a couple examples we’ve run into besides derivative checking

  • a subroutine that involved complex numbers, but where the input dimension was always small (3 components of velocity). Multicomplex step worked well here (although I believe there is now AD support for complex numbers?)
  • a large Julia code where a subpiece calls a legacy code. This legacy code is large enough that a rewrite in Julia would be quite an investment. AD coverage was good but not comprehensive, whereas complex step was simple.

But these are rare cases so I agree overall that in Julia, especially pure Julia, complex step has limited practical value.
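For readers who haven’t seen the trick, it is essentially one line. A minimal sketch (the test function is just a standard illustrative example, not from any of our codes):

    # Complex-step derivative: f'(x) ≈ imag(f(x + i*h)) / h.
    # There is no subtraction of nearly equal numbers, so h can be made tiny
    # (e.g. 1e-20) without the round-off blow-up that plagues finite differences.
    complex_step(f, x; h = 1e-20) = imag(f(x + im * h)) / h

    g(x) = exp(x) / sqrt(sin(x)^3 + cos(x)^3)   # arbitrary smooth test function
    complex_step(g, 1.5)                        # matches the exact derivative to ~machine precision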

2 Likes

It is just single-partial dual numbers with primitives only defined on analytic functions, though. It is precisely a form of forward-mode AD, just limited. That’s actually how I prefer to introduce AD these days: https://youtu.be/zHPXGBiTM5A, just extending complex-step AD.
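To make the analogy concrete, here is a minimal, purely illustrative single-partial dual number (not how ForwardDiff is actually implemented, and with only a handful of primitives defined):

    # A dual number: f(x + ε) = f(x) + f'(x)·ε with ε² = 0, so the derivative is
    # exact. Complex step instead perturbs by i·h with h tiny; the i² = -1 term
    # lands in the (discarded) real part, leaving an O(h²) truncation error.
    struct Dual
        val::Float64   # primal value
        der::Float64   # derivative (partial) part
    end

    Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
    Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
    Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

    f(x) = sin(x) * x
    f(Dual(1.5, 1.0)).der   # ≈ sin(1.5) + 1.5 * cos(1.5)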

That’s such a misnomer. With forward-mode AD and complex-step differentiation you have the same increase in computational complexity: instead of calling the function N times, you make your number N-dimensional, so you’re essentially doing the equivalent of N calls in the compiled code. In both cases you only do the primal calculation f(x) once too, so that’s not even a gain. The only gain is that by putting the calls together you tend to get more SIMD. If you don’t get SIMD, it’s just about accuracy.

Reverse-mode AD is a completely different story since it’s a completely different calculation.

2 Likes

Yes, I like to introduce it that way sometimes too.

I totally agree on computational complexity; it’s the same. That’s not what I was referring to. I meant that a forward or central difference generally won’t have the same accuracy as complex step because of subtractive cancellation, and that one way to avoid that with finite differences is to use a larger stencil, but that requires more function calls.
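A quick numerical sketch of that cancellation (illustrative only; the exact error values depend on the function and hardware):

    # Forward difference loses digits to subtractive cancellation as h shrinks;
    # complex step has no subtraction of nearly equal numbers, so it stays accurate.
    f(x) = exp(x)
    x = 1.0
    for h in (1e-4, 1e-8, 1e-12, 1e-16)
        fd = (f(x + h) - f(x)) / h      # forward difference
        cs = imag(f(x + im * h)) / h    # complex step
        println("h = $h   fd error = $(abs(fd - exp(x)))   cs error = $(abs(cs - exp(x)))")
    end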

Yes, it is a less convenient version of dual numbers. Coming from other languages (C/C++/Fortran), I didn’t think of dual numbers.

Mainly, I was speaking from a user’s point of view: verifying AD (whether ForwardDiff or ReverseDiff) in their application code accurately. As a user, I have two ways of verifying accurately with little effort: finite difference and complex step (since there are good complex math libraries).

Now consider a multi-physics CFD simulation, where appropriate scaling of all the equations is not obvious (at least at first). Alleviating numerical errors in finite differences of the system of discretized equations, with 100 million inputs and 100 million outputs, to get a Jacobian is non-trivial, so its accuracy suffers. That is, as I reduce the step length to get a more accurate result, I quickly hit a point where accumulated round-off errors are too large. Complex step doesn’t suffer from that subtractive cancellation, hence it is preferred. While there are ways to improve the accuracy of finite differences, they are usually more costly or require more function calls.

I should have been clearer about what I meant by

In the context of CFD, the number of input variables is typically equal to or larger than the number of output variables, so reverse-mode AD is used. In that case, complex step is costlier.

As @andrewning mentioned, the accuracy of finite difference is lower than that of complex step for the same step length. When computing large Jacobians (upwards of 100 million × 100 million), finite difference is not always enough to give reliable results: reducing the step length runs into round-off errors, while not reducing it leaves a less accurate Jacobian. Fixing its accuracy increases the cost.

I would like to append a question: what is the most suitable package for constructing “analytical” first derivatives and storing them as a function for later use and direct evaluation? For example, in order to evaluate a Jacobian matrix inside a for loop for the respective state vector.
Thanks a lot

3 Likes

ModelingToolkit is powerful.
You can construct models either with its API or by tracing regular Julia functions (modelingtoolkitize). It can detect sparsity and produce analytical Jacobians.
It’s geared toward differential equations, so you may need to adapt it to your purposes, but if what you need is a Jacobian for the change of state from one iteration to the next, it sounds like exactly what you’re looking for.
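For the analytical-Jacobian part on its own, a minimal sketch with Symbolics.jl (the symbolic layer ModelingToolkit builds on) could look like the following; the system here is just a placeholder, and the build_function options are worth double-checking against the Symbolics docs:

    using Symbolics

    @variables x y                        # symbolic state variables
    f = [x^2 + y, sin(x * y)]             # placeholder system of outputs

    J = Symbolics.jacobian(f, [x, y])     # symbolic (analytical) Jacobian

    # Compile the symbolic Jacobian into ordinary Julia functions
    # (out-of-place and in-place) for repeated evaluation in a loop.
    J_oop, J_ip = build_function(J, [x, y]; expression = Val{false})
    J_oop([1.0, 2.0])                     # evaluate at the state vector [1.0, 2.0]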

3 Likes

Differential equations are one system type that has special handling right now, and it is quite extensive, but we also have NonlinearSystem, OptimizationSystem, ControlSystem, etc., and it’s a general symbolic system. We’re working on getting a ton of stuff into the differential equation portions first, but things like analytical sparse Hessians of Julia code for nonlinear optimization already work today on optimization problems.

3 Likes

Do we currently have some kind of wrapper package for automatic differentiation, geared towards end-users and towards packages that implement algorithms that need AD (so that they don’t need to include code for each individual AD-framework)?

I mean something that wraps ForwardDiff, Zygote, FiniteDiff, etc. (e.g. via Requires or backend-specific “adapter” packages) and provides a common API for standard use cases like calculating gradients and left/right Jacobian-vector multiplication?

It might also be very nice to have a standard way to decorate functions with the AD algorithms that the user (who knows the function) judges appropriate, so that optimization, MCMC, etc. algorithms can pick that up automatically. Kinda like

with_ad(myfunc, ForwardDiffAD(batchsize = 8))

meaning “use ForwardDiff” for everything or

with_ad(myfunc, ZygoteAD())

meaning “use Zygote” for everything or

with_ad(myfunc, ForwardDiffAD(), ZygoteAD())

meaning use ForwardDiff for JVP and use Zygote for VJP and gradients of real-valued functions.

I started https://github.com/JuliaDiff/AbstractDifferentiation.jl but haven’t had much time to complete the PR yet. This is exactly the purpose of that package. Feel free to read the PR and ask questions if you would like to accelerate its merging.
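To give a flavor of the unified-interface idea, here is a rough sketch in the backend style the package is aiming for (treat the exact names as provisional and check the package for the current API):

    import AbstractDifferentiation as AD
    using ForwardDiff, Zygote

    f(x) = sum(abs2, x)

    # The calling code stays AD-framework-agnostic; only the backend changes.
    AD.gradient(AD.ForwardDiffBackend(), f, [1.0, 2.0, 3.0])   # forward mode
    AD.gradient(AD.ZygoteBackend(), f, [1.0, 2.0, 3.0])        # reverse mode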

9 Likes

Awesome @mohamed82008, that’s what I was looking for. I guess the idea would be to encourage AD packages to implement this API, and possibly provide Requires.jl-based adaptors for the major ones until they do?

No need for Requires. The plan is to have all the AD packages depend on this package and define its API.

1 Like

That would be perfect! 🙂

LogDensityProblems.jl has this kind of functionality for gradients; at the moment ForwardDiff and Zygote are supported.

1 Like

tpapp/LogDensityProblems.jl

Oh, sure! I was also looking for JVP and VJP functionality and diff in the non-log domain, so something more general. I should have mentioned LogDensityProblems.jl in my post, though.

For anyone reading this today: the JuliaDiff.org website has been updated with much more detail and a more complete list.

PRs are welcome to help keep it up to date; you can make them at
https://github.com/JuliaDiff/juliadiff.github.io

10 Likes