[ANN] DifferentiationInterface - gradients for everyone

gdalle · April 30, 2024, 8:54am

Julia’s composability has rather interesting consequences for its automatic differentiation ecosystem. Whereas Python programmers first choose a backend (like PyTorch or JAX) and then write code that is specifically tailored to it, Julians first write their code and then make it differentiable for one or more of the many available backends (like ForwardDiff.jl or Zygote.jl). Forward and reverse mode, numeric and symbolic, Julia has it all. But it’s not always obvious which option is best suited for a given application.

We hold these truths to be self-evident that all autodiff backends are created equal, and that Julia users should be able to quickly try them out without diving into a dozen different APIs to figure out correct syntax and performance tricks. @hill and I are thus proud to present DifferentiationInterface.jl, an efficient and flexible interface to every differentiation method you could possibly dream of.

Basics

With DifferentiationInterface, an elementary gradient computation looks like this:

using DifferentiationInterface
import ForwardDiff

backend = AutoForwardDiff()

f(x) = sum(abs2, x)
x = rand(10)

g = gradient(f, backend, x)

Backend objects are defined by ADTypes.jl and we support a dozen of them, from golden oldies ForwardDiff.jl and Zygote.jl to new fan favorites like Enzyme.jl… and even experimental ones like FastDifferentiation.jl and Tapir.jl!

We provide operators for first-order and second-order differentiation: derivative, gradient, jacobian, second_derivative, hessian, as well as the lower-level pushforward, pullback and hvp. These operators support only numbers or arrays as inputs and outputs. They use nearly optimal backend-specific implementations whenever possible, and reasonable fallbacks the rest of the time.

Check out the tutorial for a simple example, the overview for a detailed walkthrough, and the list of backends for a taste of the endless possibilities.

Performance

The Julia community loves optimizing for maximum performance, and would never give up speed for convenience. That is why DifferentiationInterface was designed from the ground up with type stability and memory efficiency in mind. Users can very easily:

choose between function signatures f(x) = y and f!(y, x)
perform differentiation out-of-place (allocating a new gradient) or in-place (mutating the gradient)
prepare a config / tape / cache / etc. when they plan several differentiation calls in a row

Advanced

Among the fancier features offered by DifferentiationInterface, let us mention:

sparse Jacobians and Hessians with arbitrary backends (a generic and clean reimplementation of SparseDiffTools.jl, albeit still partial)
flexible backend combinations for second order, with efficient Hessian-vector products
DifferentiateWith(f, backend), which defines a chain rule to make Zygote.jl use another backend (like Enzyme.jl) for part of the reverse-pass

Testing and benchmarking

The companion package DifferentiationInterfaceTest.jl allows you to quickly compare autodiff backends, on pre-defined or custom scenarios. It includes:

correctness tests, either against a ground truth or against a reference backend
type stability tests
detailed benchmarks
scenarios with weird array types (JLArrays.jl for GPU, StaticArrays.jl, ComponentArrays.jl)

We think this can be of great use for package developers to test which backends support their functions, and how fast differentiation can be. It has also enabled us to help diagnose and fix bugs in several autodiff packages.
Once again, the tutorial is a great place to start.

What about AbstractDifferentiation?

AbstractDifferentiation.jl was our main inspiration for this work, and we learned a lot of lessons from its design. To alleviate some of its current limitations (notably around mutation and caching), we imposed certain restrictions on our code. DifferentiationInterface accepts only one input x and one output y, and guarantees support for number or array types but nothing beyond that. Given the existence of ComponentArrays.jl, these rules seem fairly mild, and they made our life considerably easier for robust implementation and testing.

Still, some use cases may require differentiation with respect to multiple inputs or non-array types. This level of generality has always been the goal of AbstractDifferentiation, and so it will remain.
After discussing the matter with @mohamed82008 to avoid XKCD #927, the way forward looks like this:

DifferentiationInterface will keep its narrow focus, corresponding to the intersection of all autodiff packages in terms of functionalities
AbstractDifferentiation will wrap DifferentiationInterface for the simple cases, and aim for the union of all autodiff packages when it comes to handling funky inputs

Roadmap

Now that the foundations are solid, here are some things we would like to work on next:

help downstream users adopt our package, with Optimization.jl as a first target (thanks to @Vaibhavdixit02) and larger ecosystems like SciML and Turing in our sights (ping @ChrisRackauckas)
support chunked / batched pushforwards and pullbacks, to reduce the number of function calls necessary for Jacobian and Hessian computations
increase performance of second order operators, which are not yet fully optimized
improve sparsity handling, especially with respect to structured matrices
transfer the package to JuliaDiff once its design has finally stabilized

Interested?

DifferentiationInterface is not yet completely stable, but it follows semantic versioning, and the latest v0.3 release is very well tested.
We encourage you to try it out in your own projects, and open an issue if you find yourself struggling.
Oh, and of course, a star on the GitHub repo would be very much appreciated

gdalle · September 25, 2024, 9:08pm

Necroposting to announce a major breaking release: DifferentiationInterface v0.6 brings support for additional constant arguments (not differentiated), as well as a bunch of performance improvements. You can now write code like

f(x, c) = sum(x) * c
gradient(f, backend, x, Constant(c))

See the release notes and the new documentation.

hexaeder · October 8, 2024, 8:18am

Hi! Cool new features I’m trying out the Constants, but it looks like it only supports constants after the input to derive for?

I have an ODE rhs f(du, u, p, t) and trying to find the parameter jacobian

pjac = jacobian(f, du, AutoForwardDiff(), Constant(u0), p0, Constant(t))

but I get a method error which suggest that this isn’t a valid “Context”. Is this a limitation or am I doing something wrong?

ERROR: MethodError: no method matching jacobian(::Network{…}, ::Vector{…}, ::AutoReverseDiff{…}, ::Constant{…}, ::Vector{…}, ::Constant{…})
The function `jacobian` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  jacobian(::F, ::Any, ::ADTypes.AbstractADType, ::Any, ::Context...) where {F, C}
   @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/gPcSq/src/fallbacks/no_prep.jl:71
  jacobian(::Any, ::DifferentiationInterface.NoJacobianPrep, ::AutoReverseDiff, ::Any, ::Context...) where C
   @ DifferentiationInterfaceReverseDiffExt ~/.julia/packages/DifferentiationInterface/gPcSq/ext/DifferentiationInterfaceReverseDiffExt/onearg.jl:220
  jacobian(::F, ::DifferentiationInterface.JacobianPrep, ::ADTypes.AbstractADType, ::Any, ::Context...) where {F, C}
   @ DifferentiationInterface ~/.julia/packages/DifferentiationInterface/gPcSq/src/first_order/jacobian.jl:154
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[59]:1
 [2] top-level scope
   @ none:1
Some type information was truncated. Use `show(err)` to see complete types.

gdalle · October 8, 2024, 8:23am

Indeed it remains a limitation at the moment, because it makes my life 100x easier: there can be only one differentiated input (commonly called x) and it must be given as the first argument to the function. This is specified in the following paragraph of the docs, but I should probably make it clearer.

In a simpler case than yours, e.g. with a function

f(const1, x, const2)

you could just define

f_shuffled(x, const1, const2) = f(const1, x, const2)

and pass f_shuffled to the jacobian operator.

However, in your case, I assume that du is mutated by f? This means that Constant is not the right type of context for such an argument: the right type would be called Cache and I have not implemented it yet. You can keep track of this upcoming feature here:

hexaeder · October 8, 2024, 8:34am

You are correct, my function changes du. However I think the argument shuffling would still work, because the mutated argument du stays at the front. So

f_shuffeld(du, p, const_u, const_t) = f(du, const_u, p, const_t)

should be still a valid “mutating” function with additional constant context. Even without introducing a new Cache context.
I’ll try to evaluate that. Unfortunately, my f is a callable struct, so I need to capture it in a closure rather than statically defining f_shuffled.

Thanks for the super fast reply and creating of that tracking issue

gdalle · October 8, 2024, 10:19am

In that case yes, it should work! Let me know how it goes

Topic		Replies	Views
State of AD in 2024 General Usage machine-learning , autodiff	4	2304	April 6, 2024
Automatic differentiation - Julia implementation advantages Machine Learning	8	4090	June 7, 2019
Second-order autodiff: which combinations should work? Optimization (Mathematical) zygote , forwarddiff , autodiff , enzyme , diffractor	13	892	May 30, 2024
AD pipeline and Hessian–vector products Optimization (Mathematical) autodiff	12	742	July 1, 2024
Taking gradients in Julia General Usage question , zygote , forwarddiff , reversediff	7	2130	September 28, 2021