Comparison of Automatic Differentiation (AD)?

bdas123 · December 12, 2022, 8:20am

Hello! Has there been a recent study (within the past year) that compares the speed/performance of different Automatic Differentiation packages (Zygote, ReverseDiff, etc.)?

nrontsis · December 12, 2022, 11:11am

Related posts:

And this paper.

odow · December 12, 2022, 6:45pm

Has there been a recent study (within the past year) that compares the speed/performance of different Automatic Differentiation packages (Zygote, ReverseDiff, etc.)?

The answer is going to be “it depends.” Different packages/algorithms have their strengths and weaknesses. What is your use-case?

bdas123 · December 14, 2022, 5:06am

I’m trying to fit some parameters within a model to data.

Right now, the number of parameters is in the thousands, but I estimate that this model could grow to tens of thousands or hundreds of thousands.

JesperMartinsson · December 14, 2022, 7:03am

With that many parameters, you might resort to gradient-free methods. A Gibbs sampling strategy could be efficient, since the sampler updates one parameter at a time while keeping the others constant. If you can factor your likelihood into many small parts, each depending on just a few parameters, then each Gibbs update only needs to evaluate the parts that involve the parameter being updated, which speeds up evaluation of your posterior a lot.
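
To make the factorization idea concrete, here is a minimal sketch in plain Python (chosen only for illustration; nothing here comes from the OP's model). It assumes a made-up chain-structured log-density in which parameter `i` appears in one data term and at most two neighbour terms, and it uses a Metropolis-within-Gibbs update rather than exact Gibbs conditionals. The names `local_log_density`, `full_log_density`, and `sweep` are hypothetical.

```python
import math
import random


def local_log_density(x, y, i, value, coupling=1.0):
    """Sum of only the factors that involve parameter i, with x[i] set to `value`."""
    total = -0.5 * (value - y[i]) ** 2  # data factor for site i
    if i > 0:
        total += -0.5 * coupling * (value - x[i - 1]) ** 2  # factor shared with left neighbour
    if i < len(x) - 1:
        total += -0.5 * coupling * (x[i + 1] - value) ** 2  # factor shared with right neighbour
    return total


def full_log_density(x, y, coupling=1.0):
    """Full (unnormalized) log-density; costs O(n) and is never needed inside a sweep."""
    total = sum(-0.5 * (xi - yi) ** 2 for xi, yi in zip(x, y))
    total += sum(-0.5 * coupling * (x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))
    return total


def sweep(x, y, rng, step=1.0, coupling=1.0):
    """Update every parameter once, in place; returns the number of accepted proposals."""
    accepted = 0
    for i in range(len(x)):
        proposal = x[i] + step * rng.gauss(0.0, 1.0)
        # Only the (at most three) factors touching x[i] enter the acceptance ratio.
        log_ratio = (local_log_density(x, y, i, proposal, coupling)
                     - local_log_density(x, y, i, x[i], coupling))
        if math.log(1.0 - rng.random()) < log_ratio:
            x[i] = proposal
            accepted += 1
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000  # assumed problem size, for illustration only
    y = [math.sin(0.01 * i) + 0.1 * rng.gauss(0.0, 1.0) for i in range(n)]
    x = [0.0] * n
    for _ in range(200):
        sweep(x, y, rng)
    print(full_log_density(x, y))
```

Each single-parameter update costs O(1) here instead of O(n), because the factors that do not involve `x[i]` cancel in the acceptance ratio. If the likelihood cannot be split this way, every update would need the full density and this advantage disappears.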

odow · December 14, 2022, 7:54am

I’m trying to fit some parameters within a model to data.

How? What is the model?

Depending on the properties of the model you’re trying to fit, gradient-based methods work fine for 10^4 or 10^5 parameters. See, e.g., JuMP on a nonlinear programming problem from optimal power flow with 10^5+ variables and constraints: https://youtu.be/tvBNQcuU-hY?t=997.
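
One general reason gradients stay affordable at this scale is that reverse-mode AD returns all partial derivatives of a scalar objective in a single backward pass, whatever the number of parameters. The sketch below is a toy illustration in plain Python, not how JuMP, Zygote, or ReverseDiff are implemented. The class `Var`, the functions `backward` and `loss_fn`, and the separable least-squares objective are all hypothetical, chosen so the result can be checked against the analytic gradient.

```python
class Var:
    """Scalar node that remembers its inputs and local derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    # Iterative post-order traversal, so deep graphs do not hit the recursion limit.
    order, seen = [], set()
    stack = [(output, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
            continue
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.append((node, True))
        for parent, _ in node.parents:
            if id(parent) not in seen:
                stack.append((parent, False))
    output.grad = 1.0
    # Visit each node only after all of its consumers have been visited.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


def loss_fn(weights, xs, ys):
    """Toy objective: sum_i (w_i * x_i - y_i)^2, with one parameter per data point."""
    total = Var(0.0)
    for w, x, y in zip(weights, xs, ys):
        residual = w * x - y
        total = total + residual * residual
    return total


if __name__ == "__main__":
    n = 10_000  # assumed parameter count, for illustration only
    weights = [Var(0.0) for _ in range(n)]
    xs = [1.0 + 0.001 * i for i in range(n)]
    ys = [2.0] * n
    loss = loss_fn(weights, xs, ys)
    backward(loss)  # one pass fills in all n partial derivatives
    print(loss.value, weights[0].grad)
```

The backward pass touches each recorded operation once, so its cost is proportional to the cost of evaluating the objective and does not grow by an extra factor of the number of parameters. Whether a particular package achieves this in practice depends on the model, which is why the use case matters.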

JesperMartinsson · December 14, 2022, 11:05pm

We need to know a bit more detail about the OP’s model and objective function to find suitable strategies. E.g., if factorization of the likelihood into smaller parts is impossible, then Gibbs (or slice) sampling does not provide gains (in evaluating the joint posterior) over other, more sophisticated sampling strategies (gradient-based or gradient-free). Perhaps the optimization/estimation problem is difficult to express in terms of a likelihood (and prior)?
