# Comparison of Automatic Differentiation (AD)?

Hello! Has there been a recent study (within the past year) that compares the speed/performance of different Automatic Differentiation packages (Zygote, ReverseDiff, etc.)?


Related posts:

And this paper.


> Has there been a recent study (within the past year) that compares the speed/performance of different Automatic Differentiation packages (Zygote, ReverseDiff, etc.)?

The answer is going to be “it depends.” Different packages/algorithms have their strengths and weaknesses. What is your use case?

I’m trying to fit some parameters within a model to data.

Right now, the number of parameters is in the thousands, but I estimate that this model could grow to tens of thousands or hundreds of thousands.

With that many parameters, you might resort to gradient-free methods. Perhaps a Gibbs sampling strategy would be efficient, since the sampler can update one parameter at a time while keeping the others constant. You might be able to factor your likelihood into many small parts, each depending on just a few parameters. A Gibbs update then only needs to re-evaluate the parts that involve the parameter being updated, which speeds up evaluation of your posterior a lot.
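To make the factorization idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration, not the OP's model: a hypothetical chain-structured posterior where each parameter has its own group of Normal observations and is tied to its two neighbours by a smoothness prior. It uses single-site Metropolis updates (often called Metropolis-within-Gibbs) rather than exact Gibbs draws, since exact conditionals are model-specific. The names `local_logp`, `full_logp`, and `sweep` are made up for this example.

```python
import math
import random


def local_logp(theta, i, data, tau=1.0):
    """Sum of only those log-posterior factors that involve theta[i]."""
    # Likelihood factor: observations in group i ~ Normal(theta[i], 1).
    lp = -0.5 * sum((y - theta[i]) ** 2 for y in data[i])
    # Smoothness-prior factors couple theta[i] to its neighbours only.
    if i > 0:
        lp -= 0.5 * ((theta[i] - theta[i - 1]) / tau) ** 2
    if i < len(theta) - 1:
        lp -= 0.5 * ((theta[i + 1] - theta[i]) / tau) ** 2
    return lp


def full_logp(theta, data, tau=1.0):
    """Full unnormalized log posterior (all factors), for comparison."""
    lp = 0.0
    for i in range(len(theta)):
        lp -= 0.5 * sum((y - theta[i]) ** 2 for y in data[i])
    for i in range(len(theta) - 1):
        lp -= 0.5 * ((theta[i + 1] - theta[i]) / tau) ** 2
    return lp


def sweep(theta, data, rng, step=0.5):
    """One pass of single-site Metropolis updates; returns the accept count."""
    accepted = 0
    for i in range(len(theta)):
        old = theta[i]
        lp_old = local_logp(theta, i, data)
        theta[i] = old + rng.gauss(0.0, step)  # symmetric random-walk proposal
        lp_new = local_logp(theta, i, data)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp_old:
            accepted += 1
        else:
            theta[i] = old  # reject: restore the previous value
    return accepted
```

Because the change in `local_logp` equals the change in `full_logp` when only `theta[i]` moves, each update costs a handful of factor evaluations instead of a pass over the whole model. Whether this pays off depends entirely on whether the real likelihood factors this way.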

> I’m trying to fit some parameters within a model to data.

How? What is the model?

Depending on the properties of the model you’re trying to fit, gradient-based methods work fine for 10^4 or 10^5 parameters. See, e.g., JuMP on a nonlinear programming problem from optimal power flow with 10^5+ variables and constraints, in the JuliaCon 2022 talk “Benchmarking Nonlinear Optimization with AC Optimal Power Flow” by Carleton Coffrin (on YouTube).
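As a rough illustration of why parameter count alone does not rule out gradients, here is a small plain-Python sketch. The objective is a hypothetical stand-in (a quadratic data-fit term plus a chain smoothness penalty), not the OP's model and not the power-flow problem from the talk. The gradient is hand-derived and computed in a single O(n) pass, which is roughly the cost profile a reverse-mode AD package aims for: the whole gradient for a small constant multiple of one objective evaluation. The step size `lr=0.1` is chosen for this particular quadratic and is an assumption, not a general recommendation.

```python
def objective(theta, target):
    """Separable data-fit term plus a chain-structured smoothness penalty."""
    fit_term = sum((t - y) ** 2 for t, y in zip(theta, target))
    smooth = sum((theta[i + 1] - theta[i]) ** 2 for i in range(len(theta) - 1))
    return fit_term + smooth


def gradient(theta, target):
    """Hand-derived gradient of `objective`, computed in one O(n) pass."""
    n = len(theta)
    g = [2.0 * (theta[i] - target[i]) for i in range(n)]
    for i in range(n - 1):
        d = 2.0 * (theta[i + 1] - theta[i])
        g[i + 1] += d  # derivative of the penalty w.r.t. theta[i + 1]
        g[i] -= d      # derivative of the penalty w.r.t. theta[i]
    return g


def fit(target, steps=200, lr=0.1):
    """Plain gradient descent starting from zeros."""
    theta = [0.0] * len(target)
    for _ in range(steps):
        g = gradient(theta, target)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta
```

Calling `fit(target)` with a list of 10^4 target values runs 200 gradient steps, each touching every parameter once. Real models are rarely this well conditioned, so in practice a quasi-Newton or interior-point solver would replace the fixed-step loop.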


We need to know a bit more about the OP’s model and objective function to find suitable strategies. For example, if the likelihood cannot be factored into smaller parts, then Gibbs (or slice) sampling does not provide gains (in evaluating the joint posterior) over other, more sophisticated sampling strategies (gradient-based or gradient-free). Perhaps the optimization/estimation problem is difficult to express in terms of a likelihood (and prior)?
