Cholesky Decomposition combined with ForwardDiff.jl

Short version: I would like to use automatic differentiation on my user-defined function that includes a Cholesky decomposition. However, I get the following error:

ERROR: Function iszero requires primal value(s).
A dual-number tracer for local sparsity detection can be used via `TracerLocalSparsityDetector`.

A MWE to replicate the error:

using ForwardDiff, LinearAlgebra, SparseConnectivityTracer

a = rand(SparseConnectivityTracer.GradientTracer{SparseConnectivityTracer.IndexSetGradientPattern{Int64, BitSet}}, (4, 4))

cholesky!(Hermitian(a))

Long version: I want to solve a constrained optimization problem with ADNLPModels.jl. My objective function includes (among other things) computing b .= inv(X'X)X'Y. To minimize dynamic allocations, I follow the advice from @stevengj here (Memory allocation, left-division operator - #3 by stevengj): I preallocate c.cache and c.coefs beforehand and compute the solution without dynamic allocations via:

ldiv!(cholesky!(Hermitian(mul!(c.cache, X', X))), mul!(c.coefs, X', Y))
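
Here, c is a cache I preallocate once and reuse across objective evaluations; roughly something like this (a sketch, the exact types and sizes depend on the problem):

struct Cache{T}
    cache::Matrix{T}  # holds X'X (k × k)
    coefs::Vector{T}  # holds X'Y, overwritten by ldiv! with the solution b
end

k = size(X, 2)                    # number of regressors
c = Cache(zeros(k, k), zeros(k))  # allocated once, before the optimization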

However, this produces the above-mentioned error because of the Cholesky decomposition.

Any advice is highly appreciated!

Thanks a lot!

Are you sure you want to use ForwardDiff.jl? If you are solving an optimization problem, you presumably have a scalar objective function. In that case, you should strongly consider using reverse mode, not forward mode.

Forward mode is efficient to differentiate many outputs with respect to few inputs. Reverse mode is efficient to differentiate few outputs with respect to many inputs (e.g. to compute a gradient).
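
For example (a rough sketch with a toy objective, not your actual problem):

using ForwardDiff, Zygote

f(x) = sum(abs2, x .- 1)        # scalar objective of many inputs

x = rand(1000)
ForwardDiff.gradient(f, x)      # forward mode: cost grows with length(x)
Zygote.gradient(f, x)[1]        # reverse mode: one backward pass yields the whole gradient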

Unless you are sure that X is well-conditioned, it is safer to use X \ Y to solve a least-squares problem (via a QR algorithm) rather than the normal equations (which square the condition number and hence double the number of digits you lose to roundoff error). This is also explained in the thread you linked.
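
Concretely (with a random X just for illustration):

using LinearAlgebra

X = randn(100, 5); Y = randn(100)

b_qr = X \ Y                   # least squares via pivoted QR; accuracy governed by cond(X)
b_ne = Symmetric(X'X) \ (X'Y)  # normal equations; cond(X'X) = cond(X)^2
b_qr ≈ b_ne                    # agree for well-conditioned X; QR is safer otherwise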

I think that ChainRules.jl has reverse rules implemented for these solves, e.g. for use with Zygote.jl.

If you want to squeeze out maximum efficiency, of course, you could implement reverse-mode/adjoint-method/backpropagation differentiation manually. (It’s not that hard if you know a little matrix calculus.) Especially if you want to pre-allocate all your arrays and work in-place. Especially if you want to exploit sparsity. But I would get something working first before trying to optimize it.


Hi Steven,

I knew I could count on you 🙂 Thanks for your answer.

I only use ForwardDiff.jl explicitly in the MWE to replicate the error. My actual application is a constrained optimization with thousands of constraints, implemented with NLPModelsKnitro.jl. However, I am not set on forward mode, reverse mode, or even on using automatic differentiation at all; I believe finite differences could also do the job. It is just that NLPModelsKnitro.jl (or ADNLPModels.jl) is simple to use and happens to use AD in forward mode by default. My main goal here is to reduce my dynamic allocations, because I run into performance issues when increasing the size of the problem.

Thanks. I believe X should be well-conditioned, and I never ran into issues with it in my use case when running simulations. I wouldn't mind using the QR algorithm if there were a way to implement it without dynamic allocations. Do you know how to implement that with Dual types?

I will check out your recommendations for ChainRules.jl and Zygote.jl (or writing it myself if everything else fails).

Thanks a lot!

I agree with the suggestion to use reverse-mode autodiff for the gradient (and forward-over-reverse for the Hessian), but this error isn't actually related to the autodiff step; it comes from the sparsity detection step with SparseConnectivityTracer.jl.
Is there a point in your actual code where you select a sparsity_detector, typically inside an ADTypes.AutoSparse structure? If so, try replacing SparseConnectivityTracer.TracerSparsityDetector() (probably the default in ADNLPModels.jl) with SparseConnectivityTracer.TracerLocalSparsityDetector(). See this section of the docs to learn more.
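
Something along these lines (a sketch; how exactly you pass the backend through ADNLPModels.jl may differ):

using ADTypes, SparseConnectivityTracer, SparseMatrixColorings

# The local tracer propagates primal values, so primal-dependent checks
# (like the iszero call triggered by cholesky!) can be evaluated.
backend = ADTypes.AutoSparse(
    ADTypes.AutoForwardDiff();
    sparsity_detector = SparseConnectivityTracer.TracerLocalSparsityDetector(),
    coloring_algorithm = SparseMatrixColorings.GreedyColoringAlgorithm(),
)
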
Once you’ve done that, you’ll probably run into another error which is due to the cache not having the right element type. We’re working on solving that too, but it will be easier to handle once ADNLPModels.jl switches to DifferentiationInterface.jl.
Ping @hill @amontoison


Edit: this should be solved now


@CeterisPartybus
I just added a section to the ADNLPModels.jl documentation explaining how to use a TracerLocalSparsityDetector():


That’s awesome! Thanks