Optimization on Stiefel manifold with auto-differentiation

I want to minimize the energy E of some quantum state, which is a real-valued function that ultimately depends on a square real orthogonal matrix X (i.e. a matrix on the Stiefel manifold). The (Euclidean) gradient of the energy with respect to the elements of X needs to be calculated by auto-differentiation.

From what I know, optimization on the Stiefel manifold can be done using the Manopt.jl package, or OptimKit.jl together with TensorKitManifolds.jl. The auto-differentiation part can be handled by Zygote.jl or Enzyme.jl. Unfortunately, I have zero experience with any of them, and it’s a bit hard for me to follow the documentation (in particular, OptimKit doesn’t have docs yet).

If someone could provide a minimal example of such a problem, using a relatively simple target function f(X) to demonstrate the usage of these packages, it would be of great help to me. Thanks!


Appendix: The exact expression of E(X) that I’m trying to optimize is

E = \sum_{k,\sigma} \xi_k \rho_{k\sigma} + 2 \sum_k \operatorname{Re}(\Delta_k \eta_k)

where k sums over the momenta in the 1st Brillouin zone of a square lattice with periodic or anti-periodic boundary condition, and

\begin{align} \xi_k &= -2t (\cos k_x + \cos k_y) - \mu \\ \Delta_k &= 2\Delta (\cos k_x - \cos k_y) \\ \sum_k \xi_k \rho_{k\uparrow} &= \frac{1}{2} \sum_k \xi_k [1 - (G_k)_{12}] \\ \sum_k \xi_k \rho_{k\downarrow} &= \frac{1}{2} \sum_k \xi_k [1 - (G_k)_{34}] \\ \eta_k &= -\frac{1}{4}[ (G_k)_{41} + (G_k)_{32} + i (G_k)_{42} - i(G_k)_{31} ] \end{align}

Here t, \Delta, \mu are 3 fixed parameters. The matrix G_k is related to the orthogonal matrix X in the following way:

G_k = A + B(D + G_\omega(k))^{-1} B^\mathsf{T}
G_\omega(k) = \left[\bigoplus_{i=1}^{\chi} \begin{bmatrix} & -e^{-ik_x} \sigma_x \\ e^{ik_x} \sigma_x & \end{bmatrix}\right] \oplus \left[\bigoplus_{i=1}^{\chi} \begin{bmatrix} & -e^{-ik_y} \sigma_x \\ e^{ik_y} \sigma_x & \end{bmatrix}\right]

Here \sigma_x is the x Pauli matrix, and \chi is a positive integer (which is not very big). The matrices A, B, D are given by

\begin{bmatrix} A & B \\ -B^\mathsf{T} & D \end{bmatrix} = X^\mathsf{T} \left( \bigoplus_{i=1}^{4\chi+2} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right) X

The size of the matrix X is then (8\chi + 4) \times (8\chi + 4). You can see that computing the gradient by hand is really not feasible, so auto-differentiation is essentially unavoidable…
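For concreteness, here is how the building blocks above would translate to Julia (a rough, untested sketch just to pin down the conventions; the names blk, Gω, ABD, Gk are placeholders of mine):

using LinearAlgebra, SparseArrays

σx = [0.0 1.0; 1.0 0.0]

# one 4×4 block [0 -e^{-ik}σx; e^{ik}σx 0]
blk(k) = [zeros(2, 2) -cis(-k) .* σx; cis(k) .* σx zeros(2, 2)]

# G_ω(k): direct sum of χ x-blocks followed by χ y-blocks, size 8χ × 8χ
Gω(kx, ky, χ) = blockdiag([sparse(blk(kx)) for _ in 1:χ]...,
                          [sparse(blk(ky)) for _ in 1:χ]...)

# A, B, D from X via J = ⊕_{i=1}^{4χ+2} [0 1; -1 0]
function ABD(X::AbstractMatrix, χ::Int)
    J = blockdiag((sparse([0.0 1.0; -1.0 0.0]) for _ in 1:(4χ + 2))...)
    M = X' * J * X          # (8χ+4) × (8χ+4); A is the 4×4 top-left corner
    A = M[1:4, 1:4]
    B = M[1:4, 5:end]
    D = M[5:end, 5:end]
    return A, B, D
end

# G_k = A + B (D + G_ω(k))⁻¹ Bᵀ, a 4×4 complex matrix
function Gk(X, kx, ky, χ)
    A, B, D = ABD(X, χ)
    return A + B * ((D + Gω(kx, ky, χ)) \ transpose(B))
end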


Without some concrete code it is a bit hard to help. But if you compute the gradient with Zygote or Enzyme, the missing piece is that this gives the Euclidean and not the Riemannian gradient; see the full explanation e.g. at

While I am not so sure that ManifoldDiff.jl can directly work with Zygote or Enzyme (I think it should), you can also use riemannian_gradient directly for the conversion.
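For a simple stand-in cost (not your energy function), the conversion could look roughly like this untested sketch:

using LinearAlgebra, Manifolds, ManifoldDiff, Manopt, Zygote

n = 4
C = randn(n, n)
f(X) = sum(abs2, C .- X)        # simple stand-in cost, just for illustration

M = OrthogonalMatrices(n)       # the square case of the Stiefel manifold
p = Matrix(qr(randn(n, n)).Q)   # some orthogonal starting point

eg, = Zygote.gradient(f, p)          # Euclidean gradient from AD
rg = riemannian_gradient(M, p, eg)   # converted to the Riemannian gradient

# plugging the conversion into a Manopt solver
q = gradient_descent(M, (M, X) -> f(X),
    (M, X) -> riemannian_gradient(M, X, Zygote.gradient(f, X)[1]), p)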

Your appendix is nice for context, but it is not something I could easily implement or test; it is too complex for that, and it lacks concrete data.
So I cannot actually help with code for your specific problem here.

You could also just optimize g(Y) = f(Y (Y^\mathsf{T} Y)^{-1/2}) over unconstrained real square matrices Y.

The polar decomposition X = Y (Y^T Y)^{-1/2} automatically satisfies X^T X = I, and the function Y / sqrt(Hermitian(Y'Y)) should be differentiable with standard AD packages (e.g. ChainRules.jl has a rule for the symmetric-matrix square root).

(This is a generalization of a simple trick to optimize on the unit sphere: Optimization on unit sphere? - #3 by stevengj).
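For instance, with a simple quadratic stand-in for f (not your energy function), the whole trick is just:

using LinearAlgebra, Zygote

C = randn(4, 4)
f(X) = tr(X' * C * X)          # simple stand-in target, just for illustration

# optimize over unconstrained Y; the polar factor is always orthogonal
g(Y) = f(Y / sqrt(Hermitian(Y' * Y)))

Y = randn(4, 4)
X = Y / sqrt(Hermitian(Y' * Y))
@show X' * X ≈ I               # true up to rounding

∇Y, = Zygote.gradient(g, Y)    # plain AD, no manifold machinery needed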


@stevengj This is a very useful trick. Thanks! I’ll try this if I encounter issues with Manopt.

@kellertuer The code is in my GitHub repo

The energy function can be constructed from

using GaussianfPEPS

# set energy parameters
# for "d-wave" state, Δx = Δy = -Δ
t, Δx, Δy, mu = 1.0, 0.5, -0.5, -0.6
# create the Brillouin zone with anti-PBC on x-direction, and PBC on y-direction
# for a square lattice with 100 x 100 sites
bz = BrillouinZone((100, 100), (false, true))
# Np is a constant, which is often just set to 2
# (corresponding to spin-1/2 fermions)
Np = 2
# energy function
f(X) = BCS.energy_peps(fiducial_cormat(X), bz, Np; t, Δx, Δy, mu)

which aims to be the Julia equivalent of

which was written in Python using PyManopt; the AD was handled with JAX there. It seems that PyManopt has some way to convert the Euclidean derivatives produced by JAX to a manifold derivative. I tried to be compatible with it, but it uses a convention different even from its own accompanying paper, so I gave up on figuring it out.

I encountered some problems due to sparse matrices in my code. For now I just apply @stevengj’s trick, so I don’t need Manopt to get the derivatives.

using Zygote, Random, LinearAlgebra, GaussianfPEPS

t, Δx, Δy, mu = 1.0, 0.5, -0.5, -0.6
bz = BrillouinZone((10, 10), (false, true))
Np = 2

function myfun(Y::AbstractMatrix)
    # polar factor of Y: X is orthogonal for any invertible Y
    X = Y / sqrt(Hermitian(Y' * Y))
    G = fiducial_cormat(X)
    return BCS.energy_peps(G, bz, Np; t, Δx, Δy, mu)
end

χ = 1
N = Np + 4χ
Random.seed!(0)
x = rand(2N, 2N);
g = Zygote.gradient(myfun, x)

Then I get the error

Need an adjoint for constructor SparseArrays.SparseMatrixCSC{ComplexF64, Int64}. Gradient is of type Matrix{ComplexF64}

Does this mean that I need to define an rrule for the sparse matrices I created?

Not quite, as explained here: Zygote.jl: How to get the gradient of sparse matrix - #6 by stevengj — if you are going to write an rrule, it typically has to be for the function that includes both the sparse-matrix construction and how you use the sparse matrix. (This isn’t usually so hard, though.)
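For example, a toy version with a hypothetical apply_sparse that builds a diagonal sparse matrix and immediately applies it (real inputs assumed) might look like:

using ChainRulesCore, SparseArrays

# build the sparse matrix *and* use it inside one function...
apply_sparse(θ::AbstractVector, v::AbstractVector) = spdiagm(0 => θ) * v

# ...so that a single rrule covers construction and use together
function ChainRulesCore.rrule(::typeof(apply_sparse), θ, v)
    y = apply_sparse(θ, v)          # yᵢ = θᵢ vᵢ
    function apply_sparse_pullback(ȳ)
        return NoTangent(), ȳ .* v, ȳ .* θ
    end
    return y, apply_sparse_pullback
end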

Where are the sparse matrices in your problem description above?

(You might also try Enzyme.jl.)

The sparse matrices are G_\omega(k) (called cormat_virtual in my code) and J = \bigoplus_{i=1}^{4\chi+2} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} (its construction function in my code is get_J).

I found out that Enzyme supports array mutation, so I can avoid using sparse matrices if Enzyme also has trouble handling them.
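For instance, G_\omega(k) could be filled densely with mutation instead of blockdiag (a rough sketch along the lines of the appendix formulas; untested with Enzyme):

function Gω_dense(kx, ky, χ)
    G = zeros(ComplexF64, 8χ, 8χ)
    ks = vcat(fill(kx, χ), fill(ky, χ))
    for (b, k) in enumerate(ks)
        r = 4(b - 1)                  # offset of this 4×4 block
        G[r + 1, r + 4] = -cis(-k)    # top-right block: -e^{-ik} σx
        G[r + 2, r + 3] = -cis(-k)
        G[r + 3, r + 2] = cis(k)      # bottom-left block: e^{ik} σx
        G[r + 4, r + 1] = cis(k)
    end
    return G
end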

Now I no longer construct J explicitly, but G_\omega(k) is still a sparse matrix. When using Enzyme, although the sparse matrix seems to cause no trouble, it still complains “Enzyme compilation failed due to illegal type analysis.”

I take the derivative using the convenience function gradient:

g = Enzyme.gradient(Reverse, myfun, x)

The error message (which is quite long) begins with

Enzyme compilation failed due to illegal type analysis.
 This usually indicates the use of a Union type, which is not fully supported with Enzyme.API.strictAliasing set to true [the default].
 Ideally, remove the union (which will also make your code faster), or try setting Enzyme.API.strictAliasing!(false) before any autodiff call.
 To toggle more information for debugging (needed for bug reports), set Enzyme.Compiler.VERBOSE_ERRORS[] = true (default false)
 Failure within method: MethodInstance for getproperty(::QRPivoted{ComplexF64, Matrix{ComplexF64}, Vector{ComplexF64}, Vector{Int64}}, ::Symbol)
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erroneous code with Cthulhu.jl
Caused by:
Stacktrace:
 [1] triu!
   @ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:441
 [2] getproperty
   @ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/qr.jl:497

But I’m not aware of any union types in my code relevant to energy_peps.
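In the meantime, I will try the workaround suggested by the message itself (I have not yet verified that it fixes the QRPivoted failure):

using Enzyme
Enzyme.API.strictAliasing!(false)   # must run before any autodiff call
g = Enzyme.gradient(Reverse, myfun, x)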