I would like to use Optim.jl to minimize an objective function defined on a fairly large complex Stiefel manifold (unitary matrices).
The Stiefel manifold is already implemented in Optim.jl. The problem is that I can’t use autodiff because of the complex numbers, and the optimization performs badly without a gradient.
On the other hand, a unitary matrix can be stored as an array of real numbers (twice the size), so autodiff would work. But then I lose the unitarity constraint, so I would have to reimplement it, and I’m not sure how.
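A minimal sketch of the real-embedding idea, to show exactly what gets lost. The helper names `toreal`, `tocomplex`, `unitarity_residual` and `project_isometry` are hypothetical, and restoring the constraint by projecting onto the closest matrix with orthonormal columns (the polar factor from an SVD) is one standard option I'm assuming here, not something prescribed in this thread:

```julia
using LinearAlgebra, Random

# Pack an n×k complex matrix into a real vector of length 2nk
# (real parts first, then imaginary parts), a form real-only AD tools accept.
toreal(U::AbstractMatrix{<:Complex}) = vcat(vec(real(U)), vec(imag(U)))

# Unpack such a real vector back into an n×k complex matrix.
function tocomplex(x::AbstractVector{<:Real}, n::Integer, k::Integer)
    m = n * k
    return reshape(complex.(x[1:m], x[m+1:2m]), n, k)
end

# Distance from satisfying U'U = I; nothing in the packed vector enforces it.
unitarity_residual(U) = norm(U' * U - I)

# Closest matrix with orthonormal columns: the polar factor W*V' of the thin
# SVD U = W*S*V'. One way to restore the constraint after an unconstrained step.
function project_isometry(U)
    F = svd(U)
    return F.U * F.Vt
end

Random.seed!(1)
n, k = 4, 2
U0 = project_isometry(randn(ComplexF64, n, k))   # a point with U0'U0 = I
x = toreal(U0)                                   # 2nk = 16 real numbers
x1 = x .+ 0.1 .* randn(length(x))                # an unconstrained "step"
U1 = tocomplex(x1, n, k)
println(unitarity_residual(U1))                  # generally far from 0
println(unitarity_residual(project_isometry(U1)))  # zero up to rounding
```

Projecting after every step keeps the iterates feasible, but for the steps themselves to make sense the gradient also has to be projected onto the tangent space of the manifold, which is the bookkeeping a manifold-aware optimizer does for you.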

Yes, unfortunately there’s currently no good way to do complex AD in Julia. Hopefully a solution based on ChainRules.jl will be ready soon. Out of interest: you’re probably differentiating a complex norm, so are non-holomorphic functions a concern, or is that not a problem in your case?
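To make the non-holomorphic point concrete (a standard Wirtinger-calculus example, not taken from this thread): a real-valued function of a complex variable, such as a squared norm, is never holomorphic unless it is constant, yet it still has a perfectly good gradient for optimization. With $z = x + iy$:

$$
f(z) = |z|^2 = z\bar z = x^2 + y^2,
$$
$$
\frac{\partial f}{\partial z} = \frac12\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right) = \bar z,
\qquad
\frac{\partial f}{\partial \bar z} = \frac12\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right) = z \neq 0,
$$
$$
\nabla f = \frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y} = 2\,\frac{\partial f}{\partial \bar z} = 2z,
\qquad
f(z + h) = f(z) + \operatorname{Re}\!\left(\overline{\nabla f}\,h\right) + O(|h|^2).
$$

Since $\partial f/\partial \bar z \neq 0$, the Cauchy-Riemann equations fail, so "complex derivative" in the holomorphic sense does not exist; what an optimizer needs is the real gradient $\nabla f$. The matrix version follows the same pattern, e.g. $f(U) = \|AU - C\|_F^2$ has gradient $2A^\dagger(AU - C)$.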

Cool! I’m the one who added complex and Stiefel support to Optim, so I’m glad it’s useful! (Out of curiosity, what’s your application?) Definitely use complex Optim, but hand-code your gradient. If you want to use autodiff, you’ll want reverse mode (forward mode is about as expensive as finite differences). E.g. Zygote supports complex-to-real gradients (see https://github.com/FluxML/Zygote.jl/issues/29), but there are currently no reverse-mode packages mature enough if your objective function is even moderately complicated.
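A minimal sketch of "complex Optim plus a hand-coded gradient" on a toy objective (an assumption, not the objective from the question): `f(U) = -real(tr(U' * A * U))` for a random Hermitian `A`, whose minimum over `U'U = I` is minus the sum of the `k` largest eigenvalues of `A`. It assumes Optim's convention that the gradient `G` of a complex-to-real function satisfies `f(U + H) ≈ f(U) + real(dot(G, H))`, which gives `G = -2AU` here, and checks that against a finite difference before optimizing:

```julia
using LinearAlgebra, Optim, Random

Random.seed!(1)
n, k = 6, 2
B = randn(ComplexF64, n, n)
A = Hermitian(B + B')                 # hypothetical test matrix

# Toy objective on the complex Stiefel manifold {U : U'U = I}.
f(U) = -real(tr(U' * A * U))

# Hand-coded gradient in the convention f(U + H) ≈ f(U) + real(dot(G, H)),
# i.e. G = ∂f/∂Re(U) + i ∂f/∂Im(U); for this f that is G = -2AU.
function g!(G, U)
    G .= -2 .* (A * U)
    return G
end

# Random starting point with orthonormal columns (polar factor of a random matrix).
S = svd(randn(ComplexF64, n, k))
U0 = S.U * S.Vt

# Sanity check: central finite difference along a random complex direction H.
H = randn(ComplexF64, n, k)
G = similar(U0)
g!(G, U0)
ε = 1e-6
fd = (f(U0 + ε * H) - f(U0 - ε * H)) / (2ε)
println("finite difference: ", fd, "   hand-coded: ", real(dot(G, H)))

# Optimize on the manifold; Optim keeps the iterates on U'U = I.
res = optimize(f, g!, U0, LBFGS(manifold = Optim.Stiefel()))
Uopt = Optim.minimizer(res)
println("minimum found: ", Optim.minimum(res))
println("expected:      ", -sum(eigvals(A)[end-k+1:end]))
println("‖U'U - I‖ = ", norm(Uopt' * Uopt - I))
```

The finite-difference check is cheap insurance: a wrong sign, a missing conjugate or a missing factor of 2 in the complex convention usually shows up there immediately.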

Yeah, it works in low dimensions, which is cool. In large dimensions, though, the running time without a gradient is just too high. I use it for numerical experiments related to Zauner’s conjecture in quantum science.
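A rough count of why large dimensions hurt without a gradient (generic cost estimates, not measurements from this thread). A $d \times d$ complex matrix has $N = 2d^2$ real parameters, and per gradient:

$$
\begin{aligned}
\text{finite differences:}\quad & \approx N + 1 \text{ evaluations of } f,\\
\text{forward-mode AD:}\quad & \text{cost} \propto N \cdot \mathrm{cost}(f),\\
\text{reverse-mode AD:}\quad & c \cdot \mathrm{cost}(f), \text{ with } c \text{ a small constant independent of } N.
\end{aligned}
$$

For example, $d = 50$ gives $N = 5000$, so every finite-difference (or forward-mode) gradient costs on the order of five thousand objective evaluations, while a reverse-mode or hand-coded gradient often costs only a few.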

Thanks for the tips; I now realize forward diff won’t help anyway.
Are there any other automatic ways to compute the gradient of Julia code, or is doing it by hand the best option right now?

Hand-coded gradients are always best if you can manage them. Otherwise, you can try your hand at reverse diff; there are a few competing packages (Zygote, Tracker, Nabla and Yota are the names that seem to recur). You can also use a hybrid approach: either use autodiff for specific parts of the code (e.g. if you have nasty scalar functions, just take their forward diff) but drive the computation yourself, or let autodiff take the driver’s seat and provide adjoints for the subfunctions it can’t differentiate (trickier, but this should hopefully get easier with ChainRules & friends at some point).
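A minimal sketch of the first hybrid option, assuming a hypothetical objective `f(U) = sum(h, abs2.(U))` where `h` is some messy scalar function. The outer chain rule through `abs2` is written by hand (the gradient of `abs2(u)` is `2u` in the convention G = ∂f/∂Re + i ∂f/∂Im), and only the real scalar derivative `h'(t)` is delegated to `ForwardDiff.derivative`:

```julia
using ForwardDiff, LinearAlgebra, Random

# Hypothetical scalar function that is tedious to differentiate by hand.
h(t) = log(1 + t^2) * exp(-t) + sin(t)^3

# Real-valued objective of a complex matrix: f(U) = Σ h(|U_ij|^2).
f(U) = sum(h, abs2.(U))

# Hand-driven gradient: the chain rule through abs2 is written by hand
# (gradient of |u|^2 is 2u), and only the real scalar derivative h'(t)
# is computed with forward-mode AD.
function g!(G, U)
    dh = map(t -> ForwardDiff.derivative(h, t), abs2.(U))
    G .= 2 .* dh .* U
    return G
end

# Check against a central finite difference along a random direction.
Random.seed!(0)
U = randn(ComplexF64, 3, 3)
H = randn(ComplexF64, 3, 3)
G = similar(U)
g!(G, U)
ε = 1e-6
fd = (f(U + ε * H) - f(U - ε * H)) / (2ε)
println("finite difference: ", fd, "   hybrid gradient: ", real(dot(G, H)))
```

`g!` has the in-place `g!(G, U)` form that Optim's `optimize(f, g!, x0, method)` accepts, so a gradient built this way can be passed to a manifold optimization directly.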