Speeding up per-sample gradients?

yolhan_mannes · August 21, 2025, 8:32pm

Looks cool. Does Python have a wrapper? The first thing to note is that TMVA doesn't seem to optimise gradient calculations, only inference.
The ROOT: TMVA tutorials seem to use it in a number of examples.
PS: I think Lux can generate ONNX files, so it would be doable; maybe it can go Julia → MLIR → ONNX (using JAX).
However, you would get performance similar to a JAX vs. TMVA comparison, since Reactant relies on the same compiler features.

cortner · October 6, 2025, 9:24pm

We use Lux.jl but not Reactant. Reading through this thread, it seems we should differentiate via "batched reverse mode" in Enzyme, but this is not documented. I'd be grateful for a really simple script that shows how it can be used.
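To make the idea concrete outside any particular framework, here is a minimal sketch in Python/NumPy (not Enzyme's API; the toy layer `tanh(W @ x)` and the function names are hypothetical). A single reverse sweep maps one output cotangent to one vector-Jacobian product; batched reverse mode pushes several cotangents through the same forward pass at once. Seeding with the rows of the identity matrix therefore recovers the full Jacobian.

```python
import numpy as np

# Hypothetical toy layer (not from the thread): y = tanh(W @ x), with x in R^n, y in R^m.
def forward(W, x):
    z = W @ x
    return np.tanh(z), z

def pullback(W, z, ybar):
    """One reverse sweep: maps a single output cotangent ybar (m,) to xbar = J^T ybar (n,)."""
    return W.T @ (ybar * (1.0 - np.tanh(z) ** 2))

def batched_pullback(W, z, Ybar):
    """Batched reverse mode: k cotangents at once, stacked as rows of Ybar (k, m).

    Row r of the result equals pullback(W, z, Ybar[r]); the forward pass is
    shared and the k reverse sweeps collapse into one matrix product.
    """
    return (Ybar * (1.0 - np.tanh(z) ** 2)) @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y, z = forward(W, x)
# Seeding with the m unit cotangents (rows of the identity) yields the full Jacobian, shape (3, 4).
J = batched_pullback(W, z, np.eye(3))
```

In a real AD tool the batching happens inside the generated derivative code rather than in a hand-written matrix product, but the bookkeeping is the same: one forward pass, many cotangents.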

Lux.jl seems to have `batched_jacobian`, which I believe does the right thing? But it uses Zygote rather than Enzyme. How can that be, when Zygote only allows differentiation of scalar outputs?
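On the scalar-output question: reverse mode is not restricted to scalar outputs. A pullback (what `Zygote.pullback` returns) takes a cotangent with the same shape as the output and returns a vector-Jacobian product. The sketch below, again in NumPy with a hypothetical model and a hand-written pullback (not Lux's actual implementation), shows one way per-sample Jacobians can be obtained with reverse mode when samples do not interact: seeding output row `i` with ones for every sample returns row `i` of all `B` Jacobians in a single pullback, so only `m` reverse sweeps are needed instead of `m * B`.

```python
import numpy as np

# Hypothetical per-sample model applied column-wise to a batch X of shape (n, B):
# Y[:, b] = tanh(W @ X[:, b]); samples do not interact.
def forward(W, X):
    Z = W @ X
    return np.tanh(Z), Z

def pullback(W, Z, Ybar):
    """Vector-Jacobian product for the whole batch; Ybar has the same shape as Y, (m, B)."""
    return W.T @ (Ybar * (1.0 - np.tanh(Z) ** 2))

def batched_jacobian_reverse(W, X):
    """Per-sample Jacobians dY[:, b]/dX[:, b], returned with shape (m, n, B)."""
    Y, Z = forward(W, X)
    m, B = Y.shape
    n = X.shape[0]
    J = np.empty((m, n, B))
    for i in range(m):
        # Non-scalar cotangent: a one in output row i for every sample.
        Ybar = np.zeros((m, B))
        Ybar[i, :] = 1.0
        # Column b of the result is row i of sample b's Jacobian.
        J[i] = pullback(W, Z, Ybar)
    return J
```

The trick only works because each sample's Jacobian is a separate block; if samples interacted (e.g. through batch normalization), the seeded rows would mix contributions from different samples.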

yolhan_mannes · October 6, 2025, 9:58pm

Do you have a simple function to differentiate that is somewhat similar to your original problem (i.e. same input and output sizes), and can you explain how you want to differentiate it? This thread was more focused on speed, but if you don't use Reactant, maybe that isn't your focus? Also, I have no idea why you think Zygote only allows scalar outputs.

cortner · October 6, 2025, 11:59pm

I hand-code most pullbacks for speed, so that is indeed my focus. But do you think it would be better to open a separate thread? That said, we may now have figured out how to do this in Enzyme, at least for toy models, so we will go ahead and test how well it works in our actual research codes.
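For reference, here is a minimal sketch of per-sample gradients with a hand-coded pullback, assuming a hypothetical one-layer model with squared-error loss (NumPy, not Lux or Enzyme). The forward pass and the pullback each run once for the whole batch; the per-sample gradients with respect to `W` are outer products that are simply not summed over the batch.

```python
import numpy as np

def per_sample_grads(W, X, T):
    """Per-sample gradients of L_b = 0.5 * ||tanh(W @ X[:, b]) - T[:, b]||^2 w.r.t. W.

    W: (m, n), X: (n, B), T: (m, B). Returns an array of shape (B, m, n).
    """
    Y = np.tanh(W @ X)                 # forward pass for the whole batch, (m, B)
    Delta = (Y - T) * (1.0 - Y ** 2)   # hand-coded pullback through the loss and tanh, (m, B)
    # dL_b/dW = outer(Delta[:, b], X[:, b]); einsum forms all B outer products at once.
    return np.einsum("mb,nb->bmn", Delta, X)
```

Summing the result over its first axis gives the usual full-batch gradient `Delta @ X.T`, which is a cheap consistency check.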

My impression is that this is not documented in Enzyme. If I'm wrong, can you point me to the right section?

wsmoses · October 7, 2025, 4:17am

See the API reference · Enzyme.jl, in particular the entry on `chunk`.

cortner · October 7, 2025, 5:14pm

Note: I started posting in this thread because it seemed relevant, but I'm happy to start a new post if I'm unnecessarily cluttering this discussion.

wsmoses: thank you for the pointer. FWIW, we had actually found that but simply couldn't work out how to use it. Maybe it requires a deeper understanding of Enzyme than most users have or want to have. (I'd be more than happy to contribute to the documentation, but I don't really feel qualified, given that I don't seem to understand at all what Enzyme does here.)

EDIT: I deleted the example I had here; it was a red herring, but it is helping me realize I had a fundamental misunderstanding of the issues involved. If I'm right, there is really no difficulty; otherwise I'll come back here or post elsewhere.
