Zygote much slower than JAX for automatic differentiation of energy

gdalle · May 14, 2024, 1:20pm

You’re definitely right that we would need to apply this improvement on both sides.
However, in some cases well-written code can be much easier for Zygote to differentiate than badly written code, so it’s not necessarily zero-sum.

By the way, I edited my code above with an even faster version, yielding a 10x speedup on the energy computation.

Silly me, I forgot to run the actual gradient computation… I now observe the same allocations as you, and my 10x faster energy function is actually… slower to differentiate. Very frustrating indeed. Here is the gradient benchmark for the original model, followed by the one for the fast version:

julia> @benchmark compute_energy_and_gradient($model, $ps, $st, $H, $all_configurations)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.593 s …    1.842 s  ┊ GC (min … max):  0.03% … 10.68%
 Time  (median):     1.827 s               ┊ GC (median):    10.77%
 Time  (mean ± σ):   1.754 s ± 139.300 ms  ┊ GC (mean ± σ):   8.00% ±  6.64%

  █                                                     █  █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁█ ▁
  1.59 s         Histogram: frequency by time         1.84 s <

 Memory estimate: 4.01 GiB, allocs estimate: 82.

julia> @benchmark compute_energy_and_gradient($model_fast, $ps_fast, $st_fast, $H, $all_configurations)
BenchmarkTools.Trial: 3 samples with 1 evaluation.
 Range (min … max):  1.782 s …   1.919 s  ┊ GC (min … max):  0.16% … 11.39%
 Time  (median):     1.840 s              ┊ GC (median):    11.87%
 Time  (mean ± σ):   1.847 s ± 68.330 ms  ┊ GC (mean ± σ):   8.09% ±  6.77%

  █                       █                               █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.78 s         Histogram: frequency by time        1.92 s <

 Memory estimate: 4.04 GiB, allocs estimate: 324.
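
To guard against the mistake I made above, it helps to time the forward pass and the gradient separately for each variant. Here is a minimal sketch, assuming two toy energies `energy_map` and `energy_fused` that are hypothetical stand-ins (not the model from this thread); it only relies on `Zygote.gradient` and `@belapsed` from BenchmarkTools:

```julia
using BenchmarkTools
using Zygote

# Hypothetical toy energies, NOT the model from this thread:
# both compute the sum of squares of x, written in two different styles.
energy_map(x) = sum(map(xi -> xi^2, x))   # builds an intermediate array first
energy_fused(x) = sum(abs2, x)            # single pass, no intermediate array

x = randn(1_000)

# Make sure both variants agree on the value and the gradient before timing.
@assert energy_map(x) ≈ energy_fused(x)
g_map = Zygote.gradient(energy_map, x)[1]
g_fused = Zygote.gradient(energy_fused, x)[1]
@assert g_map ≈ g_fused

# Time the forward pass and the gradient separately
# (@belapsed returns the minimum time in seconds).
t_fwd_map = @belapsed energy_map($x)
t_fwd_fused = @belapsed energy_fused($x)
t_grad_map = @belapsed Zygote.gradient(energy_map, $x)
t_grad_fused = @belapsed Zygote.gradient(energy_fused, $x)

println("forward speedup:  ", t_fwd_map / t_fwd_fused)
println("gradient speedup: ", t_grad_map / t_grad_fused)
```

The two ratios can differ a lot, since the reverse pass has its own allocation pattern. A forward speedup says little about the gradient until the gradient itself is measured.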
