Zygote much slower than JAX for automatic differentiation of energy

If we’re looking at the allocations themselves, I don’t observe the same as you.
On my laptop, it’s 4 Mb rather than 4 Gb.
Can you share the versions of Julia and the packages that you are using?

As a side note, allocations of 4 Gb would be coherent with a dense version of the sparse H matrix:

julia> @benchmark ones(Float32, 2^15, 2^15)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.505 s …    1.775 s  ┊ GC (min … max): 0.02% … 14.93%
 Time  (median):     1.647 s               ┊ GC (median):    6.40%
 Time  (mean ± σ):   1.643 s ± 133.053 ms  ┊ GC (mean ± σ):  7.24% ±  7.88%

  █         █                                      █       █  
  █▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
  1.5 s          Histogram: frequency by time         1.78 s <

 Memory estimate: 4.00 GiB, allocs estimate: 2.

I wonder if this is related to a naive chain rule which allocates the full matrix during a pullback

Maybe @mohamed82008 will have an opinion.