If we’re looking at the allocations themselves, I don’t observe the same as you. Can you share the versions of Julia and the packages that you are using?
On my laptop, it’s 4 Mb rather than 4 Gb.
As a side note, allocations of 4 Gb would be coherent with a dense version of the sparse H
matrix:
julia> @benchmark ones(Float32, 2^15, 2^15)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
Range (min … max): 1.505 s … 1.775 s ┊ GC (min … max): 0.02% … 14.93%
Time (median): 1.647 s ┊ GC (median): 6.40%
Time (mean ± σ): 1.643 s ± 133.053 ms ┊ GC (mean ± σ): 7.24% ± 7.88%
█ █ █ █
█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁█ ▁
1.5 s Histogram: frequency by time 1.78 s <
Memory estimate: 4.00 GiB, allocs estimate: 2.
I wonder if this is related to a naive chain rule which allocates the full matrix during a pullback
Maybe @mohamed82008 will have an opinion.