I am trying to optimize the performance of training a neural network model (using the Flux.jl and Mill.jl packages). I found that a significant amount of time is spent on garbage collection (when `@time`ing individual parts of the script, I often see GC time around 80-90%). As a proof of concept, I overrode the Array constructors with static allocation (in the spirit of Alloc.jl, but without using IRTools), which led to a 10-15x speed-up when computing gradients.
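To make the proof of concept concrete, this is roughly the idea, heavily simplified (the `POOL`/`pooled_zeros`/`release!` names are made up for this sketch; the real version hooks into the Array constructors instead of requiring explicit calls):

```julia
# A pool of reusable arrays, keyed by element type and dimensions.
const POOL = Dict{Tuple{DataType,Dims},Vector{Array}}()

# Return a zeroed array of the requested type/shape, reusing a pooled
# one when available instead of allocating fresh GC-tracked memory.
function pooled_zeros(T::DataType, dims::Int...)
    key = (T, dims)
    stack = get!(() -> Array[], POOL, key)
    arr = isempty(stack) ? zeros(T, dims...) : pop!(stack)
    fill!(arr, zero(T))
    return arr
end

# Hand an array back to the pool instead of leaving it to the GC.
function release!(arr::Array)
    stack = get!(() -> Array[], POOL, (eltype(arr), size(arr)))
    push!(stack, arr)
    return nothing
end
```

Because released arrays are recycled, the steady-state allocation rate (and hence GC pressure) drops sharply once the pool is warm.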
Specifically, I am currently profiling the deserialization of minibatches, where each minibatch consists of several wide and deep tree structures. I noticed that when I run the deserialization in a relatively fresh Julia session, the time required is roughly constant:
julia> for i in 1:20
           mb = @time deserialize("mb.jls")
       end
1.010540 seconds (4.67 M allocations: 593.807 MiB, 12.95% gc time)
0.987930 seconds (4.67 M allocations: 593.807 MiB, 10.46% gc time)
1.057710 seconds (4.67 M allocations: 593.807 MiB, 16.85% gc time)
1.188085 seconds (4.67 M allocations: 593.807 MiB, 25.05% gc time)
1.280059 seconds (4.67 M allocations: 593.807 MiB, 29.30% gc time)
1.220317 seconds (4.67 M allocations: 593.807 MiB, 27.63% gc time)
However, when I load additional data into memory, garbage collection seems to become much more aggressive, with negative implications for the runtime:
julia> for i in 1:20
           mb = @time deserialize("mb.jls")
       end
6.243606 seconds (4.67 M allocations: 593.807 MiB, 85.56% gc time)
0.920360 seconds (4.67 M allocations: 593.807 MiB)
5.957967 seconds (4.67 M allocations: 593.807 MiB, 84.44% gc time)
0.949225 seconds (4.67 M allocations: 593.807 MiB)
5.960878 seconds (4.67 M allocations: 593.807 MiB, 84.32% gc time)
0.949110 seconds (4.67 M allocations: 593.807 MiB)
It seems that in the second case, the garbage collection time is significantly higher than the time needed to read and construct all the data structures. Note that in both cases there is plenty of free memory in the system (> 200 GB), so garbage collection should not be necessary. The code was run in Julia 1.3.1.
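To attribute the slowdown to GC rather than I/O, I measure the GC time around each call directly using `Base.gc_num()` and `Base.GC_Diff` (unexported internals that `@timed` itself uses, so field names may change between Julia versions):

```julia
# Run `f()` and report the wall-clock seconds the GC spent during the call.
# Relies on the unexported Base.gc_num / Base.GC_Diff internals.
function timed_gc(f)
    before = Base.gc_num()
    result = f()
    diff = Base.GC_Diff(Base.gc_num(), before)
    return result, diff.total_time / 1e9  # total_time is in nanoseconds
end

# Usage with the minibatch file from the timings above:
# using Serialization
# mb, gcsec = timed_gc(() -> deserialize("mb.jls"))
```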
Could you please give me some hints on how to debug and avoid such situations? Also, are there any options to tweak the garbage collector (e.g., similar to the JVM's -Xmx and -Xms parameters) that would make it less aggressive? (In "Details about Julia's Garbage Collector, Reference Counting?" the answer was negative, but I am not sure whether something has changed in the meantime.)
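For completeness, the stop-gap I am currently experimenting with simply disables the collector around each call via the documented `GC.enable`, at the cost of letting the heap grow while it is off:

```julia
using Serialization

# Run `f()` with the GC disabled, then re-enable it and trigger one
# explicit collection instead of many pauses mid-deserialization.
function with_gc_disabled(f)
    GC.enable(false)
    try
        return f()
    finally
        GC.enable(true)
        GC.gc()
    end
end

# mb = with_gc_disabled(() -> deserialize("mb.jls"))
```

This avoids the long pauses inside `deserialize`, but it is obviously not a real fix, since memory is only reclaimed at the end of each call.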
Thank you in advance!