I’m seeing some unexpected garbage collection behavior in Julia 1.10
while working with large DataFrames in the REPL. If I run this
simplified example in 1.10.4:
```julia
using DataFrames
ncols = 100000;
nrows = 10000;
m = rand(Float32,ncols,nrows);
df1 = DataFrame(m,:auto);
```
I get:
```
julia> GC.enable_logging(true)
julia> using DataFrames
GC: pause 82.10ms. collected 34.092128MB. incr
julia> ncols = 100000;
julia> nrows = 10000;
julia> m = rand(Float32,ncols,nrows);
GC: pause 39.51ms. collected 14.510637MB. incr
julia> df1 = DataFrame(m,:auto);
GC: pause 26.28ms. collected 5.710283MB. incr
GC: pause 14.41ms. collected 0.020363MB. full
GC: pause 116.95ms. collected 0.056976MB. full
GC: pause 117.43ms. collected 0.000000MB. full
GC: pause 117.91ms. collected 0.000000MB. full
GC: pause 118.38ms. collected 0.000000MB. full
GC: pause 118.75ms. collected 0.000000MB. full
GC: pause 119.83ms. collected 0.000000MB. full
```
The machine has 384GB of memory and the Julia process is using around
10GB at most, so I’m not sure why it’s garbage collecting so
aggressively: the log shows repeated full collections of roughly 117ms
each that free essentially nothing. I definitely did not see this kind
of behavior in 1.9 on the same code (only a couple of very quick
incremental collections).
Has anyone seen behavior like this? It’s causing roughly a 5x slowdown
when running my code in 1.10 compared with 1.9. I’ve experimented with
`--heap-size-hint` without success.
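
In case it helps anyone reproduce or compare across versions, here is a minimal sketch of how the GC share of a call could be measured with `@timed`. The helper `gc_share` and the stand-in workload `copy_columns` are hypothetical names made up for illustration; the column copy only roughly imitates building a DataFrame from a matrix and is not meant to describe what DataFrames does internally. The matrix is deliberately small so the snippet runs quickly.

```julia
# Hypothetical helper (not part of Julia or DataFrames): run `f` once and
# report how much of the elapsed time was spent in garbage collection.
function gc_share(f)
    stats = @timed f()                      # NamedTuple with time, bytes, gctime, ...
    return (seconds      = stats.time,
            gc_seconds   = stats.gctime,
            gc_fraction  = stats.time > 0 ? stats.gctime / stats.time : 0.0,
            allocated_mb = stats.bytes / 2^20)
end

# Stand-in workload (an assumption): copy every column of a matrix into
# its own vector, which allocates one array per column.
copy_columns(m) = [m[:, j] for j in axes(m, 2)]

m = rand(Float32, 1_000, 1_000)
work = () -> copy_columns(m)

gc_share(work)                # warm-up call so compilation is not measured
report = gc_share(work)       # measured call
println(report)
```

Replacing `work` with `() -> DataFrame(m, :auto)` and running the same script under 1.9 and 1.10 should give a directly comparable `gc_fraction` for each version.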