Aggressive garbage collection behavior with DataFrames in 1.10?

I’m seeing unexpected garbage collection behavior in Julia 1.10
while working with large DataFrames in the REPL. If I run this
simplified example in 1.10.4:

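using DataFrames
ncols = 100000;
nrows = 10000;
m = rand(Float32,ncols,nrows);   # size (ncols, nrows) = (100000, 10000), about 4 GB of Float32
df1 = DataFrame(m,:auto);        # one DataFrame column per matrix column, auto-named x1, x2, ...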

I get:

julia> GC.enable_logging(true)

julia> using DataFrames

GC: pause 82.10ms. collected 34.092128MB. incr

julia> ncols = 100000;

julia> nrows = 10000;

julia> m = rand(Float32,ncols,nrows);

GC: pause 39.51ms. collected 14.510637MB. incr

julia> df1 = DataFrame(m,:auto);

GC: pause 26.28ms. collected 5.710283MB. incr

GC: pause 14.41ms. collected 0.020363MB. full

GC: pause 116.95ms. collected 0.056976MB. full

GC: pause 117.43ms. collected 0.000000MB. full

GC: pause 117.91ms. collected 0.000000MB. full

GC: pause 118.38ms. collected 0.000000MB. full

GC: pause 118.75ms. collected 0.000000MB. full

GC: pause 119.83ms. collected 0.000000MB. full

The machine has 384 GB of memory and the Julia process uses around
10 GB at most, so I’m not sure why it’s garbage collecting so
aggressively, especially since most of the full collections above take
around 117 to 120 ms each and free essentially nothing. I definitely did
not see this behavior in 1.9 with the same code (there were only a
couple of very quick incremental collections).

Has anyone seen behavior like this? It’s causing roughly a 5x slowdown
when running my code in 1.10 compared with 1.9. I’ve experimented with
--heap-size-hint without success.
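To put a number on the GC overhead in each version rather than reading the log by eye, a minimal sketch using only Base is to wrap the work in `@timed`, which reports GC time separately from total time. The helper names `copy_columns` and `report_gc` are hypothetical, and `copy_columns` only approximates what I assume `DataFrame(m, :auto)` does with its default `copycols=true` (copy every matrix column into its own vector); it also takes DataFrames out of the picture, which may help show whether the GC change alone explains the slowdown.

```julia
# Hypothetical helper: copies every column of `m` into its own Vector.
# Assumed to roughly mimic DataFrame(m, :auto) with copycols=true.
function copy_columns(m::AbstractMatrix)
    return [m[:, j] for j in axes(m, 2)]
end

# Hypothetical helper: runs f(args...) once under @timed and prints
# wall time, GC time (and its share of the total) and bytes allocated.
# Returns the NamedTuple produced by @timed (value, time, bytes, gctime, ...).
function report_gc(f, args...)
    stats = @timed f(args...)
    gc_share = stats.time > 0 ? 100 * stats.gctime / stats.time : 0.0
    println("time = ", round(stats.time; digits = 3), " s, ",
            "GC = ", round(stats.gctime; digits = 3), " s (",
            round(gc_share; digits = 1), "%), ",
            "allocated = ", round(stats.bytes / 2^20; digits = 1), " MiB")
    return stats
end
```

With the same `m` as above, calling `report_gc(copy_columns, m)` (or `report_gc(DataFrame, m, :auto)`) twice and comparing the second call's GC share under 1.9 and 1.10 would exclude compilation time from the comparison.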
