How to debug a slow memory leak from threads?

I’m running a calculation about 10 million times with different parameter sets. The input data is 50MB and the output of all 10 million runs adds up to 1GB.

When I use multiple threads on 8 cores, I get a nearly perfect 8x speedup. However, I also get a slow memory leak. In the 1 hour that it takes to run, the Julia process’s resident memory gradually grows to 11GB.

After the calculation is done, GC.gc() only recovers about 100MB. varinfo shows the 50MB input data and the 1GB result data, but no sign of the remaining 10GB.
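
The check described above can be sketched roughly as follows. This is a minimal illustration, not the exact commands from the run: the helper name memory_report is hypothetical, and Base.gc_live_bytes is an internal, unexported function that may change between Julia versions. It compares what the GC considers live with the peak resident memory the OS reports.

```julia
using InteractiveUtils  # provides varinfo outside the REPL

# Hypothetical helper: compare the bytes Julia's GC considers live with the
# peak resident set size reported by the OS. A large gap suggests memory that
# is held outside the GC heap, or freed but never returned to the OS.
function memory_report(label)
    GC.gc()  # full collection, so live bytes reflect reachable objects only
    live_mb = Base.gc_live_bytes() / 2^20   # internal API, unexported
    peak_rss_mb = Sys.maxrss() / 2^20       # peak RSS of this process, in MB
    println(label, ": GC live = ", round(live_mb, digits=1),
            " MB, peak RSS = ", round(peak_rss_mb, digits=1), " MB")
    return (live_mb = live_mb, peak_rss_mb = peak_rss_mb)
end

report = memory_report("after run")
varinfo()  # lists global bindings in Main with their sizes
```

Calling memory_report before and after the threaded loop should show whether the missing ~10GB is counted in the GC heap at all, which narrows down where to look next.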

Is there any way to further debug how the memory is getting lost? Producing a minimal example probably won’t be easy.

julia> versioninfo()
Julia Version 1.4.0-DEV.527
Commit 6d26f14ede (2019-11-26 02:01 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8