Memory leak from threading but not pmap

I have a function that takes a set of parameters, runs a simulation, and returns 5 values in a named tuple. It does no IO, and its only calls to rand() use a seeded RNG that is passed explicitly, which should avoid any thread-safety problems.

If I run Julia (nightly, on a Linux Skylake Xeon) with 4 threads, it works well for a small number of iterations, but then the memory usage gradually creeps up. Each iteration takes 1-2 seconds, so with the normal GC noise, I expect the memory usage to go up and down by 200MB every few seconds.

For example, in the first few minutes, Julia will use 400-600MB, but by 10 minutes, it’s over 1GB, and by an hour, it’s at 30GB (but still with the 200MB variations).

When I run the same code with pmap and 4 workers, each worker stays between 250-310MB with no leakage even after hours.

Any ideas on how to debug what’s going on?

I assume you are using a separate RNG object per thread?

Can you post some code or an MWE? I have access to a server with a decent amount of RAM where I can test and try to reproduce.

Yes, I just pass an Int seed as one of the parameters, and the RNG is then initialized inside the function being run in parallel.
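
For readers following along, here is a minimal sketch of that pattern. The real simulation is proprietary, so `simulate` and `run_all` are hypothetical stand-ins, and the choice of `MersenneTwister` is an assumption; any RNG type constructed inside the function would serve the same purpose.

```julia
using Random

# Hypothetical stand-in for the real simulation: it builds its own RNG
# from the Int seed and passes that RNG to every rand call.
function simulate(seed::Int)
    rng = MersenneTwister(seed)   # created per call, never shared between threads
    total = 0.0
    for _ in 1:1_000
        total += rand(rng)        # rand(rng), not the global rand()
    end
    return (seed = seed, total = total)
end

# Hypothetical driver: each iteration writes only to its own slot of `results`.
function run_all(seeds::Vector{Int})
    results = Vector{Any}(undef, length(seeds))
    Threads.@threads for i in eachindex(seeds)
        results[i] = simulate(seeds[i])
    end
    return results
end

results = run_all(collect(1:8))
```

Because each call constructs its RNG from its own seed, the result for a given seed should not depend on which thread ran it.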

Thanks, I’ll give it a try but probably not. Unfortunately the simulation that the function runs has thousands of lines of proprietary code.

Ok, I managed to find a very minimal example that eats up 5-10GB of memory within a few seconds:

Threads.@threads for i in 1:100000
    sum(collect(1:10^6))
end

I’m not passing any options to Julia, just setting JULIA_NUM_THREADS=4.
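
One possible way to watch the growth from inside Julia, sketched under some assumptions: the same loop body is run in small batches, a full collection is forced after each batch, and memory figures are recorded. `run_batches` and its arguments are hypothetical names, and the batch sizes are scaled down so it finishes quickly. `Base.gc_live_bytes()` is an unexported function, so its behavior may differ between Julia versions.

```julia
# Hypothetical helper: run the MWE loop in batches and log memory after each one.
function run_batches(nbatches::Int, batchsize::Int, n::Int)
    live = Vector{Int64}(undef, nbatches)
    for b in 1:nbatches
        Threads.@threads for i in 1:batchsize
            sum(collect(1:n))          # same allocation pattern as the MWE
        end
        GC.gc()                        # force a full collection before measuring
        live[b] = Base.gc_live_bytes() # bytes the GC currently considers live
        peak = Int(Sys.maxrss())       # peak resident set size of the process
        println("batch ", b, ": live = ", live[b] ÷ 2^20,
                " MiB, peak RSS = ", peak ÷ 2^20, " MiB")
    end
    return live
end

live = run_batches(3, 8, 10^5)
```

If the live-bytes figure stays flat while the process footprint keeps climbing, that would suggest the memory is not being retained by reachable Julia objects, though this is only a rough diagnostic.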

Filed an issue here:

Meaning you’re explicitly creating a MersenneTwister object (or PCG, or whatever your preferred RNG is), and all your calls to rand look like rand(mt)?

Edit: although that’s obviously not the problem in your sum(collect(1:N)) example.