for loop consumes extreme amount of memory even though operation can be considered "pure"

This was originally opened as a github issue #28909, though I was directed here for the question.

The behaviour is that, even though the loop itself is pure(except for printing), and that results can be safely discarded after execution, Julia still consumes large amount of memory, seeming to keep the entire array being looped over in memory. Ideally, the used items should be freed.

This is the code I used. The goal was to exhaustively search for specific pattern in the entire range of 32-bit random number:

using Random
# dice rolling
function ndx!(rng, sp, N)
    L = 0
    for n in 1:N
        L += rand(rng, sp)
    end
    L
end
#generate a sequence
function gensum!(seed)
    rng = MersenneTwister(seed)
    d6 = Random.Sampler(rng, 1:6)
    ndx!(rng,d6,21)
end

l = Threads.SpinLock()
Threads.@threads for i in 0x00000000:0xFFFFFFFF
    s = gensum!(i)
    if (i | 0xFFF00000 == 0xFFF00000)
        lock(l)
        if (i | 0xFF000000 == 0xFF000000)
            if (i | 0xF0000000 == 0xF0000000)
                print(";")
            end
            print(",")
        end
        print(".")
        unlock(l)
    end
    if (s == 126)
        lock(l)
        println("Found Legendary Character, seed ",i)
        unlock(l)
    end
end

I did not shrink my example because it cost me 3 hours to run under 12 threads, and I did not want to rerun the program. During the beginning, memory usage rapidly climbs up to 15G rapidly, and stays steady throughout the majority of the computation. However, when the computation nears the end (I presume the last minute or so), the memory usage quickly doubles to 30G.

I cannot explain this behaviour since I am unfamiliar with internal implementation of Julia, but I strongly suspect that it is caused by GC. I did not use alternative loops due to parallel computation, and attempting a map operation using filter(condition,map(f, 0x00000000:0xFFFFFFFF)) immediately throws an out of memory exception(possibly due to the lack of lazy evaluation).

@rfourquet has suggested that GC might not be fast enough to clean up the MersenneTwister objects, and provided a workaround which I have not tested yet, and is posted below:

const RNGs = [MersenneTwister(0) for i in 1:Threads.nthreads()]

function gensum!(seed)
    rng = Random.seed!(RNGs[Threads.threadid()], seed)
    d6 = Random.Sampler(rng, 1:6)
    ndx!(rng,d6,21)
end

I do not know whether GC have trouble removing the MT object, as memory usage increases only at the beginning (to 15G), and the end (to 30G). I thought objects are cleaned soon after they are out of scope.

They can be, but the language makes no guarantee that an object will be freed immediately when it becomes unreachable, since it is up to the discretion of the garbage collector.

But rather than worrying about the details of the GC, it seems like you already have a good solution (pre-allocating the RNGs) but you haven’t tried it out yet. It’s not necessary to re-run all 3 hours of computation just to verify that this well help. For example:

# Re-define gensum! to take its rng as an argument
julia> function gensum!(rng, seed)
           Random.seed!(rng, seed)
           d6 = Random.Sampler(rng, 1:6)
           ndx!(rng,d6,21)
       end

# Benchmark constructing a new MersenneTwister and sampling from it
julia> @btime gensum!(MersenneTwister(0), 1)
  19.940 μs (11 allocations: 19.59 KiB)

# Benchmark sampling from a pre-allocated MersenneTwister:
julia> @btime gensum!(rng, 1) setup=(rng = MersenneTwister(0))
  9.696 μs (2 allocations: 112 bytes)

so your code should run ~twice as fast and consume 174 times less memory just by taking that suggestion.

1 Like

I will definitely try the solution. What I don’t understand is that how come I can reseed a const RNG? Is it supposed to be constant reference like a <type> * const would be in C++?

const in Julia does only one thing: it tells the language that a given variable will be bound to that same object for the lifetime of your program. It doesn’t “freeze” the object that variable is bound to or affect it in any way.

For example:

julia> const x = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> push!(x, 4) # this is fine. We're not changing what `x` is bound to.
4-element Array{Int64,1}:
 1
 2
 3
 4

julia> x = [1,2] # this is not fine. We're actually binding the variable `x` to a new value. 
WARNING: redefining constant x
2 Likes

Only looping up to 0x000FFFFF I get for the original version (with 4 cores):
5.473501 seconds (9.37 M allocations: 18.702 GiB, 8.92% gc time)

Using https://gist.github.com/KristofferC/c46af9ebbbfbf2a3bd5bd0b905c28b34 gives a modest improvement
4.273078 seconds (2.08 M allocations: 110.989 MiB, 0.14% gc time)

The problem here is that the seeding of the RNG is bottlenecking (and is slower than needed, see https://github.com/JuliaLang/julia/issues/28911)

Applying the patch to fix this problem:

diff --git a/stdlib/Random/src/RNGs.jl b/stdlib/Random/src/RNGs.jl
index 0c103305ed..2c4e886fb9 100644
--- a/stdlib/Random/src/RNGs.jl
+++ b/stdlib/Random/src/RNGs.jl
@@ -276,9 +276,13 @@ end

 #### seed!()

-function seed!(r::MersenneTwister, seed::Vector{UInt32})
+function seed!(r::MersenneTwister, seed::Union{UInt32, Vector{UInt32}})
     copyto!(resize!(r.seed, length(seed)), seed)
-    dsfmt_init_by_array(r.state, r.seed)
+    if seed isa UInt32
+        dsfmt_init_gen_rand(r.state, seed)
+    else
+        dsfmt_init_by_array(r.state, r.seed)
+    end
     mt_setempty!(r)
     mt_setempty!(r, UInt128)
     fillcache_zeros!(r)
@@ -286,7 +290,13 @@ function seed!(r::MersenneTwister, seed::Vector{UInt32})
 end

 seed!(r::MersenneTwister=GLOBAL_RNG) = seed!(r, make_seed())
-seed!(r::MersenneTwister, n::Integer) = seed!(r, make_seed(n))
+function seed!(r::MersenneTwister, n::Integer)
+    if n <= typemax(UInt32)
+        seed!(r, UInt32(n))
+    else
+        seed!(r, make_seed(n))
+    end
+end
 seed!(seed::Union{Integer,Vector{UInt32}}) = seed!(GLOBAL_RNG, seed)

the original code becomes a bit better:

2.137527 seconds (7.11 M allocations: 17.896 GiB, 22.40% gc time)

but the patched code benefits even more from this:

julia> @time look()
0.998460 seconds (10 allocations: 272 bytes)
3 Likes