Simulate Nested Array Without Allocating New Memory Every Loop?

To start off, here is some necessary information

julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Reproducible Example: Start off each simulation on the same number and generate random integers until reaching an even number. Store that run into an array and repeat 5 times.

Goal: Accomplish this correctly (correctly being defined in the problem below) while minimizing memory allocation.

Problem:

  • a) Minimize memory allocation by pre-allocating the array, but the simulation is ruined because the array is passed by reference (more precisely, every call shares the same mutable array), so once the first run ends on an even number, all subsequent runs see that even number at the end and immediately break from the simulation, or
  • b) Correctly simulate all runs but destroy run time by allocating a new starting array for each simulation.

Note: This is a reprex of a much larger problem I am working on now. So while the time & allocation difference in this example is minimal, it is having a stronger effect in my code.

To start off, I’ll show you the code from problem part (a):

example01.jl

function gen_array(seq::Array{Int64,1})
    while true
        seq[end] % 2 == 0 && break

        num = rand(Int64)
        push!(seq, num)
    end
    return seq
end

function simulate_chains(n::Int64, seq::Array{Int64,1})
    map(i -> gen_array(seq), 1:n)
end

function main()
    N::Int64 = 5

    START = Array{Int64,1}(undef, 1)
    fill!(START, 1233)

    simulation = simulate_chains(N, START)
end

Here is me loading the script into the REPL and timing it twice, so that the second measurement excludes compilation:

julia> include("example01.jl")
main (generic function with 1 method)

julia> @time main();
  0.048245 seconds (220.16 k allocations: 11.701 MiB)

julia> @time main()
  0.000004 seconds (9 allocations: 480 bytes)
5-element Array{Array{Int64,1},1}:
 [1233, -4418181421764378021, -6736001003188972884]
 [1233, -4418181421764378021, -6736001003188972884]
 [1233, -4418181421764378021, -6736001003188972884]
 [1233, -4418181421764378021, -6736001003188972884]
 [1233, -4418181421764378021, -6736001003188972884]

Notice how the output is the same for each simulation. While this may be possible, it is certainly not probable. As mentioned in problem part (a), the START array is remaining “persistent” across simulations. This is not desired.
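The aliasing is easy to confirm with `===`, which tests object identity. A minimal sketch mimicking the `map` call from example01.jl (names here are stand-ins, not from the original script):

```julia
# The map in example01.jl returns the same vector object five times;
# `===` tests object identity, so this reproduces the aliasing directly.
seq = [1233]
runs = [seq for _ in 1:5]   # stand-in for map(i -> gen_array(seq), 1:5)
push!(seq, 42)              # mutating "one" run...
all(r === seq for r in runs)  # ...mutates all of them: true
```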

Now for the code from problem part (b)

example02.jl

function gen_array(seq::Array{Int64,1}=[1233])
    while true
        seq[end] % 2 == 0 && break

        num = rand(Int64)
        push!(seq, num)
    end
    return seq
end

function simulate_chains(n::Int64)
    map(i -> gen_array(), 1:n)
end

function main()
    N::Int64 = 5

    simulation = simulate_chains(N)
end

Notice how I’ve changed the code by giving gen_array() a default value and removing my pre-allocated array from main(). Here I am in a new REPL session running the code:

julia> include("example02.jl")
main (generic function with 1 method)

julia> @time main();
  0.061380 seconds (256.65 k allocations: 13.881 MiB)

julia> @time main()
  0.000004 seconds (15 allocations: 1008 bytes)
5-element Array{Array{Int64,1},1}:
 [1233, 1117329081981021236]
 [1233, -6637213868465549306]
 [1233, 4235110922391529754]
 [1233, 6888918319811551361, 8043136562908430093, 3767038082823508588]
 [1233, 6019512490854919842]

As you can see, I've allocated more memory in this version (although the time hasn't changed much here). In my real code, this change alone makes a significant time difference.

Is there a way to have my cake and eat it too?

You are constructing one array in your main and passing it by reference to the gen_array function. Instead, construct a new array inside gen_array and pass in a starting value. See the following:

function gen_array(START_VALUE)
    seq = Array{Int64, 1}()
    push!(seq, START_VALUE)
    while true
        seq[end] % 2 == 0 && break

        num = rand(Int64)
        push!(seq, num)
    end
    return seq
end

function simulate_chains(n::Int64, START_VALUE)
    map(_ -> gen_array(START_VALUE), 1:n)
end

function main()
    N::Int64 = 5

    #START = Array{Int64,1}(undef, 1)
    #fill!(START, 1233)

    simulation = simulate_chains(N, 1233)
end

edit: oh I see. I am not sure how you would get rid of the allocations, since you don't know the size of the array to begin with.

In your first code example, simulate_chains just gets the same vector, seq, in each iteration. The first time, it fills it up until the last value is even (you can just write while isodd(seq[end]) btw). All the other times, it just gets the same vector over and over, checks that the last value is even and bails out immediately.

In the second code example, you give it a new starting vector for each iteration, so that issue disappears.

Sorry, I don’t have time to look at the performance at the moment.

Yes, that much is clear to me. My question is more along the lines of how can I keep the speed/mem allocation of my first example while getting the right answer of my second example.

I think I’d pass a cache vector into gen_array! (use a ! in the name when mutating arguments). The cache vector could have some reasonable length to make it unlikely to be too short (e.g. with length 20, the risk of overflowing is roughly 1 in a million). Then fill it up and finally return a slice.

If you get too many elements, start pushing or use resize!.
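The resize! option looks like this (a minimal sketch; note that the new slots are uninitialized for `Int64`, so they must be written before being read):

```julia
# Grow the cache in one step rather than element by element.
v = collect(1:20)
resize!(v, 40)   # existing elements preserved; slots 21:40 uninitialized
v[21] = 0        # write before reading any of the new slots
```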

Something along these lines (warning: untested, and hastily written):
Edit: the code did have (at least) one bug. Fixed now.

function gen_array!(cache)
    for i in 2:length(cache)
        num = rand(eltype(cache))
        cache[i] = num
        iseven(num) && return cache[1:i]  # slices make copies
    end
    num = 1  # initialize
    while isodd(num)
        num = rand(eltype(cache))
        push!(cache, num)
    end
    return copy(cache)  # remember to copy
end

BTW, you should definitely use the BenchmarkTools package to get better timing and memory estimates. Especially for fast functions like the ones you have posted, the timings from @time are almost meaningless, because its precision is too poor.
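For example (assuming the BenchmarkTools package is installed, e.g. via `] add BenchmarkTools`; `$` interpolates arguments so that setup cost isn't timed). The toy function here is a stand-in; with example02.jl loaded you would benchmark `main()` the same way:

```julia
using BenchmarkTools

# Toy stand-in for main(), to make this snippet self-contained.
gen(n) = [rand(Int64) for _ in 1:n]

@btime gen(100)          # robust timing over many samples
b = @benchmark gen(100)  # full statistics, including allocation counts
```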

No, that’s the point. You cannot keep that speed, because it’s misleading. It’s only that fast because seq[end] % 2 == 0 is always true immediately (except for the very first time), so then it just bails out.

Ahh I see what you’re saying. I can’t believe that didn’t dawn on me. I am currently testing your code above.

Is there anything special about a “cache vector”, or is that just a normal pre-allocated array like the one I made in my first example?

Nothing special; it’s just an ordinary vector that gets reused across calls to gen_array!.
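Concretely, the reuse might look like this (a hypothetical driver; `simulate_chains!` and the seeding convention are my own names, not from the thread, and the first block is a condensed copy of the gen_array! definition above so the snippet runs standalone):

```julia
# Condensed copy of gen_array! from above.
function gen_array!(cache)
    for i in 2:length(cache)
        num = rand(eltype(cache))
        cache[i] = num
        iseven(num) && return cache[1:i]  # slices make copies
    end
    num = 1  # odd sentinel so the loop runs at least once
    while isodd(num)
        num = rand(eltype(cache))
        push!(cache, num)
    end
    return copy(cache)
end

# Hypothetical driver: allocate the cache once, seed cache[1] with the
# start value, and reuse the same buffer for every run.
function simulate_chains!(cache, n)
    map(_ -> gen_array!(cache), 1:n)
end

cache = Vector{Int64}(undef, 20)
cache[1] = 1233
simulation = simulate_chains!(cache, 5)
```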


Perfect, your code works very well. I like eltype; I hadn’t thought to use it before. Do we return a copy so as to not actually update the cache vector on the heap?

Or are we constantly rewriting over the array after updating it every time? Looking at the code longer, I think this is what we are doing.

We are updating the cache vector, but we need a copy (or slice) in order to make sure that each return vector is independent.
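The difference is easy to see in a toy example: a slice (or `copy`) detaches the returned vector from the cache, while returning `cache` itself does not:

```julia
cache = [1233, 4, 7, 8]

aliased  = cache         # no copy: same object as the cache
snapshot = cache[1:2]    # slice: an independent copy

cache[2] = 99
aliased[2]   # 99 — follows the mutation of the cache
snapshot[2]  # 4  — unaffected by later mutation
```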

Apologies for coming back to this later; I’ve been thinking about this code all day. Instead of resizing the array if it gets too big, could we potentially use recursion to repeat until it hits an even value?

Something along these lines:

function gen_array!(cache)
    for i in 2:length(cache)
        num = rand(eltype(cache))
        cache[i] = num
        iseven(num) && return cache[1:i]  # slices make copies
    end
    return gen_array!(cache)  # recurse until an even value appears
end

My reasoning being: if we hit the size limit of the vector, the loop will have stopped without returning, so we go again, and since we only ever return cache[1:i], we never have to worry about what is left over in the array.

It depends on what output you desire: this way you don’t get the whole sequence of random numbers, only the sequence modulo the cache vector length. Do you need the whole sequence at all?

@DNF’s solution is the preferable one, for reasons that are only apparent if you know the implementation in array.c.

push! sometimes needs to resize a vector. In addition to their length, vectors have a capacity: the amount of extra memory reserved. When you push! and the capacity is reached, the vector is grown by a large factor (typically 2x; details in array.c), the old contents are copied over, and the old buffer is freed (no garbage collection needed). Even in the cheap case (capacity left over), push! is still surprisingly expensive, due to compiler/linker limitations (a foreign function call that cannot be inlined, 10 cycles down the drain).

From this, you learn:

  1. push! is slower than setindex!, even if nothing gets allocated (enough capacity);
  2. push! into a vector with enough capacity is medium-cheap, so make sure to have capacity; and
  3. if you ever push! into a vector, it will waste up to 2x the memory in spare capacity, in order to make subsequent push! fast.
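Point 2 can be acted on with `sizehint!`, which reserves capacity up front so a subsequent run of push! calls stays on the cheap path (a small sketch; the exact growth strategy is an implementation detail of array.c):

```julia
# Reserve capacity before a push!-heavy loop to avoid reallocation.
v = Int64[]
sizehint!(v, 1_000)   # capacity reserved; length is still 0
for i in 1:1_000
    push!(v, i)       # cheap path: no buffer growth needed
end
```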

Watch how @DNF’s code takes all of these facts into account: it has only a single cache object (nit: name it buffer) that gets dynamically resized, and the return value is always a copy/slice, i.e. a tight allocation (no spare capacity that would likely be wasted, since you don’t plan to push! into your runs after construction). Further, cache retains its size between subsequent runs, replacing push! calls with faster setindex! (the number of push! calls equals the length of the longest run and does not scale with the number of runs).
