Pass return of recursive function up the call stack?

Hello there, this is sort of a follow-up to the question I posted yesterday about improving memory allocation in a simulation. You can find that question here for context.

Following the advice of @DNF, I decided to implement pre-allocated arrays and return a copy of the array with the updated simulation.

Now I am adding layers of complexity. I’m really working on generating training data for a Temporal Difference algorithm using random walks, but this example shows generating a random number until we reach an even number and returning that sequence.

Now I would like to generate 2 episodes 5 times such that my training data becomes a nested array typed as Array{Array{Array{Int64}}}(undef, 5).

The size of the 2nd array should be 2, and the innermost array is technically “unknown” in size, but I wish to limit it to 10 or fewer elements.

Here is my code example:

function gen_array!(cache)
    for i in 2:length(cache)
        num = rand(eltype(cache))
        cache[i] = num
        iseven(num) && return cache[1:i]
    end
    gen_array!(cache)
end

function episode!(episode, sim)

    for i in 1:length(episode)
        episode[i] = gen_array!(sim)
    end
    return episode
end


function simulate!(train, E)
    START = Array{Int64}(undef, 10)
    START[1] = 1333

    EPISODES = Array{Array{Int64}}(undef, E)

    for i in 1:length(train)
        train[i] = episode!(EPISODES, START)
    end
end

function main()
    N::Int64 = 5
    E::Int64 = 2

    train = Array{Array{Array{Int64}}}(undef, N)
    simulate!(train, E)
    train
end

The general idea is that we preallocate train and initialize simulation. We then start to fill our episodes through gen_array!().

You’ll notice this is a recursive function.

My reasoning here is: say we have hit the size limit of the array and still have not generated an even number. I don’t want to grow the vector, but instead restart the function and write over the array until we reach a sequence that ends in an even number.
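For what it’s worth, the same restart-until-even logic can also be written as a plain loop instead of recursion, which avoids any call-stack depth concerns (`gen_array_loop!` is just an illustrative name for this sketch):

```julia
# Iterative equivalent of gen_array!: overwrite cache until an even
# number appears within the size limit, then return a copy of the prefix.
function gen_array_loop!(cache)
    while true
        for i in 2:length(cache)
            num = rand(eltype(cache))
            cache[i] = num
            # cache[1:i] allocates a fresh copy of the prefix
            iseven(num) && return cache[1:i]
        end
        # no even number within the limit: fall through and overwrite again
    end
end
```
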

I thought this was working until I closely inspected the output:

julia> include("example.jl")
main (generic function with 1 method)

julia> main()
5-element Array{Array{Array{Int64,N} where N,N} where N,1}:
 [[1333, 5175378229263772070], [1333, -7110769136095499899, 2890408208590898501, -2599245076049037924]]
 [[1333, 5175378229263772070], [1333, -7110769136095499899, 2890408208590898501, -2599245076049037924]]
 [[1333, 5175378229263772070], [1333, -7110769136095499899, 2890408208590898501, -2599245076049037924]]
 [[1333, 5175378229263772070], [1333, -7110769136095499899, 2890408208590898501, -2599245076049037924]]
 [[1333, 5175378229263772070], [1333, -7110769136095499899, 2890408208590898501, -2599245076049037924]]

Each sequence is different, but every episode is the same. I think this has something to do with the function being recursive: that it gets so deep in the call stack that, once it hits the return, it passes the result back to the previous call and it gets lost somewhere along the way.

Any ideas?


You are allocating one array START and then manipulating it over and over again. Each episode contains a pointer to the same memory! You either need to copy START before you do something with it, or work with a multi-dimensional array from the get-go instead of an array of arrays.
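To make the aliasing point concrete: in the original code the same `EPISODES` buffer is stored into every `train[i]`, so all entries alias one array. A minimal sketch of one fix along these lines, allocating a fresh episodes array per training entry (`simulate_copies!` is a hypothetical name; `gen_array!` is the function from the question):

```julia
# gen_array! as in the original question.
function gen_array!(cache)
    for i in 2:length(cache)
        num = rand(eltype(cache))
        cache[i] = num
        iseven(num) && return cache[1:i]  # slicing already copies
    end
    gen_array!(cache)
end

# Sketch: give each train entry its own episodes array instead of
# reusing and overwriting one shared buffer.
function simulate_copies!(train, E)
    START = Array{Int64}(undef, 10)
    START[1] = 1333
    for i in 1:length(train)
        episodes = Array{Array{Int64}}(undef, E)  # fresh array per entry
        for j in 1:E
            episodes[j] = gen_array!(START)
        end
        train[i] = episodes
    end
    return train
end
```

With this layout no two `train` entries share memory, so the episodes come out distinct.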


I was thinking about a multidimensional array, but doesn’t each row need to have the same number of elements?

Could I do something like train = Array{Int64}(undef, at most 10, 2, 5)?

Also, doesn’t cache[1:i] make a copy of START and return that? So it should be changing the values of START and returning a copy when it hits an even number; if it doesn’t, it just writes over START until it does?
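A quick check seems to confirm that slicing does copy, and that the aliasing happens at the outer level when the same array object is stored into several slots:

```julia
# Slicing copies: mutating the source afterwards does not affect the slice.
a = [1333, 2, 3, 4]
b = a[1:2]            # independent copy of the first two elements
a[1] = 0
@assert b[1] == 1333  # slice unaffected by the later mutation

# But storing one array object into several slots aliases it.
outer = Vector{Vector{Int}}(undef, 5)
shared = [1, 2, 3]
for i in 1:5
    outer[i] = shared  # every entry points at the *same* array
end
@assert outer[1] === outer[5]
```
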

So, since you asked about performance: Julia Arrays are actually pretty heavy, and aren’t true Julia objects; instead they are implemented in array.c and julia.h and then bolted on. This means that tiny arrays are no good: almost all of the time and memory is spent on overhead (Array is designed for very large chunks of memory).

Presumably generation time doesn’t really matter, and most of your time will be spent on training. So it would probably pay to get creative with your data layout. Just to give an example,

struct RunContainer{T} <: AbstractMatrix{SubArray{T,1,Array{T,1},Tuple{UnitRange{Int64}},true}}
    contents::Vector{T}
    colptr::Vector{Int}
end

RunContainer{T}() where T = RunContainer(T[], [1])

Base.size(rc::RunContainer) = (2, (length(rc.colptr)-1)>>1)

Base.@propagate_inbounds function Base.getindex(rc::RunContainer, i, j)
    @boundscheck checkbounds(rc, i, j)
    from = rc.colptr[2*(j-1) + i]
    to = rc.colptr[2*(j-1) + i + 1]
    return view(rc.contents, from:(to-1))
end

function append_rc!(rc::RunContainer, run1, run2)
    append!(rc.contents, run1)
    push!(rc.colptr, length(rc.contents)+1)
    append!(rc.contents, run2)
    push!(rc.colptr, length(rc.contents)+1)
    return rc
end

This would be used like

julia> rc=RunContainer{Float64}(); append_rc!(rc, [1,2,3], [4,5,6, 7]); append_rc!(rc, [4.5], [12, 1.0]); append_rc!(rc, [1], [2]); rc
2×3 RunContainer{Float64}:
 [1.0, 2.0, 3.0]       [4.5]        [1.0]
 [4.0, 5.0, 6.0, 7.0]  [12.0, 1.0]  [2.0]

This kind of layout is pretty good if the views don’t get allocated, i.e. if your getindex is close to the actual use, which is hopefully inlined into the same context.


Does this example utilize the way Julia prefers to index matrices, i.e. by column? I’m used to iterating row by row.

Would I just generate the first episode for every set and then generate the 2nd episode for every set?

The optimal iteration order is the same as the storage order, i.e. rc[1, j][k] then rc[1, j][k+1], …, rc[2, j][1], …, rc[1, j+1][1], …
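Concretely, that order falls out of a column-major triple loop. A sketch that works for any matrix whose entries are runs, including the RunContainer above (`total_in_storage_order` is just an illustrative name):

```julia
# Walk a matrix-of-runs in storage order: column by column,
# run 1 then run 2 within each column, elements within a run in order.
function total_in_storage_order(m)
    s = 0.0
    for j in 1:size(m, 2)      # columns: contiguous pairs of runs
        for i in 1:size(m, 1)  # run 1, then run 2
            for x in m[i, j]   # elements within a run are contiguous
                s += x
            end
        end
    end
    return s
end
```

For the RunContainer this visits rc.contents front to back with no jumps.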

If you only want to generate one run at a time, you can simply append!(rc.contents, run); push!(rc.colptr, length(rc.contents)+1);, and every second run you append will open a new pair of runs. These pairs are e.g. accessible by view(rc, :, j) or by (rc[1, j], rc[2, j]). They are not accessible by rc[:, j] because I did not implement getindex(rc::RunContainer, ::Colon, j).
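That one-run-at-a-time pattern could be wrapped in a small helper. A sketch over plain vectors, mirroring the contents/colptr fields (`push_run!` is an illustrative name):

```julia
# Append one run to the flat storage and record where it ends.
# Run j afterwards lives at contents[colptr[j]:colptr[j+1]-1].
function push_run!(contents::Vector, colptr::Vector{Int}, run)
    append!(contents, run)
    push!(colptr, length(contents) + 1)
    return contents
end
```
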

Nobody prevents you from implementing e.g.

Base.@propagate_inbounds function Base.getindex(rc::RunContainer, k, i, j)
    @boundscheck checkbounds(rc, i, j)
    from = rc.colptr[2*(j-1) + i]
    to = rc.colptr[2*(j-1) + i + 1]
    @boundscheck checkbounds(from:to, k)
    return rc.contents[from + k - 1]
end

such that the optimal memory order looks more julian (rc[k, i, j] is the same as rc[i, j][k]). This might confuse some readers of your code, though, because rc[k, i, j] is idiomatically used on <:AbstractArray{T, 3} where T instead of <:AbstractMatrix.

It is your choice what kind of API you want to use, as long as you provide whatever your later processing steps need.
