Generated (staged) function to avoid initializing an array multiple times for Monte Carlo simulations


#1

Hi everyone,

I have a function that returns an array (vector) of samples from a given distribution. E.g., consider the function below:

function draw_samples(dist)
    samples = Array{Float64, 1}(1000)
    rand!(dist, samples)
    x
end

And I’m calling this function a million times inside a loop for independent Monte Carlo (MC) trials. Really would like to avoid creating a new array (samples) in every iteration of my MC loop. Of course, I can always preallocate samples in my main program and pass it as an argument to draw_samples! to mutate the array (please note the ! at the end). However, I don’t like that style because things get really entangled and that approach is not quite modular. So, I’m considering a generated function, where in its body, I create samples and then the return statement is simply a call to the mutating _draw_samples! function, with the value of samples passed to it. E.g.,

function _draw_samples!(samples, dist)
    rand!(dist, samples)
    samples
end

@generated function draw_samples_1_allocation(dist)
    samples = Array{Float64, 1}(1000)
    quote
        _draw_samples!($(samples), dist)
    end
end

I’d really appreciate your advice on whether this approach makes any sense, what could go wrong with this method, or if there are any alternatives that are recommended over this one. Thank you for your help.


#2

Generated functions are the wrong approach to this in general. Try this:

let samples = Array{Float64, 1000}, inited = false
global draw_samples
function draw_samples(dist)
        inited || rand!(dist, samples)
        inited = true
        samples
end
end

Obviously you can adjust this examples as you see fit (e.g. adjust the cache to depend on dist), etc.


#3

There’s nothing non-modular about it. You can just have a context object and each “module” will be interested in a small part of the context.

It’s not reentrant or thread safe.


#4

I don’t see the point with inited That will cache the results from the first call which is not what the original code seems to do?


#5

Instead of a function use a call overloaded type with the array as a field?

# Define the type
immutable MyFunction{T}
  samples::T
end

# Define a call
function (p::MyFunction)(dist)
  _draw_samples!(p.samples, dist)
end

# Build the type with the cache
draw_samples_1_allocation = MyFunction(Array{Float64, 1}(1000))

# Now use it like a function
dist = ...
draw_samples_1_allocation(dist)

#6

Thank you for your response. Would you recommend the preallocation approach even when there are say 20 different functions, each of which needs a different preallocated array (and all of them are being called in one single function)? Currently, I’ve written my code using preallocation in the single function and then mutating the arrays in individual functions but preallocating 20 different arrays in the single function (main program) just didn’t look right to me and I was wondering if I was doing something wrong.


#7

Yes.


#8

What are the implications of this being non reentrant?


#9

As the code gets more complicated you can easily get’s into trouble if it ever recurse back.


#10

you could maybe benefit from the functor approach.

function draw_samples!(samples,dist)
    rand!(dist, samples)
    samples
end

function fsamples_prealoc(g,n,T = Float64)
    const samples = Array{T, 1}(n)
    (args...) -> g(samples,args...)
end

draw_samples = fsamples_prealoc(draw_samples!,1000)

since you have 20 different functions with different requirements for cache size


#11

Thank you for your response. I’d checked the closure method before but unfortunately, it is considerably slower than the direct mutating approach.


#12

You mean @TsurHerman’s? Tha because the const isn’t actually doing anything:

But a call-overloaded immutable type should be fine.


#13

you can write the closure using Ref’s and then it will be type stable and the compiler will optimise
it correctly

function fsamples_prealoc(g,n,T = Float64)
    const samples = Array{T, 1}(n)
    (args...) -> g(samples,args...)
end

becomes

function fsamples_prealoc(g,n,T = Float64)
    samples = Ref{Array{T, 1}}( Array{T, 1}(n) )
    (args...) -> g(samples.x,args...)
end

#14

@TsurHerman, thank you so much for your response. This is a great and neat suggestion. As someone who knows very little about thread-safe programming, I wonder if your suggestions is thread-safe. Any thoughts on this would be appreciated.


#15

Hi @ChrisRackauckas, thank you so much for your response. Frankly, I didn’t know enough about parametric types that first I had to read about them to make sure I’d understand your solution. This is a pretty neat solution. Is your suggestion thread-safe? Any thoughts on this would be appreciated. @yuyichao, any thoughts on this approach (out of curiosity)?


#16

Depends on how you’re using it. Are you going to be calling the same instance of MyFunction on each thread? Then no because they will be sharing the same cache. But in that case you just make one of these per thread, put them all in an array, and access them via threadid(). That will work because each function is locally keeping its own cache. Example:

cached_functions = [MyFunction(Array{Float64, 1}(1000)) for i in Threads.nthreads()]
Threads.@threads for i in 1:1000
  cached_functions[Threads.threadid()](dist)
end

Or just make sure you split access to samples by the threadid in another way.


#17

Thank you, Chris, for your prompt response. Your example is very helpful (I’m learning a lot here).


#18

Not thread safe as there is only one samples array and all threads access it , so you might ask for certain distribution in one thread and by the time you use it another thread called the function with a different distribution.

to make it thread-safe i assume you would want to change the generating function.
or better , make another generating function that handles thread safety through replication for example.

function fthread_safe(g)
    fv =  [deepcopy(g) for i=1:Threads.nthreads()]
    G = Ref{typeof(fv)}(fv)
    (args...) -> G.x[Threads.threadid()](args...)
end

and since there is only one parameter you can write it very elegantly at the end of another generating function using the |> operator (I dont know its name in english).

draw_samples = fsamples_prealoc(draw_samples!,1000) |> fthread_safe


#19

[quote=“TsurHerman, post:18, topic:3726”]
function fsamples_prealoc(g,n,T = Float64)
samples = Ref{Array{T, 1}}( Array{T, 1}(n) )
(args…) -> g(samples.x,args…)
end[/quote]

function foo2(d)
    x = draw_samples2(d)
    2x
end

Looks like foo2 is not type-stable ?


#20

I didn’t understand what you meant … what is this foo2?

I am no expert on type system and type stability and maybe I use the wrong term here, but from experience closures created using generating functions using static variables in the form of Ref’s as in my example , compile to optimal machine code.