Generated (staged) function to avoid initializing an array multiple times for Monte Carlo simulations

vmh · May 15, 2017, 7:14pm

Hi everyone,

I have a function that returns an array (vector) of samples from a given distribution. E.g., consider the function below:

function draw_samples(dist)
    samples = Array{Float64, 1}(1000)
    rand!(dist, samples)
    x
end

And I’m calling this function a million times inside a loop for independent Monte Carlo (MC) trials. Really would like to avoid creating a new array (samples) in every iteration of my MC loop. Of course, I can always preallocate samples in my main program and pass it as an argument to draw_samples! to mutate the array (please note the ! at the end). However, I don’t like that style because things get really entangled and that approach is not quite modular. So, I’m considering a generated function, where in its body, I create samples and then the return statement is simply a call to the mutating _draw_samples! function, with the value of samples passed to it. E.g.,

function _draw_samples!(samples, dist)
    rand!(dist, samples)
    samples
end

@generated function draw_samples_1_allocation(dist)
    samples = Array{Float64, 1}(1000)
    quote
        _draw_samples!($(samples), dist)
    end
end

I’d really appreciate your advice on whether this approach makes any sense, what could go wrong with this method, or if there are any alternatives that are recommended over this one. Thank you for your help.

Keno · May 15, 2017, 7:33pm

Generated functions are the wrong approach to this in general. Try this:

let samples = Array{Float64, 1000}, inited = false
global draw_samples
function draw_samples(dist)
        inited || rand!(dist, samples)
        inited = true
        samples
end
end

Obviously you can adjust this examples as you see fit (e.g. adjust the cache to depend on dist), etc.

yuyichao · May 15, 2017, 7:35pm

There’s nothing non-modular about it. You can just have a context object and each “module” will be interested in a small part of the context.

It’s not reentrant or thread safe.

kristoffer.carlsson · May 15, 2017, 7:43pm

I don’t see the point with inited That will cache the results from the first call which is not what the original code seems to do?

ChrisRackauckas · May 15, 2017, 8:07pm

Instead of a function use a call overloaded type with the array as a field?

# Define the type
immutable MyFunction{T}
  samples::T
end

# Define a call
function (p::MyFunction)(dist)
  _draw_samples!(p.samples, dist)
end

# Build the type with the cache
draw_samples_1_allocation = MyFunction(Array{Float64, 1}(1000))

# Now use it like a function
dist = ...
draw_samples_1_allocation(dist)

vmh · May 15, 2017, 8:47pm

Thank you for your response. Would you recommend the preallocation approach even when there are say 20 different functions, each of which needs a different preallocated array (and all of them are being called in one single function)? Currently, I’ve written my code using preallocation in the single function and then mutating the arrays in individual functions but preallocating 20 different arrays in the single function (main program) just didn’t look right to me and I was wondering if I was doing something wrong.

yuyichao · May 15, 2017, 8:55pm

Yes.

gasagna · May 15, 2017, 9:10pm

What are the implications of this being non reentrant?

yuyichao · May 15, 2017, 9:26pm

As the code gets more complicated you can easily get’s into trouble if it ever recurse back.

TsurHerman · May 15, 2017, 9:52pm

you could maybe benefit from the functor approach.

function draw_samples!(samples,dist)
    rand!(dist, samples)
    samples
end

function fsamples_prealoc(g,n,T = Float64)
    const samples = Array{T, 1}(n)
    (args...) -> g(samples,args...)
end

draw_samples = fsamples_prealoc(draw_samples!,1000)

since you have 20 different functions with different requirements for cache size

vmh · May 16, 2017, 12:56am

Thank you for your response. I’d checked the closure method before but unfortunately, it is considerably slower than the direct mutating approach.

ChrisRackauckas · May 16, 2017, 1:15am

You mean @TsurHerman’s? Tha because the const isn’t actually doing anything:

github.com/JuliaLang/julia

implement local const

opened 09:48PM - 14 Dec 13 UTC

bkamins

lowering feature

I am wondering why code: ``` function f() const x = "aaa" x = 4… return x end println(f()) ``` executes and prints 4 without any warning or error, while ``` const x = "aaa" x = 4 println(x) ``` raises: `ERROR: invalid redefinition of constant x` Similarly: ``` function f() const x = "aaa" x = "bbb" return x end println(f()) ``` prints bbb without warning, while: ``` const x = "aaa" x = "bbb" println(x) ``` prints bbb, but with warning: `Warning: redefining constant x` I think behavior of constants should be consistent in global and function scopes. My Julia version: ``` Julia Version 0.2.0 Commit 05c6461 (2013-11-16 23:44 UTC) Platform Info: System: Windows (x86_64-w64-mingw32) WORD_SIZE: 64 BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY) LAPACK: libopenblas LIBM: libopenlibm ```

But a call-overloaded immutable type should be fine.

TsurHerman · May 16, 2017, 7:15am

you can write the closure using Ref’s and then it will be type stable and the compiler will optimise
it correctly

function fsamples_prealoc(g,n,T = Float64)
    const samples = Array{T, 1}(n)
    (args...) -> g(samples,args...)
end

becomes

function fsamples_prealoc(g,n,T = Float64)
    samples = Ref{Array{T, 1}}( Array{T, 1}(n) )
    (args...) -> g(samples.x,args...)
end

vmh · May 18, 2017, 4:23pm

@TsurHerman, thank you so much for your response. This is a great and neat suggestion. As someone who knows very little about thread-safe programming, I wonder if your suggestions is thread-safe. Any thoughts on this would be appreciated.

vmh · May 18, 2017, 4:26pm

Hi @ChrisRackauckas, thank you so much for your response. Frankly, I didn’t know enough about parametric types that first I had to read about them to make sure I’d understand your solution. This is a pretty neat solution. Is your suggestion thread-safe? Any thoughts on this would be appreciated. @yuyichao, any thoughts on this approach (out of curiosity)?

ChrisRackauckas:

Instead of a function use a call overloaded type with the array as a field?

Define the type
immutable MyFunction{T}
  samples::T
end
Define a call
function (p::MyFunction)(dist)
  _draw_samples!(p.samples, dist)
end
Build the type with the cache
draw_samples_1_allocation = MyFunction(Array{Float64, 1}(1000))

Now use it like a function
dist = ...
draw_samples_1_allocation(dist)

ChrisRackauckas · May 18, 2017, 4:37pm

Depends on how you’re using it. Are you going to be calling the same instance of MyFunction on each thread? Then no because they will be sharing the same cache. But in that case you just make one of these per thread, put them all in an array, and access them via threadid(). That will work because each function is locally keeping its own cache. Example:

cached_functions = [MyFunction(Array{Float64, 1}(1000)) for i in Threads.nthreads()]
Threads.@threads for i in 1:1000
  cached_functions[Threads.threadid()](dist)
end

Or just make sure you split access to samples by the threadid in another way.

vmh · May 18, 2017, 4:46pm

Thank you, Chris, for your prompt response. Your example is very helpful (I’m learning a lot here).

TsurHerman · May 18, 2017, 10:29pm

Not thread safe as there is only one samples array and all threads access it , so you might ask for certain distribution in one thread and by the time you use it another thread called the function with a different distribution.

to make it thread-safe i assume you would want to change the generating function.
or better , make another generating function that handles thread safety through replication for example.

function fthread_safe(g)
    fv =  [deepcopy(g) for i=1:Threads.nthreads()]
    G = Ref{typeof(fv)}(fv)
    (args...) -> G.x[Threads.threadid()](args...)
end

and since there is only one parameter you can write it very elegantly at the end of another generating function using the |> operator (I dont know its name in english).

draw_samples = fsamples_prealoc(draw_samples!,1000) |> fthread_safe

musm · May 19, 2017, 4:28am

[quote=“TsurHerman, post:18, topic:3726”]
function fsamples_prealoc(g,n,T = Float64)
samples = Ref{Array{T, 1}}( Array{T, 1}(n) )
(args…) → g(samples.x,args…)
end[/quote]

function foo2(d)
    x = draw_samples2(d)
    2x
end

Looks like foo2 is not type-stable ?

TsurHerman · May 19, 2017, 12:17pm

I didn’t understand what you meant … what is this foo2?

I am no expert on type system and type stability and maybe I use the wrong term here, but from experience closures created using generating functions using static variables in the form of Ref’s as in my example , compile to optimal machine code.

Topic		Replies	Views
Using generated functions to preallocate memory? Internals & Design	16	1975	April 14, 2019
Understanding the limitations of generated functions General Usage question , performance , metaprogramming	17	1502	July 27, 2020
Generic n-dimensional loop without allocations General Usage question	15	1018	December 2, 2016
Function returning a constant vector (avoid recalculation and reallocation) General Usage question	5	911	January 3, 2017
Simulate Nested Array Without Allocating New Memory Every Loop? Performance question	13	1257	September 28, 2019

Generated (staged) function to avoid initializing an array multiple times for Monte Carlo simulations

Related topics