I have a function that returns an array (vector) of samples from a given distribution. E.g., consider the function below:
using Distributions   # provides rand!(dist, samples)

function draw_samples(dist)
    samples = Array{Float64, 1}(1000)   # a fresh 1000-element buffer on every call
    rand!(dist, samples)
    samples
end
And I’m calling this function a million times inside a loop of independent Monte Carlo (MC) trials. I’d really like to avoid creating a new array (samples) in every iteration of my MC loop. Of course, I can always preallocate samples in my main program and pass it as an argument to draw_samples! so that it mutates the array (note the ! at the end). However, I don’t like that style because things get really entangled and the approach isn’t very modular. So I’m considering a generated function: in its body I create samples, and the returned expression is simply a call to the mutating _draw_samples! function with the value of samples passed to it. E.g.,
function _draw_samples!(samples, dist)
    # Fill the caller-provided buffer in place and return it.
    rand!(dist, samples)
    samples
end
@generated function draw_samples_1_allocation(dist)
    # The buffer is created once, at code-generation time, and spliced
    # into the expression returned for every call.
    samples = Array{Float64, 1}(1000)
    quote
        _draw_samples!($(samples), dist)
    end
end
I’d really appreciate your advice on whether this approach makes any sense, what could go wrong with this method, or if there are any alternatives that are recommended over this one. Thank you for your help.
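For context, here is a stripped-down outline of the surrounding MC loop (the distribution and the per-trial statistic are just placeholders for illustration):
using Distributions

dist = Normal(0, 1)                 # placeholder distribution
results = zeros(10^6)
for trial in 1:10^6
    samples = draw_samples(dist)    # allocates a fresh 1000-element array every iteration
    results[trial] = mean(samples)  # some per-trial statistic
end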
Generated functions are the wrong approach to this in general. Try this:
let samples = Array{Float64, 1}(1000), inited = false
    global draw_samples
    function draw_samples(dist)
        # Draw only on the first call; later calls return the cached samples.
        inited || rand!(dist, samples)
        inited = true
        samples
    end
end
Obviously you can adjust this example as you see fit (e.g. adjust the cache to depend on dist), etc.
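If the goal is just to reuse the buffer and refill it on every call (rather than cache the first draw), a minimal variant could look like this (draw_samples_fresh is a hypothetical name):
let samples = Array{Float64, 1}(1000)
    global draw_samples_fresh
    # Refill the same preallocated buffer on every call; no per-call allocation.
    function draw_samples_fresh(dist)
        rand!(dist, samples)
        samples
    end
end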
Instead of a function use a call overloaded type with the array as a field?
# Define the type
immutable MyFunction{T}
    samples::T
end
# Define a call
function (p::MyFunction)(dist)
    _draw_samples!(p.samples, dist)
end
# Build the type with the cache
draw_samples_1_allocation = MyFunction(Array{Float64, 1}(1000))
# Now use it like a function
dist = ...
draw_samples_1_allocation(dist)
Thank you for your response. Would you recommend the preallocation approach even when there are, say, 20 different functions, each of which needs a different preallocated array (and all of them are called in one single function)? Currently, I’ve written my code with preallocation in the single function and mutation of the arrays in the individual functions, but preallocating 20 different arrays in that single function (the main program) just didn’t look right to me, and I was wondering if I was doing something wrong.
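For concreteness, here is a simplified sketch of what my current setup looks like (hypothetical names, only two of the twenty arrays shown; dist_a and dist_b are distributions defined elsewhere):
# Preallocate all buffers once in the main program ...
buf_a = Array{Float64, 1}(1000)
buf_b = Array{Float64, 1}(1000)

# ... and thread them through the single function that runs one MC trial.
function run_trial!(buf_a, buf_b, dist_a, dist_b)
    _draw_samples!(buf_a, dist_a)
    _draw_samples!(buf_b, dist_b)
    # ... compute and return this trial's statistic from buf_a and buf_b ...
end

for trial in 1:10^6
    run_trial!(buf_a, buf_b, dist_a, dist_b)
end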
@TsurHerman, thank you so much for your response. This is a great and neat suggestion. As someone who knows very little about thread-safe programming, I wonder whether your suggestion is thread-safe. Any thoughts on this would be appreciated.
Hi @ChrisRackauckas, thank you so much for your response. Frankly, I didn’t know enough about parametric types, so I first had to read about them to make sure I’d understand your solution. This is a pretty neat solution. Is your suggestion thread-safe? Any thoughts on this would be appreciated. @yuyichao, any thoughts on this approach (out of curiosity)?
Depends on how you’re using it. Are you going to be calling the same instance of MyFunction on each thread? Then no, because they will be sharing the same cache. But in that case you just make one of these per thread, put them all in an array, and access them via threadid(). That works because each function locally keeps its own cache. Example:
cached_functions = [MyFunction(Array{Float64, 1}(1000)) for i in 1:Threads.nthreads()]
Threads.@threads for i in 1:1000
    cached_functions[Threads.threadid()](dist)
end
Or just make sure you split access to samples by the threadid in another way.
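One way to do that split (just a sketch, not from the post above) is to keep one preallocated column per thread in a single matrix and index it by threadid():
buffers = Array{Float64, 2}(1000, Threads.nthreads())
Threads.@threads for i in 1:1000
    col = view(buffers, :, Threads.threadid())
    rand!(dist, col)   # each thread fills only its own column
    # ... use col ...
end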
Not thread-safe, since there is only one samples array and all threads access it: you might ask for a certain distribution in one thread, and by the time you use the result another thread has called the function with a different distribution.
To make it thread-safe, I assume you would want to change the generating function.
Or better, make another generating function that handles thread safety, for example through replication:
function fthread_safe(g)
    # One deep copy of the callable (and its buffer) per thread.
    fv = [deepcopy(g) for i = 1:Threads.nthreads()]
    # Wrap in a Ref so the closure captures a concretely typed value.
    G = Ref{typeof(fv)}(fv)
    (args...) -> G.x[Threads.threadid()](args...)
end
And since there is only one parameter, you can write it very elegantly at the end of another generating function using the |> (pipe) operator.
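For example (a sketch reusing the MyFunction callable from earlier; dist is assumed to be defined):
# Build the caching callable and immediately wrap it for per-thread replication.
draw_samples_safe = MyFunction(Array{Float64, 1}(1000)) |> fthread_safe
Threads.@threads for i in 1:1000
    draw_samples_safe(dist)   # each thread mutates only its own deep copy's buffer
end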
I didn’t understand what you meant … what is this foo2?
I am no expert on the type system and type stability, and maybe I’m using the wrong term here, but in my experience closures created by generating functions that keep static variables in the form of Refs, as in my example, compile to optimal machine code.