Help with parallelizing indirect inference estimation

I played around with this a bit. With M = 1000, I get roughly a 30-40% speedup using 2 threads versus one; adding more threads does not help. My belief is that garbage collection interferes with threaded performance: from what I have read, any garbage collection stops all threads, so with more threads the probability that some thread triggers a collection, stopping all of them, goes up. In some experimentation I have found that MPI gives good performance; see An embarrassingly parallel problem: threads or MPI?. To limit garbage collection when using threads, avoiding allocations in the hot loop is, I believe, a good strategy.
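One way to act on the no-allocations advice is to preallocate a per-thread buffer and write into it in place, so the hot loop itself allocates nothing. A minimal sketch (`threaded_mean!` and `f` are hypothetical names, not from my code; `:static` scheduling is used so `threadid` safely indexes the buffer):

```julia
using Base.Threads

# Each thread accumulates into its own preallocated column of buf,
# so the loop body allocates nothing and threads never share writes.
function threaded_mean!(buf::Matrix{Float64}, f, M::Int)
    @threads :static for i in 1:M
        tid = threadid()
        @inbounds for j in axes(buf, 1)
            buf[j, tid] += f(i, j)
        end
    end
    # combine the per-thread partial sums and average
    return vec(sum(buf, dims = 2)) ./ M
end

buf = zeros(3, nthreads())
m = threaded_mean!(buf, (i, j) -> float(i), 4)  # each row averages 1:4, giving 2.5
```

The buffer can be reused across objective evaluations, which is where most of the GC pressure comes from in an estimation loop.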

The code I used for threading for this problem is

# indirect inference objective function
function iiobj(β, θ, x, u0, M)
    ys = simulate(x, β, u0, M)
    # one column per simulation draw, so threads never write to the same memory
    contribs = zeros(size(x, 2), M)
    Threads.@threads for i in 1:M
        contribs[:, i] .= ols(x, ys[i])
    end
    m = vec(sum(contribs, dims = 2)) ./ M
    return sum((θ .- m) .^ 2)
end
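An alternative that avoids shared writes entirely is to chunk the loop into task-local partial sums and reduce them afterwards. A sketch, with a stand-in function `g` playing the role of the per-draw computation (an assumption, not my actual `ols` call):

```julia
using Base.Threads

# Chunked task-local reduction: each spawned task sums only its own
# index range, then the partial sums are combined at the end, so no
# two tasks ever touch the same accumulator.
function chunked_sum(g, M::Int; ntasks::Int = nthreads())
    chunks = Iterators.partition(1:M, cld(M, ntasks))
    tasks = map(chunks) do idxs
        @spawn begin
            acc = g(first(idxs))
            for i in Iterators.drop(idxs, 1)
                acc = acc .+ g(i)
            end
            acc
        end
    end
    return reduce((a, b) -> a .+ b, fetch.(tasks))
end

s = chunked_sum(i -> [float(i)], 10)  # sums [1.0] through [10.0]
```

Dividing by M then gives the simulated moment vector, with no race regardless of how the scheduler migrates tasks.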

For optimization, I define the criterion as an anonymous function that closes over the data:

obj = β -> iiobj(β, θ, x, u0, M)  
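To show what the closure buys, here is a self-contained toy (`toy_simulate`, `toy_ols`, and `toy_iiobj` are hypothetical stand-ins, not the real model): the objective closes over θ, x, u0, and M, so whatever optimizer is used only ever sees β.

```julia
# Hypothetical toy stand-ins, just to make the closure pattern runnable
toy_simulate(x, β, u0, M) = [copy(β) for _ in 1:M]  # M identical fake datasets
toy_ols(x, y) = y                                   # fake auxiliary estimate

function toy_iiobj(β, θ, x, u0, M)
    ys = toy_simulate(x, β, u0, M)
    m = zeros(size(x, 2))
    for i in 1:M
        m .+= toy_ols(x, ys[i])
    end
    m ./= M
    return sum((θ .- m) .^ 2)
end

θ = [1.0, 2.0]
x = zeros(10, 2)
u0 = nothing
M = 5
obj = β -> toy_iiobj(β, θ, x, u0, M)  # the optimizer only ever sees β
```

In the toy, `obj` is exactly zero at β = θ and positive elsewhere, which is the shape the real indirect inference criterion should have near the truth.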