Trying to use map function in Threads

Hi guys,

I am trying to parallelize a computation using threads instead of processes. I have tested my functions several times using the Distributed package, and everything works as expected with pmap().

I was trying to implement a threaded version of the map() function because sharing my internal struct and Float128 operations among procs required a considerable amount of RAM to run. So I was trying to use shared memory and multiple threads to perform the estimations. I need to execute the code in parallel because I am performing a custom Approximate Bayesian Computation.

I was trying to use ThreadPools.qmap(), since my simulation function can take a mutable struct as input and internally change its values using prior values from an input vector, but I found that the output matrix contains different solutions for the same model.

Here is the code I am using and the associated outputs:

model, pol_s, pol_n, fix_rates = unzip(ThreadPools.qmap(x -> simulate_models(x, param), priors));

 0.1    0.375003  0.402674  -690.864  5.68428  1829.86  0.345655  -996.034
 0.125  0.375003  0.402674  -690.864  5.68428  1829.86  0.345655  -996.034
 0.15   0.375003  0.402674  -690.864  5.68428  1829.86  0.345655  -996.034
model, pol_s, pol_n, fix_rates = unzip(ThreadPools.qmap(x -> simulate_models(x, param), priors));

 0.1    0.227367  0.357421  -959.77  8.66814  1884.51  0.643587  -730.472
 0.125  0.227367  0.357421  -959.77  8.66814  1884.51  0.643587  -730.472
 0.15   0.227367  0.357421  -959.77  8.66814  1884.51  0.643587  -730.472

Both outputs should be identical, since the input was the same, and the output order should be preserved too.

Do you know how to execute a threaded version of the map() function safely?

Best, Murga.

Without information on how simulate_models is implemented, it seems very difficult to guess what is going on.

I assume there are random numbers involved. How/where do you set the random seed? (Given that you probably know more about statistics than I do, sorry for asking. It’s just the most obvious starting point :wink: )


Sorry for the short info. I hoped it was something simple, related to a naive, inappropriate use of the threading functions. My fault! The simulate_models() function takes a Vector{Float64} as input to change a mutable struct. The input vector is already sorted so that each field of the struct is modified appropriately. Once the fields are changed, I perform some estimations of biological properties to output the diversity level intra- and inter-population. The function solve_model() performs these measures, but it executes a bunch of functions internally.

function simulate_models(param::parameters, var_params::Vector{Float64})

    (B, gH, gL, gam_flanking, gam_dfe, shape, alpha, alpha_low) = var_params

    param.B            = param.B_bins[searchsortedfirst(param.B_bins, B)]
    param.shape        = shape
    param.gam_dfe      = gam_dfe
    param.scale        = abs(shape / gam_dfe)
    param.gH           = gH
    param.gL           = gL
    param.gam_flanking = gam_flanking
    param.al_tot       = alpha
    param.al_low       = alpha * alpha_low

    m, r_ps, r_pn, r_f = solve_model(param)

    return m, r_ps, r_pn, r_f
end

My point is that the variable priors is a container of Vector{Float64} that already holds the random values used in the estimations. So if ThreadPools.qmap() calls the function f in parallel over the collection (in my case a Vector{Vector{Float64}}), the outputs should be the same as with pmap() or map().

@time m, r_ps, r_pn, r_f = unzip(ThreadPools.qmap(x -> simulate_models(param, x), priors));

Could the problem be associated with the mutable struct? I thought that once the variable is passed into the function, it would be a local variable, and each operation would be independent.

Thanks in advance!

If param is a mutable struct (which it seems to be), then it will be shared between all threads, since it is “closed over” in the anonymous function definition (x -> simulate...). If you are using the same mutable struct between threads, you are likely seeing a race condition.
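To see why, here is a minimal sketch of the same pattern with hypothetical names (Params and simulate are not your actual types, just stand-ins): every task mutates the one shared struct, so a task can read back a value that another task wrote in the meantime.

```julia
# Hypothetical minimal example of the race, not the poster's actual code.
mutable struct Params
    x::Float64
end

function simulate(p::Params, v::Float64)
    p.x = v        # every task writes into the SAME struct...
    sleep(0.001)   # ...so another task may overwrite p.x here...
    return p.x^2   # ...and we read back someone else's value
end

p = Params(0.0)
inputs = collect(1.0:8.0)
results = Vector{Float64}(undef, length(inputs))
Threads.@threads for i in eachindex(inputs)
    results[i] = simulate(p, inputs[i])
end
# With more than one thread, `results` will often disagree with
# map(v -> v^2, inputs), and will differ from run to run.
```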

To make sure each task gets its own local copy, you can use the function deepcopy. This is probably the easiest way, but it involves additional copies. Try this:

model, pol_s, pol_n, fix_rates = unzip(ThreadPools.qmap(x -> simulate_models(deepcopy(param), x), priors));

EDIT: Fixed the order of arguments in simulate_models

If this copying is too much, you can try to make only N copies, where N is the number of threads. This is obviously a more complicated route, but I am happy to explain if necessary.
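One possible sketch of that route, assuming the param, priors, and simulate_models from the posts above: make one deepcopy per thread up front and index into them by thread id. Note the :static schedule, which pins each iteration to one thread so that threadid() stays valid for the whole iteration (with the default scheduler on recent Julia versions, tasks can migrate between threads, which would break this pattern).

```julia
# Sketch only: one struct copy per thread instead of one per iteration.
copies = [deepcopy(param) for _ in 1:Threads.nthreads()]

out = Vector{Any}(undef, length(priors))
Threads.@threads :static for i in eachindex(priors)
    local_param = copies[Threads.threadid()]  # this thread's private copy
    out[i] = simulate_models(local_param, priors[i])
end
```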

1 Like

As an aside, since the param variable is mutated, the naming convention for this is usually simulate_models!, with the “bang” (!) signalling that the function mutates its first argument.

Hi @jmair!

Thank you so much for the reply. deepcopy is working fine, so you were right: the mutable struct was shared between threads, creating a race condition.

Because the mutable struct contains a big Dictionary of sparse arrays, creating a copy could be a problem. Nonetheless, that dictionary doesn’t change across estimations (I am performing ~10^6 estimates). So to avoid memory problems, I just extract the dictionary from the struct and pass it to the anonymous function (of course, I needed to change a bit of downstream code).

Pasting the new code:

Dict{Float64, SparseMatrixCSC{Float64, Int64}}
# 37 keys,  each value is a SparseMatrix of size 1321×1999

@time m, r_ps, r_pn, r_f = unzip(ThreadPools.qmap(x -> simulate_models(deepcopy(param), convoluted_dictionary, x), priors));

Anyway, I will keep testing over the following days. Thanks again, I am pretty naive about threaded computations!

Besides, simulate_models! not only mutates the input variable but also outputs multiple arrays. Do you know if the naming convention should keep the bang anyway?

Thanks again, the Julia community is amazing!

Best, Murga.

I am not sure if anyone else has an opinion on this, but I would say having the bang whenever the function mutates is better, regardless of the return type, so it is clear when the code is read.
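Base itself follows the same pattern: plenty of bang functions both mutate their argument and return a value.

```julia
v = Int[]
push!(v, 3)   # mutates v AND returns it
push!(v, 1)
sort!(v)      # sorts in place, also returns v
# v is now [1, 3]
```

So mutating-and-returning is perfectly consistent with the convention.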

And you’re welcome, good luck!

1 Like