Collecting results in proper order using Multi-threading

Hi, I am running a Monte Carlo type method and I would like to use multi-threading.
Unfortunately, I am a beginner in multi-threading for operations of the form n_{i}/n_{i-1}.
Usually, I use multiple cores to run different iterations of the same code and I do not
care about the order of the output. But now I need to store the outputs in order in an array to compute Prod(n_{i}/n_{i-1}). Here is a simplified version of my code:

using Random
using LinearAlgebra
using Distributions

rad = 1.0
rmax = 4
ra = range(0, rad, length=rmax)
sets_prod = Any[]

for s = 1:2
    suc_iter, pos_iter, neg_iter, rat_set = Any[], Any[], Any[], Any[]
    ratious = Any[]

    @sync begin
        for ri = 1:rmax-1
            Threads.@spawn begin
                suc, pos, neg = 0, 0, 0
                for i = 1:10^(5)
                    d = 0
                    d = rand(Uniform(-ra[ri], ra[ri]), 3)
                    println("ri, i  = $ri, $i on thread $(Threads.threadid())")
                    if sum(d) >= 0
                        suc += 1
                        d[2] = -d[2]
                        if sum(d) >= 0
                            pos += 1
                        else
                            neg += 1
                        end
                    end
                end ################### end iterations
                push!(suc_iter, suc)
                push!(pos_iter, pos)
                push!(neg_iter, neg)

                if (pos/suc) > 0
                    push!(rat_set, (pos/suc))
                else
                    push!(rat_set, (0.0))
                end
            end #### end spawn
        end #### end radii
    end ##### end sync
    println("rat_set:  ", rat_set)

    for k = 2:rmax-1
        push!(ratious, rat_set[k]/rat_set[k-1])
    end
    push!(sets_prod, prod(ratious))
end ############# end sets
println("sets", sets_prod)


The problem is that I want to parallelize the loop for ri=1:rmax-1 and then obtain
the ratios ordered by ri = 1, 2, 3, ..., because I need their product at the end. I tried
`Threads.@spawn` and `@sync`, and I can send each part of the loop to a different
thread, but I am not able to collect the results in

    push!(suc_iter,suc)
    push!(pos_iter,pos)
    push!(neg_iter,neg)

in the correct order. I tried atomic operations but could not make them work.
Does anyone have a suggestion? Thanks in advance.

If you do a known number of iterations and collect the same number of elements into each array, you may allocate the storage in advance and set suc_iter[ri] = suc etc. instead of push!, I guess.

Thank you for your answer. If I understood correctly, it is better to create the array at the beginning,
like this:

suc_iter = fill(0.0, iters, iters)

and then fill it with the suc value from each iteration over ri. The problem I have with that method is that my arrays are usually built from 10^8 iterations and then 120 realizations of the code, and my computer rapidly runs out of memory. I hope I have understood it correctly.
Please correct me if I did not.
Thanks for the help.

Why are you making these vectors into Any vectors? That is bad for performance, and there doesn’t seem to be a reason for doing it.
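To illustrate the point about element types, here is a small sketch (the names `rat_any` and `rat_f64` are made up for this example, they are not from the original code). It only shows that a typed vector keeps a concrete element type, which is what lets the compiler generate fast code:

```julia
# Any[] can hold anything, so each element is stored behind a pointer.
# Float64[] stores the numbers inline with a concrete element type.
rat_any = Any[]
rat_f64 = Float64[]

for x in (0.25, 0.5, 0.75)
    push!(rat_any, x)
    push!(rat_f64, x)
end

eltype(rat_any)   # Any
eltype(rat_f64)   # Float64

# Counters such as suc/pos/neg could live in an integer vector instead
suc_counts = zeros(Int, 3)
```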

But you have the same problem if you start with an empty vector and push! to it, as long as the vectors end up the same size. I don’t understand why you don’t just pre-allocate. In fact, I would expect pre-allocating to cause less memory use.

But if you are going to create several length 10^8 vectors (of eltype Any) times 120, and keep it all in memory at the same time, that will just not work. It has nothing to do with multithreading, you are just running out of memory.
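As a rough back-of-the-envelope check, assuming each vector really holds 10^8 elements stored as Float64 and all 120 realizations are kept at once (both are assumptions about the real program, not something shown in the posted code):

```julia
n_elements     = 10^8    # assumed length of one vector
n_realizations = 120     # assumed number of vectors kept alive at once

bytes_per_vector = n_elements * sizeof(Float64)           # 8 bytes per element
total_gib = bytes_per_vector * n_realizations / 2^30      # total size in GiB

println("one vector: ", bytes_per_vector / 2^20, " MiB")
println("all vectors: ", round(total_gib, digits = 1), " GiB")
```

That is close to 90 GiB for a single such array per realization, and a vector of eltype Any would need at least as much, since it stores a pointer per element in addition to the values themselves.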

Also, you absolutely should put your code inside functions. Working in global scope like this is terrible for performance and memory use.

Before you start using advanced features like multithreading, you should read the performance tips to get rid of basic performance mistakes: Performance Tips · The Julia Language

I suggest you do the following: Read the performance tips. Then create a minimal example (MWE), which is a function that returns the quantity or quantities you need to get.

Right now, your example isn’t minimal. It contains a lot of stuff that does nothing. For example, suc_iter, pos_iter, neg_iter are not used for anything. Can you just delete them from your MWE? If you write a function with an explicit return statement, we will know which operations and variables can be deleted and optimized. Right now we can’t know which parts of the code are important and which are not.
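One possible shape for such a function, as a sketch only: `ratio_for_radius` is a hypothetical name, and plain `rand` scaled to [-r, r] stands in for `rand(Uniform(-r, r), 3)` so the example needs no extra packages. It keeps just the counters that feed the returned ratio:

```julia
using Random

# Hypothetical helper: estimate pos/suc for one radius r from n samples.
function ratio_for_radius(r, n; rng = Random.default_rng())
    suc, pos = 0, 0
    for _ in 1:n
        # three uniform draws in [-r, r] (assumed stand-in for Uniform(-r, r))
        d1 = r * (2 * rand(rng) - 1)
        d2 = r * (2 * rand(rng) - 1)
        d3 = r * (2 * rand(rng) - 1)
        if d1 + d2 + d3 >= 0
            suc += 1
            # flipping the sign of the second component, as in d[2] = -d[2]
            if d1 - d2 + d3 >= 0
                pos += 1
            end
        end
    end
    # explicit return; guard against division by zero when nothing succeeded
    return suc == 0 ? 0.0 : pos / suc
end

ratio_for_radius(1.0, 10^5)
```

With an explicit return value like this, it is clear that `neg` and the per-iteration arrays are not needed for the final result.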


No, why?

Just create

suc_iter = zeros(rmax)

and in the loop

@sync for ri in 1:rmax-1
    Threads.@spawn begin
        ...
        for i in 1:10^5
            ...
        end
        suc_iter[ri] = suc
        ...
    end
end

Or something like that.
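A self-contained sketch of the same idea, wrapped in functions. `threaded_collect` and `ratio_product` are hypothetical names, and the function passed in (`x -> 2.0^x`) is just a placeholder for the real Monte Carlo work per radius. Each task writes only to its own index, so the results end up in order no matter which task finishes first:

```julia
# Hypothetical helper: run f on each element of xs in its own task and
# store the result at the matching index, so the order is preserved.
function threaded_collect(f, xs)
    out = Vector{Float64}(undef, length(xs))   # pre-allocated, typed storage
    @sync for (k, x) in enumerate(xs)
        Threads.@spawn begin
            out[k] = f(x)    # each task touches a distinct slot, no push! needed
        end
    end
    return out
end

# Product of consecutive ratios: prod(v[k] / v[k-1]) for k = 2:length(v)
ratio_product(v) = prod(v[k] / v[k-1] for k in 2:length(v))

vals = threaded_collect(x -> 2.0^x, 1:4)   # placeholder work per index
println(vals)                  # [2.0, 4.0, 8.0, 16.0]
println(ratio_product(vals))   # 8.0
```

Start Julia with more than one thread (for example `julia -t 4`) to actually run the tasks in parallel; with a single thread the code still runs and gives the same ordered result.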
And in any case, I recommend you to follow @DNF’s advice on decomposing your logic into functions with clear inputs instead of writing the results into global variables. That will make it easier for the community as well to further help you.

Thank you all for the answers, they are really helpful. I need to learn more about this language, and now I know why my program consumes so much memory. I will follow the advice of @DNF
and the code of @Vasily_Pisarev. I will post the new function later. Thanks for the help.