Dear Julia Users,
I am running multiple layers of for loops. Inside the loops, a function named do_one returns a 105000×34 DataFrame, and I want to combine these DataFrames into one larger structure. The code is as follows:
```
results_all = Array{Any}(undef, I, J, K)
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```
My question: how can I parallelize this with @distributed or pmap?
Thank you so much!
What are you passing to do_one? Is it i, j, or j, k? That is, are you getting K DataFrames, or are you getting I DataFrames?
Also, please enclose your code in triple backticks:
```
code goes here
```
i, j, k are all passed to do_one through "parameters.jl". Each (i, j, k) combination produces a 105000×34 DataFrame (result in the code). results_all should be an I×J×K array, with each of its elements being a 105000×34 DataFrame.
```
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```
I should also point out that different parameters are passed to do_one for different (i, j, k) combinations. parameters.jl contains parameter1, parameter2, and parameter3, so the code is:
```
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(parameter1[i], parameter2[j], parameter3[k])
            results_all[i, j, k] = result
        end
    end
end
```
So, I would do something like this. Note that I don't think you should use Any; you should use a concrete type.
```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        result = do_one(...)
        results_all[i, j, k] = result
    end
end
```
You might want to move the distributed for to one of the inner loops, but that’s how to parallelize it.
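Alternatively, if you want every (i, j, k) combination distributed instead of only the outer loop, you can run a single distributed loop over a flat index. Here is a self-contained sketch with made-up sizes and a stand-in do_one that returns an Int (your real do_one returns a DataFrame, which would not fit in a SharedArray{Int}):

```julia
using Distributed, SharedArrays

# Stand-ins so the sketch runs on its own; in the real code these
# come from parameters.jl and must be defined @everywhere so the
# workers can see them.
I, J, K = 2, 3, 4
parameter1, parameter2, parameter3 = 1:I, 1:J, 1:K
do_one(a, b, c) = a + 10b + 100c   # placeholder returning an Int

results_all = SharedArray{Int}(I, J, K)

# One distributed loop over all I*J*K combinations, so the work is
# split at the finest grain instead of only across i.
cart = CartesianIndices((I, J, K))
@sync @distributed for idx = 1:length(cart)
    i, j, k = Tuple(cart[idx])
    results_all[i, j, k] = do_one(parameter1[i], parameter2[j], parameter3[k])
end
```

With no extra workers this still runs (locally), so you can test the indexing logic before adding processes with addprocs.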
Thanks a lot! Since you put @sync @distributed ONLY on the i loop, all (j, k) combinations for a given i will run on the same core, i.e., there are I parallel jobs. Can I use the following code to run I×J×K parallel jobs?
```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    @sync @distributed for j = 1:J
        @sync @distributed for k = 1:K
            result = do_one(...)
            results_all[i, j, k] = result
        end
    end
end
```
Another problem is that result = do_one(parameter1[i], parameter2[j], parameter3[k]) returns a DataFrame with columns of mixed types, including Float64, String, and Int64, so I cannot use results_all = SharedArray{Int}(I, J, K). What type should I use here? I cannot do results_all = SharedArray{Any}(I, J, K).
Instead of Int, perhaps you can put in the type of the DataFrame? At this point you're probably just going to have to experiment with possibilities, since we're beyond a MWE.
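One quick sanity check before experimenting: SharedArray only supports isbits element types, and isbitstype (in Base) tells you whether a candidate element type can work at all:

```julia
using SharedArrays

# SharedArray requires isbits element types; check candidates first:
isbitstype(Int)                 # true  → SharedArray{Int} is fine
isbitstype(String)              # false → SharedArray{String} will fail
isbitstype(Tuple{Int,Float64})  # true  → plain bits structs work too
```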
Thanks! I think SharedArray may not work for an array holding mixed types of data, including Float64, Int64, and String.
There are probably other constructs that work like SharedArray.
Or I will need to look at pmap instead.
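For the pmap route, maybe something along these lines would work. This is an untested sketch with a made-up do_one and sizes; the point is that pmap gathers arbitrary return values into an ordinary Array, so the mixed-type DataFrames need no SharedArray at all:

```julia
using Distributed

# Stand-ins so the sketch runs on its own; in the real code do_one
# returns the 105000×34 DataFrame and the parameters come from
# parameters.jl (defined @everywhere).
I, J, K = 2, 3, 2
parameter1, parameter2, parameter3 = 1:I, 1:J, 1:K
do_one(a, b, c) = (a, b, c)   # placeholder return value

# pmap sends each (i, j, k) job to a worker and collects the results.
combos = vec(collect(CartesianIndices((I, J, K))))
results_vec = pmap(combos) do c
    i, j, k = Tuple(c)
    do_one(parameter1[i], parameter2[j], parameter3[k])
end

# CartesianIndices enumerates column-major, so reshaping restores
# results_all[i, j, k] in the same layout as the loop version.
results_all = reshape(results_vec, I, J, K)
```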