Parallel loops and maps

Dear Julia Users,

I am running multiple nested for loops. Inside the loops I call a function named do_one, which returns a 105000×34 DataFrame. I want to combine these DataFrames into one larger DataFrame. The code is as below:

```
results_all = Array{Any}(undef, I, J, K)

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

My question: how can I parallelize this with @distributed or pmap?

Thank you so much!

What are you passing to do_one? Is it i, j, or j, k? That is, are you getting K DataFrames, or are you getting I DataFrames?

Also, please enclose your code in triple backticks:

```
code goes here
```

i, j, and k are all passed to do_one through “parameters.jl”. Each i, j, k combination produces a 105000×34 DataFrame (result in the code), but results_all should be an I×J×K array, with each element being one of these 105000×34 DataFrames.

```
results_all = Array{Any}(undef, I, J, K)

@everywhere include("parameters.jl")

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

I should also point out that different parameters are passed to do_one for different i, j, k combinations.

parameters.jl defines parameter1, parameter2, and parameter3.
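
For illustration, a simplified sketch of it (placeholder values, not the real ones):

```
# Hypothetical sketch of parameters.jl; the real values are whatever do_one expects.
const parameter1 = rand(10)   # indexed by i = 1:I
const parameter2 = rand(20)   # indexed by j = 1:J
const parameter3 = rand(30)   # indexed by k = 1:K
```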

So the code is:

```
results_all = Array{Any}(undef, I, J, K)

@everywhere include("parameters.jl")

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(parameter1[i], parameter2[j], parameter3[k])
            results_all[i, j, k] = result
        end
    end
end
```

So, I would do something like this. Note that I think you shouldn’t use Any; you should use a concrete type.

```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        result = do_one(...)
        results_all[i, j, k] = result
    end
end
```

You might want to move the @distributed for to one of the inner loops, but that’s how to parallelize it.
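
For this to actually run in parallel, worker processes have to be started first. A minimal setup sketch (the worker count of 4 is just an assumption):

```
using Distributed, SharedArrays
addprocs(4)                           # assumed worker count; pick what fits your machine
@everywhere include("parameters.jl")  # define the parameters on every worker
```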

Thanks a lot! Since you put @sync @distributed only on the i loop, will every j, k combination for a given i run on the same core? I.e., there are I parallel jobs.

Can I use the following code to run I·J·K parallel jobs?

```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    @sync @distributed for j = 1:J
        @sync @distributed for k = 1:K
            result = do_one(...)
            results_all[i, j, k] = result
        end
    end
end
```

Another problem is that result = do_one(parameter1[i], parameter2[j], parameter3[k]) returns a DataFrame whose columns have mixed types, including Float64, String, and Int64. So I cannot use results_all = SharedArray{Int}(I, J, K); what type should I use here? I cannot do results_all = SharedArray{Any}(I, J, K).

Instead of Int, perhaps you can put in the type of the DataFrame then? At this point you’re probably just going to have to experiment with possibilities, since we’re beyond an MWE.

Thanks! I think SharedArray won’t work here: it requires a bits-type element, so it cannot hold an array with mixed types of data such as Float64, Int64, and String.

There are probably other constructs that work like SharedArray.

Or I will need to look at pmap instead.
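
Something like the following might work: a minimal pmap sketch, assuming parameters.jl (and do_one) are available on all workers and that 4 workers is an acceptable count.

```
using Distributed
addprocs(4)                           # assumed worker count

@everywhere include("parameters.jl")  # parameters (and do_one) must exist on all workers

# Flatten the three loops into one list of (i, j, k) tuples,
# so there is one parallel job per combination.
idxs = vec([(i, j, k) for i in 1:I, j in 1:J, k in 1:K])

# pmap runs each job on a worker and ships the result back to the
# master process, so each element can be a DataFrame with mixed column types.
flat = pmap(t -> do_one(parameter1[t[1]], parameter2[t[2]], parameter3[t[3]]), idxs)

# Restore the I×J×K shape; reshape matches vec's column-major order,
# so results_all[i, j, k] corresponds to combination (i, j, k).
results_all = reshape(flat, I, J, K)
```

If do_one lives in a separate file, it would need its own @everywhere include as well.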