Parallel loops and maps

Dear Julia Users,

I am running multiple nested for loops. Inside the loops I call a function named do_one, which returns a 105000×34 DataFrame. I want to combine these DataFrames into one larger DataFrame. The code is as below:

```
results_all = Array{Any}(undef, I, J, K)

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

My question: how can I parallelize this with @distributed or pmap?

Thank you so much!

What are you passing to do_one? Is it i, j, or j, k? That is, are you getting K DataFrames, or are you getting I DataFrames?

Also, please enclose your code in triple backticks:

```
code goes here
```

i, j, and k are all passed to do_one through “parameters.jl”. Each i, j, k combination produces a 105000×34 DataFrame (result in the code), but results_all should be an I×J×K array, with each element being one of these 105000×34 DataFrames.

```
results_all = Array{Any}(undef, I, J, K)

@everywhere include("parameters.jl")

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

I should also point out that different parameters are passed to do_one for different i, j, k combinations.

parameters.jl defines parameter1, parameter2, and parameter3.
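
For illustration, a simplified sketch of it (placeholder values, not the real ones):

```
# Hypothetical sketch of parameters.jl; the real values are whatever do_one expects.
const parameter1 = rand(10)   # indexed by i = 1:I
const parameter2 = rand(20)   # indexed by j = 1:J
const parameter3 = rand(30)   # indexed by k = 1:K
```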

So the code is:

```
results_all = Array{Any}(undef, I, J, K)

@everywhere include("parameters.jl")

for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(parameter1[i], parameter2[j], parameter3[k])
            results_all[i, j, k] = result
        end
    end
end
```

So, I would do something like this. Note that I think you shouldn’t use Any; you should use a concrete type.

```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        result = do_one(...)
        results_all[i, j, k] = result
    end
end
```

You might want to move the @distributed for to one of the inner loops, but that’s how to parallelize it.
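
For this to actually run in parallel, worker processes have to be started first. A minimal setup sketch (the worker count of 4 is just an assumption):

```
using Distributed, SharedArrays
addprocs(4)                           # assumed worker count; pick what fits your machine
@everywhere include("parameters.jl")  # define the parameters on every worker
```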

Thanks a lot! Since you put @sync @distributed only on the i loop, will every j, k combination for a given i run on the same core? I.e., there are I parallel jobs.

Can I use the following code to run I·J·K parallel jobs?

```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    @sync @distributed for j = 1:J
        @sync @distributed for k = 1:K
            result = do_one(...)
            results_all[i, j, k] = result
        end
    end
end
```

Another problem is that result = do_one(parameter1[i], parameter2[j], parameter3[k]) returns a DataFrame whose columns have mixed types, including Float64, String, and Int64. So I cannot use results_all = SharedArray{Int}(I, J, K); what type should I use here? I cannot do results_all = SharedArray{Any}(I, J, K).

Instead of Int, perhaps you can put in the type of the DataFrame then? At this point you’re probably just going to have to experiment with possibilities, since we’re beyond an MWE.

Thanks! I think SharedArray won’t work here: it requires a bits-type element, so it cannot hold an array with mixed types of data such as Float64, Int64, and String.

There are probably other constructs that work like SharedArray.

Or I will need to look at pmap instead.
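
Something like the following might work: a minimal pmap sketch, assuming parameters.jl (and do_one) are available on all workers and that 4 workers is an acceptable count.

```
using Distributed
addprocs(4)                           # assumed worker count

@everywhere include("parameters.jl")  # parameters (and do_one) must exist on all workers

# Flatten the three loops into one list of (i, j, k) tuples,
# so there is one parallel job per combination.
idxs = vec([(i, j, k) for i in 1:I, j in 1:J, k in 1:K])

# pmap runs each job on a worker and ships the result back to the
# master process, so each element can be a DataFrame with mixed column types.
flat = pmap(t -> do_one(parameter1[t[1]], parameter2[t[2]], parameter3[t[3]]), idxs)

# Restore the I×J×K shape; reshape matches vec's column-major order,
# so results_all[i, j, k] corresponds to combination (i, j, k).
results_all = reshape(flat, I, J, K)
```

If do_one lives in a separate file, it would need its own @everywhere include as well.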