Dear Julia Users,
I am running several nested for loops. Inside the loops I call a function named do_one, which returns a 105000×34 DataFrame. I would like to combine these DataFrames into a larger DataFrame. The code is below:
results_all = Array{Any}(undef, I, J, K)
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
My question: how can I parallelize this with @distributed or pmap?
Thank you so much!
What are you passing to do_one? Is it i, j, or j, k? That is, are you getting K dataframes, or are you getting I dataframes?
Also, please enclose your code in triple backticks:
```
code goes here
```
i, j, k are all passed to do_one through “parameters.jl”. Each (i, j, k) combination produces a 105000×34 DataFrame, i.e., result in the code. results_all should be an I×J×K array, with each element being a 105000×34 DataFrame.
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
I should also point out that different parameters are passed to do_one for each i, j, k combination.
parameters.jl defines parameter1, parameter2, and parameter3.
So the code is:
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(parameter1[i], parameter2[j], parameter3[k])
            results_all[i, j, k] = result
        end
    end
end
So, I would do something like this. Note that I think you shouldn’t use Any. You should use a concrete type.
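using Distributed, SharedArrays  # @distributed and SharedArray live in these standard libraries

results_all = SharedArray{Int}(I, J, K)
# Split the outer i loop across workers; @sync waits for all of them to finish.
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        result = do_one(...)
        results_all[i, j, k] = result
    end
end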
You might want to move @distributed to one of the inner loops instead, but that’s the basic way to parallelize it.
Thanks a lot! Since you put @sync @distributed only on the i loop, will all j and k combinations for a given i run on the same core? I.e., are there only I parallel jobs?
Can I use the following code to run I×J×K parallel jobs?
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    @sync @distributed for j = 1:J
        @sync @distributed for k = 1:K
            result = do_one(...)
            results_all[i, j, k] = result
        end
    end
end
Another problem is that result = do_one(parameter1[i], parameter2[j], parameter3[k]) returns a data frame with elements of mixed types, including Float64, String, and Int64. So I cannot use results_all = SharedArray{Int}(I, J, K). What type should I use here? I cannot use results_all = SharedArray{Any}(I, J, K) either.
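Rather than nesting @distributed, one option is to flatten the three loops into a single index range so that all I×J×K combinations are divided among the workers. The following is only a minimal sketch under assumptions, not a confirmed solution from this thread: do_one here is a hypothetical stand-in that returns a tiny DataFrame, the parameter vectors are made up (the real ones come from parameters.jl), and DataFrames.jl is assumed to be installed. The (vcat) reducer stacks the per-combination DataFrames, so no SharedArray is needed.

```julia
using Distributed
# addprocs(4)  # assumption: add worker processes here, before the @everywhere lines

@everywhere using DataFrames

# Hypothetical stand-in for the real do_one; returns a small mixed-type DataFrame.
@everywhere function do_one(p1, p2, p3)
    return DataFrame(p1 = [p1, p1], p2 = [p2, p2], p3 = [p3, p3],
                     value = [p1 * p2, p1 + p2])
end

# Made-up parameters, defined on every process (the thread uses parameters.jl).
@everywhere begin
    parameter1 = [0.5, 1.5]
    parameter2 = [1, 2, 3]
    parameter3 = ["a", "b"]
    dims = (length(parameter1), length(parameter2), length(parameter3))
end

I, J, K = dims

# One flat range over all combinations; each worker receives a chunk of it.
# With a reducer, @distributed blocks and returns the reduced value.
combined = @distributed (vcat) for n in 1:prod(dims)
    # Convert the linear index n back to (i, j, k).
    i, j, k = Tuple(CartesianIndices(dims)[n])
    do_one(parameter1[i], parameter2[j], parameter3[k])
end
```

Here combined is a single DataFrame holding the rows from every combination, in the same order as the flat index n. Each result is copied back to the calling process, so for 105000×34 frames the total memory use is worth checking before scaling up.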
Instead of Int, perhaps you can put in the type of the dataframe then? At this point you’re probably just going to have to experiment with possibilities, since we’re beyond an MWE.
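One thing to keep in mind while experimenting: as far as I know, the element type of a SharedArray has to be a plain bits type, which would explain why SharedArray{Any} is rejected and why a DataFrame type is unlikely to work as the element type. A quick check using only Base Julia, meant as an illustration rather than a full diagnosis:

```julia
# isbitstype is true only for immutable types that hold no references to other objects.
@show isbitstype(Int)              # plain integers qualify
@show isbitstype(Float64)          # plain floats qualify
@show isbitstype(String)           # strings do not
@show isbitstype(Any)              # abstract types do not
@show isbitstype(Vector{Float64})  # arrays do not, so containers built on them do not either
```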
Thanks! I think SharedArray may not work for an array with mixed types of data, including Float64, Int64 and String.
There are probably other functions that work like SharedArray.
Or I will need to look at pmap instead.
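For reference, a pmap version might look roughly like the sketch below. It is only an illustration under assumptions: do_one is a hypothetical stand-in, the parameter vectors are made up (the real ones come from parameters.jl), and DataFrames.jl is assumed to be installed. Since pmap returns ordinary Julia objects, the mixed Float64, Int64 and String columns are not a problem.

```julia
using Distributed
# addprocs(4)  # assumption: start workers first; without them pmap runs locally

@everywhere using DataFrames

# Hypothetical stand-in for the real do_one (Float64, Int and String columns).
@everywhere function do_one(p1, p2, p3)
    return DataFrame(x = [p1, 2 * p1], n = [p2, p2 + 1], label = [p3, p3])
end

# Made-up parameter vectors for illustration.
parameter1 = [0.5, 1.5]
parameter2 = [1, 2, 3]
parameter3 = ["a", "b"]
I, J, K = length(parameter1), length(parameter2), length(parameter3)

# All (p1, p2, p3) combinations as a flat vector, first index varying fastest.
combos = vec(collect(Iterators.product(parameter1, parameter2, parameter3)))

# One task per combination; the arguments travel with each task.
flat = pmap(args -> do_one(args...), combos)

# Restore the I×J×K layout: results_all[i, j, k] is a DataFrame.
results_all = reshape(flat, I, J, K)

# Optionally stack everything into one large DataFrame.
combined = reduce(vcat, flat)
```

With this layout, results_all[i, j, k] corresponds to do_one(parameter1[i], parameter2[j], parameter3[k]), matching the original loop. Every result is sent back to the calling process, so memory there has to hold all I×J×K frames.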