Dear Julia Users,

I am running multiple layers of for loops. Inside the loops, a function named `do_one` returns a 105000×34 DataFrame, and I want to combine these DataFrames into one larger structure. The code is below:

```
results_all = Array{Any}(undef, I, J, K)
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

My question: how can I parallelize this with `@distributed` or `pmap`?

Thank you so much!

What are you passing to `do_one`? Is it `i, j`, or `j, k`? That is, are you getting `K` dataframes, or are you getting `I` dataframes?

Also, please enclose your code in triple backticks:

```
code goes here
```

`i`, `j`, `k` are all passed to `do_one` through “parameters.jl”. Each (i, j, k) combination produces a 105000×34 DataFrame (`result` in the code). But `results_all` should be an I×J×K array, with each of its elements being a 105000×34 DataFrame.

```
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(…)
            results_all[i, j, k] = result
        end
    end
end
```

I should also point out that different parameters are passed to `do_one` for different (i, j, k) combinations. “parameters.jl” contains the information for `parameter1`, `parameter2`, and `parameter3`, so the code is:

```
results_all = Array{Any}(undef, I, J, K)
@everywhere include("parameters.jl")
for i = 1:I
    for j = 1:J
        for k = 1:K
            result = do_one(parameter1[i], parameter2[j], parameter3[k])
            results_all[i, j, k] = result
        end
    end
end
```

So, I would do something like this. Note that I think you shouldn’t use `Any`; you should use a concrete type.

```
using Distributed, SharedArrays

results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        result = do_one(...)
        results_all[i, j, k] = result
    end
end
```

You might want to move the `@distributed` to one of the inner loops, but that’s how to parallelize it.
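As a runnable sketch of this pattern: the following uses a hypothetical `do_one_scalar` standing in for `do_one`, since a `SharedArray{Int}` can only hold integers, not DataFrames.

```julia
using Distributed, SharedArrays

I, J, K = 2, 3, 4

# Hypothetical stand-in for do_one that returns a scalar;
# the real do_one returns a DataFrame, which a SharedArray{Int}
# cannot hold.
@everywhere do_one_scalar(i, j, k) = i * 100 + j * 10 + k

results_all = SharedArray{Int}(I, J, K)
# The outer loop's iterations are split across the workers;
# each worker runs its full inner j, k loops.
@sync @distributed for i = 1:I
    for j = 1:J, k = 1:K
        results_all[i, j, k] = do_one_scalar(i, j, k)
    end
end
```

With no workers added via `addprocs`, this simply runs on the master process, so it can be tried interactively before scaling up.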

Thanks a lot! Since you put `@sync @distributed` ONLY on the `i` iterations, will each `j` and `k` combination be run on the same core? I.e., there are `I` parallel jobs.

Can I do the following to run I×J×K parallel jobs?

```
results_all = SharedArray{Int}(I, J, K)
@sync @distributed for i = 1:I
    @sync @distributed for j = 1:J
        @sync @distributed for k = 1:K
            result = do_one(...)
            results_all[i, j, k] = result
        end
    end
end
```
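Nesting `@distributed` loops like this is unlikely to give you I×J×K parallel jobs. A common alternative (a sketch, not from the thread) is to flatten the three loops into one index range, so a single `@distributed` covers every (i, j, k) combination:

```julia
using Distributed, SharedArrays

I, J, K = 2, 3, 4
results_all = SharedArray{Int}(I, J, K)

# Enumerate every (i, j, k) combination up front...
idxs = vec(collect(CartesianIndices((I, J, K))))

# ...so one @distributed loop spreads all I*J*K iterations
# across the available workers.
@sync @distributed for n = 1:length(idxs)
    i, j, k = Tuple(idxs[n])
    results_all[i, j, k] = i * 100 + j * 10 + k  # stand-in for do_one
end
```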

Another problem is that `result = do_one(parameter1[i], parameter2[j], parameter3[k])` returns a data frame with elements of mixed types, including Float64, String, and Int64, so I cannot use `results_all = SharedArray{Int}(I, J, K)`. What type should I use here? I cannot do `results_all = SharedArray{Any}(I, J, K)`.

Instead of `Int`, perhaps you can put in the type of the dataframe? At this point you’re probably just going to have to experiment with possibilities, since we’re beyond a MWE.

Thanks! I think `SharedArray` may not work for an array with mixed types of data, including Float64, Int64, and String.
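That intuition can be checked directly: a `SharedArray` element type must be a plain-bits (`isbits`) type, which rules out `String` and `DataFrame`:

```julia
using SharedArrays

# SharedArray elements live in shared memory, so the element
# type must be a plain-bits type.
println(isbitstype(Int))     # true:  Int is allowed
println(isbitstype(String))  # false: String is not

# Constructing a SharedArray with a non-isbits element type
# raises an error rather than silently degrading.
ok = try
    SharedArray{String}(2, 2)
    true
catch
    false
end
println(ok)
```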

There are probably other functions that work like `SharedArray`.

Or I will need to look at `pmap` instead.
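`pmap` is indeed a natural fit when each call returns a DataFrame, because the results are collected into an ordinary array rather than written into shared memory. A minimal sketch, using a hypothetical `do_one_stub` (it returns a `NamedTuple` here to stay self-contained; the real `do_one` returning a 105000×34 DataFrame works the same way):

```julia
using Distributed

I, J, K = 2, 2, 2

# Hypothetical stand-in for do_one; pmap only requires that the
# function's return value can be sent back from the worker.
@everywhere do_one_stub(i, j, k) = (i = i, j = j, k = k)

# One argument tuple per job; Iterators.product varies i fastest,
# matching Julia's column-major reshape below.
args = vec(collect(Iterators.product(1:I, 1:J, 1:K)))

# pmap farms the I*J*K calls out to the workers and returns the
# results in the same order as args.
results = pmap(t -> do_one_stub(t...), args)
results_all = reshape(results, I, J, K)
```

Unlike the `SharedArray` approach, this places no restriction on the element type, so `results_all[i, j, k]` can be a full DataFrame.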