Run model runs in a parallel for loop in Julia


#1

I am trying to parallelize the code below. The idea is to run several model runs in parallel and push their output into a single DataFrame (i.e. dout). The code works without parallelization so far, but I would like to make it faster because in the end I will have more than 1000 model runs. Any idea how I could parallelize this piece of code?

dout=DataFrame[]
for i= 1:l10
model=train(y,x);
ynew= predict(model, x[i)
Prediction_df = DataFrame(Prediction=mapreduce(vec, vcat, ynew))
push!(dout,DataFrame(Prediction_df)
end

#2

I don’t want to sound too harsh, but you really need to improve your question if you want a decent answer. I will give you three reasons why I say this:

  1. The provided example does not give any indication of what’s going on, other than there being a loop where you train, predict, and map-reduce each iteration. It’s hardly enough information for me to say anything other than: “Run it in a @threads for loop”. This is especially true when trying to convert serial code to parallel, as there is a huge range of factors to consider.
  2. You need to give a more elaborate question which indicates what train and predict do and where they come from, so we can inspect those functions and their containing modules/libraries. Or even better, boil your example down into a proper MWE (Minimal Working Example) that anyone seeing your question can run on their local machine.
  3. The example itself does not even run, even if all functions being used were already known and defined. I see errors on lines 2, 4, and 6 which would make your code non-runnable. While this may sound like me nitpicking, proper syntax that “just works” for people trying to assist you goes a very long way.

I would recommend you either edit your original question, or create a new one, with the above 3 points addressed. Only then will people be willing and able to assist you.


#3

I think you can use pmap here. You need to write a function that takes one element of x and returns your DataFrame, and then map it over x with pmap. Here’s a minimal example:

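using Distributed
addprocs(2)  # start two worker processes

x = rand(10)
@everywhere begin
    using DataFrames
    # Each call returns a one-row DataFrame tagged with the id of the worker that ran it
    model(x, other_para) = DataFrame(val=[x + other_para], id=[myid()])
end

# pmap returns a Vector of DataFrames, one per element of x
pmap(x -> model(x, 1), x)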

https://docs.julialang.org/en/latest/manual/parallel-computing#Parallel-Map-and-Loops-1
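Applied to the loop in the question, a rough sketch could look like the following. Since I don't know what train and predict really are, the versions here are hypothetical stand-ins (a one-parameter least-squares fit), and run_model, run_all, and the toy x and y are names I made up for illustration. The point is only the shape: one function per model run, pmap over the run indices, then one vcat at the end to get a single DataFrame.

```julia
using Distributed
addprocs(2)  # assumption: two local workers; adjust to your machine

@everywhere begin
    using DataFrames

    # Hypothetical stand-ins for the question's train/predict
    train(y, x) = sum(x .* y) / sum(abs2, x)   # the "model" is a single slope
    predict(model, xi) = [model * xi]           # returns a one-element vector

    # One model run: train, predict, and return the result as a DataFrame
    function run_model(i, y, x)
        model = train(y, x)
        ynew = predict(model, x[i])
        return DataFrame(run=fill(i, length(ynew)), Prediction=ynew)
    end
end

# Distribute the runs over the workers and stack the results into one DataFrame
function run_all(y, x)
    results = pmap(i -> run_model(i, y, x), eachindex(x))
    return reduce(vcat, results)
end

x = rand(10)   # toy inputs
y = 2 .* x     # toy targets, so the fitted slope is 2
dout = run_all(y, x)
```

pmap returns the results in the same order as the inputs, so the rows of dout line up with the run indices. If training does not depend on i, it is probably worth training once outside the function and only running predict in parallel.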

Threads.@threads might also just work, something like:

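out = Vector{Any}(undef, 10)  # preallocate one slot per iteration
Threads.@threads for i = 1:10
    out[i] = somefun(x[i])  # somefun stands in for one model run
end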

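But you need to start Julia with several threads (for example by setting the JULIA_NUM_THREADS environment variable before launching it), and the loop body has to be thread-safe: it can make Julia crash if you read files or things like that (look up the docs).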
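A self-contained version of the threaded loop might look like this. Here somefun is a hypothetical placeholder for one model run, and the input is toy data. The main thing to notice is that the output vector is allocated up front and each iteration writes only to its own index, rather than calling push! on a shared vector from several threads, which would be a data race.

```julia
using Base.Threads

# Hypothetical stand-in for one model run
somefun(xi) = 2 * xi

x = rand(10)
out = Vector{Float64}(undef, length(x))  # one preallocated slot per run

@threads for i in eachindex(x)
    out[i] = somefun(x[i])  # each iteration touches only its own slot
end

println("threads available: ", nthreads())
```

If nthreads() reports 1, the loop still runs correctly, just serially.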