I am trying to parallelize the code below. The idea is to execute several model runs in parallel and push their output into a single DataFrame (i.e. dout). The code works without parallelization so far. However, I would like to make it faster, since in the end I will have more than 1000 model runs. Any idea how I could parallelize this piece of code?
dout=DataFrame[]
for i= 1:l10
model=train(y,x);
ynew= predict(model, x[i)
Prediction_df = DataFrame(Prediction=mapreduce(vec, vcat, ynew))
push!(dout,DataFrame(Prediction_df)
end
I don’t want to sound too harsh, but you really need to improve your question if you want a decent answer. I will give you three reasons why I say this:
- The provided example does not give any indication of what’s going on, other than there being a loop where you train, predict, and map-reduce each iteration. It’s hardly enough information for me to say anything other than: “Run it in a @threads for loop”. This is especially true when trying to convert serial code to parallel, as there is a huge range of factors to consider.
- You need to give a more elaborate question which indicates what `train` and `predict` do and where they come from, so we can inspect those functions and their containing modules/libraries. Or even better, boil your example down into a proper MWE (Minimal Working Example) that anyone seeing your question can run on their local machine.
- The example itself does not even run, even if all functions being used were already known and defined. I see errors on lines 2, 4, and 6 which would make your code non-runnable. While this may sound like me nitpicking, proper syntax that “just works” for people trying to assist you goes a very long way.
I would recommend you either edit your original question, or create a new one, with the above 3 points addressed. Only then will people be willing and able to assist you.
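For illustration, an MWE for the loop in the question might look roughly like the sketch below. The `train` and `predict` definitions are hypothetical stand-ins (the question never says which library they come from), the toy data is made up, and the `mapreduce` flattening step is left out because the stand-in `predict` already returns a flat vector.

```julia
using DataFrames

# Hypothetical stand-ins for the asker's real functions.
train(y, x) = sum(x .* y) / sum(x .* x)   # least-squares slope through the origin
predict(model, xi) = [model * xi]         # one-element vector of predictions

# Made-up toy data: y is exactly 2x, so the fitted slope is 2.0.
x = collect(1.0:10.0)
y = 2 .* x

dout = DataFrame[]                        # one DataFrame per model run
for i in 1:10
    model = train(y, x)
    ynew = predict(model, x[i])
    push!(dout, DataFrame(Prediction = ynew))
end

# Stack the per-run DataFrames into a single one.
result = reduce(vcat, dout)
```

With something like this, anyone reading the thread can paste the code, time the serial loop, and try parallel variants against it.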
I think you can use `pmap` here. You need to write a function that takes an `x` and returns your DataFrame, and then map it over `x` with `pmap`. Here’s a minimal example:
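using Distributed   # provides addprocs, @everywhere, pmap and myid on Julia 0.7 and later
addprocs(2)         # start two worker processes
x = rand(10)
@everywhere begin
    using DataFrames
    # one-row DataFrame, tagged with the id of the process that produced it
    model(x, other_para) = DataFrame(val = x + other_para, id = myid())
end
# returns a Vector of DataFrames, in the same order as x
pmap(xi -> model(xi, 1), x)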
https://docs.julialang.org/en/latest/manual/parallel-computing#Parallel-Map-and-Loops-1
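Since the question asks for a single DataFrame, the list that `pmap` returns still has to be combined. A minimal sketch of that last step, assuming a hypothetical `run_one` function standing in for one model run (train plus predict) and made-up input data:

```julia
using Distributed
addprocs(2)                       # two worker processes

@everywhere using DataFrames

# Hypothetical stand-in for one model run; the real one would train and predict.
@everywhere run_one(xi) = DataFrame(Prediction = [2 * xi], worker = myid())

x = collect(1.0:10.0)             # made-up inputs, one per model run

parts = pmap(run_one, x)          # Vector of DataFrames, same order as x
dout = reduce(vcat, parts)        # stack them into one DataFrame
```

The `worker` column is only there to show which process handled which run; it can be dropped once things work.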
Threads.@threads might also just work, something like:
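out = Vector{Any}(undef, 10)   # preallocate so each iteration writes to its own slot
Threads.@threads for i = 1:10
    out[i] = somefun(x[i])     # somefun is a placeholder for one model run
end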
But you need to start Julia with several threads (for example by setting the JULIA_NUM_THREADS environment variable), and it can make Julia crash if the loop body reads files or does similar things (look up the docs).
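Applied to the original question, a threaded version could be sketched as follows. Here `somefun` is a hypothetical stand-in for one model run and the data is made up. Each iteration writes to its own preallocated slot instead of calling `push!` on a shared array, because `push!` from several threads at once is not thread-safe.

```julia
using DataFrames

# Hypothetical stand-in for one model run (train + predict).
somefun(xi) = DataFrame(Prediction = [2 * xi])

x = collect(1.0:10.0)                          # made-up inputs

# One slot per run; every iteration touches only its own index.
out = Vector{DataFrame}(undef, length(x))
Threads.@threads for i in eachindex(x)
    out[i] = somefun(x[i])
end

dout = reduce(vcat, out)                       # single DataFrame, in input order
```

`Threads.nthreads()` reports how many threads the session actually has; with only one thread the loop still runs, just serially.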