Hello, new here. Sorry if I make any mistakes.
I am currently trying to train a neural network to approximate a function. However, it is not fast enough, and I think the most obvious gains will come from parallelizing.
See this other discussion post for the model I am trying to solve (though my code is different).
So, a couple of questions. First, how should I parallelize my training to get the biggest speedup? I plan on running K = {10, 50, 100, unsure} neural networks at a time. The options I see:

- Run one neural network per core until it converges or reaches some maximum epoch.
- Distribute the update step (which I am doing via minibatch stochastic gradient descent with the ADAM optimizer).
- Distribute the gradient computation itself.

Let me know if you have any thoughts.
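To illustrate the first option (one network per core), here is a rough, untested sketch of what I have in mind. `train_one_network`, the tiny architecture, and the toy data are all placeholders, not my actual code:

```julia
using Flux

# Placeholder training loop for a single network (hypothetical names/architecture).
function train_one_network(xs, ys; epochs = 100)
    model = Chain(Dense(1, 16, tanh), Dense(16, 1))
    opt   = ADAM()
    ps    = Flux.params(model)
    for _ in 1:epochs
        # Implicit-parameter gradient; returns a Zygote.Grads.
        gs = Flux.gradient(() -> sum(abs2, model(xs) .- ys), ps)
        Flux.Optimise.update!(opt, ps, gs)
    end
    return model
end

# Toy data: approximate sin on random inputs.
xs = rand(Float32, 1, 64)
ys = sin.(xs)

# Train K independent networks, one iteration per thread.
# Each model and optimizer is local to its iteration, so nothing is shared.
K = 10
models = Vector{Any}(undef, K)
Threads.@threads for k in 1:K
    models[k] = train_one_network(xs, ys)
end
```

Since each network here is completely independent, this version needs no SharedArrays at all, which is partly why I suspect it is the easiest route, but I would like to know if it is the fastest.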
Second, I am really struggling to figure out how to implement the parallelization. I have parallelized before using SharedArrays, but the element type of a SharedArray has to be a bits type, and the objects I am working with are not.
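To show concretely what I mean by the bits-type restriction (a minimal example, not my real code; `Toy` is a made-up struct):

```julia
using SharedArrays

# SharedArray element types must be bits types (plain, immutable, pointer-free).
isbitstype(Float64)      # true, so a SharedArray of Float64 is fine
s = SharedArray{Float64}(10)
s[1] = 3.14

# Anything holding a reference, like an optimizer or a model, is not a bits type:
struct Toy
    xs::Vector{Float64}  # holds a pointer, so Toy is not a bits type
end
isbitstype(Toy)          # false, and SharedArray{Toy}(10) throws an error
```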
- I am using the ADAM optimizer in Flux. If the network update step happens inside the parallel section, I believe I would need to put the optimizer in a SharedArray, but I can't, since ADAM is not a bits type.
- If I parallelize before updating the networks, then the parallel section would just compute the gradients of my loss function. But those come back as type Zygote.Grads, which also can't go in a SharedArray. If I convert the gradients to plain Arrays, I can store them in a SharedArray, but then I don't know how to call Flux.Optimise.update!, since the gradient is no longer a Zygote.Grads.
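To make that second bullet concrete, this is roughly where I get stuck (a sketch; the loss and shapes are made up):

```julia
using Flux

model = Chain(Dense(1, 8, tanh), Dense(8, 1))
ps    = Flux.params(model)
xs    = rand(Float32, 1, 4)

# Gradients come back as a Zygote.Grads, keyed by the parameter arrays.
gs = Flux.gradient(() -> sum(abs2, model(xs)), ps)

# I can pull out plain Arrays (these could go into SharedArrays elementwise)...
raw = [copy(gs[p]) for p in ps]

# ...but now I no longer have a Zygote.Grads to hand to
# Flux.Optimise.update!(opt, ps, gs). This is where I'm stuck.
```

One thing I noticed in the docs is that update! also seems to have a per-array method, `Flux.Optimise.update!(opt, p, g)`, so maybe I could zip the parameters in `ps` with the raw gradient arrays? I am not sure whether ADAM's internal per-array state survives that across workers, so corrections welcome.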
I hope this makes sense! Please ask if it doesn't, and thank you for any help. I have spent quite some time on this and am really struggling.