Hi
I have a UNet training script (machine learning) which works on my laptop (14 cores), but it's extremely slow.
I'm trying to explore ways of speeding it up and came across the Distributed library.
So I've only added the following three lines to my code:
```julia
using Distributed
addprocs(exeflags=`--project=$(Base.active_project())`)
..
rmprocs(workers())   # workers() returns the worker pids to remove
```
Also, it doesn't recognize the pmap() function, which I've called like this:
```julia
train_batch_input_files, train_batch_target_files = pmap(grab_random_files, train_dataset, batch_size)
```
I'm not sure if I'm actually using distributed computing correctly.
It looks like you are just trying to load the files in parallel? If you are loading from disk, this will likely be bottlenecked by your disk speed, not by how many cores you have.
pmap is most likely defined (it comes with Distributed), but it may be getting called with the wrong argument types. The main way to use pmap is:
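A minimal sketch (the worker count here is arbitrary):

```julia
using Distributed
addprocs(4)                      # four local worker processes
squares = pmap(x -> x^2, 1:10)   # returns [1, 4, 9, ..., 100]
```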
Each element in the array is mapped through the supplied function to produce a result, similar to how broadcasting works.
In your example, pmap should return an array of results, with each element being whatever your supplied function returns, which in this case looks like a tuple. You will therefore get an array of tuples back, so I don't think you can simply destructure it the way you have. Secondly, if the batch size parameter is fed into your function but is not an array, you can create an anonymous function which wraps this parameter:
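Something like this sketch, assuming grab_random_files(file, batch_size) returns an (input, target) tuple:

```julia
# Wrap the non-array parameter in an anonymous function:
results = pmap(x -> grab_random_files(x, batch_size), train_dataset)

# results is a Vector of tuples, so split it rather than destructuring:
train_batch_input_files  = first.(results)
train_batch_target_files = last.(results)
```

(grab_random_files also needs to be defined on all workers; see the EDIT below.)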
If you are trying to get your code to run faster, I would recommend profiling it first to see which parts are taking the most time, and focusing your optimisation there.
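For example, a minimal run with the Profile standard library, where train_unet() is just a stand-in for your own entry point:

```julia
using Profile
train_unet()            # hypothetical entry point; run once to exclude compilation
@profile train_unet()   # collect samples during a second run
Profile.print()         # show where the time is being spent
```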
EDIT: Use the @everywhere macro to load any non-Base functions used in your mapping function. I usually have a separate file with all the necessary function definitions and run @everywhere include("functions.jl") before I use pmap.
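The setup then looks roughly like this (the file name is just my own convention):

```julia
using Distributed
addprocs(exeflags=`--project=$(Base.active_project())`)
@everywhere include("functions.jl")   # defines grab_random_files etc. on every worker
```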
So instead of using pmap, I used @sync and @distributed before the for loop. When setting up the environment, I used @everywhere so that all processes have access to everything loaded via using, and the run time has now dropped from 145 to 110 seconds.
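Roughly this pattern, with placeholder names instead of my actual package and function:

```julia
@everywhere using FileIO      # stand-in for whatever the loop body needs
@sync @distributed for file in train_dataset
    process_file(file)        # stand-in for my per-file work
end
```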
Since this is just on a single machine, multithreading is likely a better fit here, as it has very little overhead (Multi-Threading · The Julia Language). You would just replace the pmap and would not need the @everywhere:
```julia
Threads.@threads for file in input_files
    Jaws.transfer(file, pwd())
end
```
Distributed uses separate processes with separate memory, so you need to load all the libraries on each process, whereas multithreading uses shared memory. If I am on a single machine, I tend to try multithreading first, and only move to Distributed if there is a particular benefit.
Just make sure you have multiple threads available by checking Threads.nthreads(). A sensible value is the number of logical cores in your CPU (14 in your case). The docs tell you how to change this.
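For reference, the thread count is fixed when Julia starts, e.g. via the --threads flag or the JULIA_NUM_THREADS environment variable:

```julia
# Started with e.g. `julia --threads=14` (or JULIA_NUM_THREADS=14):
@show Threads.nthreads()   # should report 14 if the setting took effect
```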