Hi, for the MPI transport option, where only MPI can be used for communication on the cluster, I’ve gotten the example from MPIClusterManagers.jl working well for my use case. However, I would like to add a heap size hint, because on very long runs I run into out-of-memory problems. With addprocs(), this can be done with something like
addprocs(960, exeflags=`--heap-size-hint=3G`) # 3GB per worker hint
but how do I do this in the MPI transport case, where there is no explicit addprocs() call?
My setup is to call, within a qsub job on PBS,
mpirun -np 960 myprogram.jl
where myprogram.jl is of the form
using MPIClusterManagers, Distributed
import MPI
MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)
sz = MPI.Comm_size(MPI.COMM_WORLD)
if rank == 0
    @info "size is $sz"
end
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
@info "there are $(nworkers()) workers"
include("01_read_data.jl")
include("02_set_options.jl")
## Do stuff as normal with Distributed
@everywhere using Distributed
@everywhere using MyModule
# @sync for w in workers()
#     @async remotecall_fetch(myfunc, w, myopts...) # returns nothing on completion
# end
## exit gracefully
MPIClusterManagers.stop_main_loop(manager)
rmprocs(workers())
exit()
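One untested idea: assuming that under the MPI transport every rank is an ordinary julia process started by mpirun (rather than by addprocs()), the flag could perhaps go directly on the julia command line. Below is a minimal sketch of that guess, written as a hypothetical job-script fragment. The explicit julia in the launch line and the variable names NP, HEAP_HINT and LAUNCH are assumptions for illustration, not part of my current setup.

```shell
# Hypothetical PBS job-script fragment; the values are the ones used in this post.
NP=960          # number of MPI ranks
HEAP_HINT=3G    # per-process heap-size hint

# Assumption: mpirun starts julia explicitly, so every rank gets the same flag.
LAUNCH="mpirun -np ${NP} julia --heap-size-hint=${HEAP_HINT} myprogram.jl"

# Print the command for inspection; nothing is launched here.
echo "launching: ${LAUNCH}"

# ${LAUNCH}     # uncomment on the cluster to actually start the run
```

I have not confirmed that this has the same effect as exeflags does for workers started with addprocs().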
Many thanks for any hints or experience with this situation.