I have some expensive function that I’m executing on local workers, something like:
@everywhere begin
    using LinearAlgebra
    BLAS.set_num_threads(1)
    function expensive_fun()
        # some in-homogeneous task
    end
end
pmap(_ -> expensive_fun(), 1:N)
Since the tasks are not homogeneous, if I start with 10 workers, 8 of them might be finished after 10 minutes while the remaining 2 take another 60 minutes. Is there a way to reassign the processors that were initially used as workers to the 2 remaining tasks, as BLAS threads?
Minimal working example:
# Workers are assumed to exist already (e.g. Julia started with `julia -p 10`).
using Distributed
@everywhere begin
    using LinearAlgebra
    using ITensors, ITensorMPS
    using Random
    Random.seed!(1234)
    BLAS.set_num_threads(1)
    const sites = siteinds("S=1/2", 100)
    const mpo = random_mpo(sites)
    function random_evolve()
        # 70% of the tasks are cheap (linkdim 256), 30% are expensive (linkdim 2048)
        linkdim = rand([fill(256, 7)..., fill(2048, 3)...])
        @info "evolving with" linkdim
        mps = random_mps(sites; linkdims=linkdim)
        apply(mpo, mps; cutoff=1.e-13)
    end
end
pmap(_ -> random_evolve(), 1:10)