Hi,
I’m wondering whether it’s possible to combine parallelization inside the cost function with the parallel evaluation across the CMA-ES population (CMAEvolutionStrategy.jl).
Here’s my situation:
I’m optimizing model parameters using CMA-ES. Inside my cost function, the model performs 17 independent site-level runs (each using the same parameter vector), and then aggregates their results into a single scalar loss. So, there are two potential levels of parallelism:
- Across population members (already supported via parallel_evaluation = true), and
- Within each cost function evaluation, across the 17 model runs.
Currently, CMAEvolutionStrategy.jl only parallelizes across the population. However, it would be much faster if I could exploit both levels of parallelism simultaneously, e.g., multithreading (or distributed execution) across population members and, within each cost evaluation, parallelizing over my 17 runs.
Here’s a simplified example illustrating the idea:
function single_model(x, input)
    n = length(x)
    return sum(100 * (x[2i-1]^2 - x[2i])^2 + (x[2i-1] - 1)^2 for i in 1:div(n, 2)) + sum(input)
end
function my_model(x, total_input)
    results = Vector{Float64}(undef, length(total_input))
    for (i, input) in enumerate(total_input)
        y = single_model(x, input)
        results[i] = y
    end
    return sum(results)
end
A possible @threads parallelization of this inner loop could look like this (Threads.@threads cannot iterate over enumerate, so I index into total_input instead):
function my_model(x, total_input)
    results = Vector{Float64}(undef, length(total_input))
    Threads.@threads for i in eachindex(total_input)
        results[i] = single_model(x, total_input[i])
    end
    return sum(results)
end
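Since Julia's task-based threading is composable, I was also considering a Threads.@spawn variant of the inner loop. This is only a sketch of what I have in mind (my_model_spawned is a hypothetical name), not something I have benchmarked:

function my_model_spawned(x, total_input)
    # spawn one task per site-level run; the scheduler can interleave these with any outer-level tasks
    tasks = [Threads.@spawn single_model(x, input) for input in total_input]
    return sum(fetch.(tasks))
end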
Right now I parallelize across the population by setting the population size either to the number of available threads:
population_size = Threads.nthreads()
or to the usual CMA-ES default (where parameter_vector is my initial parameter guess x0):
population_size = 4 + floor(Int, 3 * log(length(parameter_vector)))
I then pass parameter_vector to the optimizer with parallel_evaluation = true, following the CMAEvolutionStrategy.jl README (GitHub - jbrea/CMAEvolutionStrategy.jl): "If parallel_evaluation = true, the objective function f receives matrices of n rows (n = length(x0)) and popsize columns and should return a vector of length popsize."
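Concretely, my population-level objective looks roughly like the sketch below: with parallel_evaluation = true the optimizer hands it an n × popsize matrix, and I evaluate one column per thread (f_population is just an illustrative name, and total_input stands for my 17 site inputs):

function f_population(X)
    # X has n rows (parameters) and popsize columns (candidate solutions)
    losses = Vector{Float64}(undef, size(X, 2))
    Threads.@threads for j in 1:size(X, 2)
        losses[j] = my_model(view(X, :, j), total_input)
    end
    return losses
end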
You can see that the for loop in my_model (over total_input) represents the 17 independent runs. I’m wondering if there’s a clean way to parallelize both this inner loop and the CMA-ES population evaluations at the same time, without the two interfering with each other.
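For completeness, the call I have in mind is something like the following, assuming the minimize entry point and the popsize / parallel_evaluation keywords from the README (x0 and the step size are placeholders):

using CMAEvolutionStrategy

x0 = zeros(6)          # placeholder initial parameter vector
result = minimize(f_population, x0, 0.5;
                  popsize = population_size,
                  parallel_evaluation = true)

My concern is that the inner per-run tasks and the per-column tasks would compete for the same threads, so I'm unsure whether this nesting actually helps or just adds overhead.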
Any guidance or example would be greatly appreciated!
Thanks a lot!