I am trying to set up CPU() backend acceleration as described in the ExaModels documentation, but it does not seem to work. The optimization runs, but only on a single thread (the system monitor shows no multi-threaded usage), and the solution time is the same as when no backend is set. I ran the code with julia -t 4 mwe.jl from the command line on Linux, using Julia 1.11.3.
using ExaModels, NLPModelsIpopt, KernelAbstractions
function luksan_vlcek_x0(i)
    return mod(i, 2) == 1 ? -1.2 : 1.0
end
function luksan_vlcek_model(N, backend = nothing)
    c = ExaCore(; backend = backend) # specify the backend
    x = variable(c, N; start = (luksan_vlcek_x0(i) for i = 1:N))
    constraint(c, luksan_vlcek_con(x, i) for i = 1:N-2)
    objective(c, luksan_vlcek_obj(x, i) for i = 2:N)
    return ExaModel(c)
end
@odow what is your take on this? Should I open an issue on the ExaModels repository? I followed the tutorial, but I don't see any effect. Does it work for you?
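Before looking at ExaModels itself, it may be worth confirming that the Julia session really received the threads requested with -t 4. The following is a minimal sketch using only Base Julia; the array name `ids` is just for illustration, and it says nothing about whether ExaModels uses those threads.

```julia
# Report how many threads this Julia session actually has.
println("Threads available: ", Threads.nthreads())

# Record which thread handles each iteration of a threaded loop.
ids = zeros(Int, 4 * Threads.nthreads())
Threads.@threads for k in eachindex(ids)
    ids[k] = Threads.threadid()
end

# With several threads, more than one distinct id is expected here.
println("Distinct thread ids used: ", sort(unique(ids)))
```

If this prints a single thread even though -t 4 was passed, the problem is in how Julia was launched rather than in the backend setting.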
using ExaModels, NLPModelsIpopt, KernelAbstractions
function luksan_vlcek_obj(x, i)
    return 100 * (x[i-1]^2 - x[i])^2 + (x[i-1] - 1)^2
end
function luksan_vlcek_con(x, i)
    return 3x[i+1]^3 + 2 * x[i+2] - 5 + sin(x[i+1] - x[i+2])sin(x[i+1] + x[i+2]) + 4x[i+1] -
           x[i]exp(x[i] - x[i+1]) - 3
end
function luksan_vlcek_x0(i)
    return mod(i, 2) == 1 ? -1.2 : 1.0
end
function luksan_vlcek_model(N, backend = nothing)
    c = ExaCore(; backend = backend) # specify the backend
    x = variable(c, N; start = (luksan_vlcek_x0(i) for i = 1:N))
    constraint(c, luksan_vlcek_con(x, i) for i = 1:N-2)
    objective(c, luksan_vlcek_obj(x, i) for i = 2:N)
    return ExaModel(c)
end
N = parse(Int, ARGS[1])
m = luksan_vlcek_model(N, CPU())
ipopt(m; print_timing_statistics = "yes")
I get
(base) oscar@MacBookPro /tmp % time julia --project=exa -t 4 exa.jl 2000000
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit https://github.com/coin-or/Ipopt
******************************************************************************
This is Ipopt version 3.14.17, running with linear solver MUMPS 5.7.3.
Number of nonzeros in equality constraint Jacobian...: 5999994
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 17999985
Total number of variables............................: 2000000
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 1999998
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 5.0819952e+08 2.48e+01 2.73e+01 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0
1 2.7029936e+08 1.49e+01 8.27e+01 -1.0 2.20e+00 - 1.00e+00 1.00e+00f 1
2 3.0276986e+07 4.28e+00 1.36e+02 -1.0 1.43e+00 - 1.00e+00 1.00e+00f 1
3 1.0576435e+04 3.09e-01 2.18e+01 -1.0 5.63e-01 - 1.00e+00 1.00e+00f 1
4 6.4972350e+00 1.73e-02 8.47e-01 -1.0 2.10e-01 - 1.00e+00 1.00e+00f 1
5 6.2324586e+00 1.15e-05 8.16e-04 -1.7 3.35e-03 - 1.00e+00 1.00e+00h 1
6 6.2324586e+00 8.36e-12 7.97e-10 -5.7 2.00e-06 - 1.00e+00 1.00e+00h 1
Number of Iterations....: 6
(scaled) (unscaled)
Objective...............: 7.8692659500479645e-01 6.2324586324379885e+00
Dual infeasibility......: 7.9743417311241623e-10 6.3156786510503368e-09
Constraint violation....: 8.3555384833289281e-12 8.3555384833289281e-12
Variable bound violation: 0.0000000000000000e+00 0.0000000000000000e+00
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 7.9743417311241623e-10 6.3156786510503368e-09
Number of objective function evaluations = 7
Number of objective gradient evaluations = 7
Number of equality constraint evaluations = 7
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 7
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 6
Total seconds in IPOPT (w/o function evaluations) = 25.122
Total seconds in NLP function evaluations = 2.045
Timing Statistics:
OverallAlgorithm....................: 27.635 (sys: 3.073 wall: 27.167)
PrintProblemStatistics.............: 0.013 (sys: 0.002 wall: 0.015)
InitializeIterates.................: 7.549 (sys: 1.030 wall: 7.245)
UpdateHessian......................: 2.073 (sys: 0.230 wall: 1.132)
OutputIteration....................: 0.000 (sys: 0.000 wall: 0.000)
UpdateBarrierParameter.............: 0.000 (sys: 0.000 wall: 0.000)
ComputeSearchDirection.............: 15.788 (sys: 1.700 wall: 17.506)
ComputeAcceptableTrialPoint........: 0.517 (sys: 0.016 wall: 0.291)
AcceptTrialPoint...................: 0.000 (sys: 0.013 wall: 0.013)
CheckConvergence...................: 1.694 (sys: 0.083 wall: 0.965)
PDSystemSolverTotal.................: 15.769 (sys: 1.675 wall: 17.462)
PDSystemSolverSolveOnce............: 14.878 (sys: 1.604 wall: 16.497)
ComputeResiduals...................: 0.778 (sys: 0.013 wall: 0.793)
StdAugSystemSolverMultiSolve.......: 19.614 (sys: 2.267 wall: 21.907)
LinearSystemScaling................: 0.000 (sys: 0.000 wall: 0.000)
LinearSystemSymbolicFactorization..: 2.784 (sys: 0.357 wall: 3.148)
LinearSystemFactorization..........: 0.000 (sys: 0.000 wall: 0.000)
LinearSystemBackSolve..............: 6.091 (sys: 0.077 wall: 6.173)
LinearSystemStructureConverter.....: 0.000 (sys: 0.000 wall: 0.000)
LinearSystemStructureConverterInit: 0.000 (sys: 0.000 wall: 0.000)
QualityFunctionSearch...............: 0.000 (sys: 0.000 wall: 0.000)
TryCorrector........................: 0.000 (sys: 0.000 wall: 0.000)
Task1...............................: 0.000 (sys: 0.000 wall: 0.000)
Task2...............................: 0.000 (sys: 0.000 wall: 0.000)
Task3...............................: 0.000 (sys: 0.000 wall: 0.000)
Task4...............................: 0.000 (sys: 0.000 wall: 0.000)
Task5...............................: 0.000 (sys: 0.000 wall: 0.000)
Task6...............................: 0.000 (sys: 0.000 wall: 0.000)
Function Evaluations................: 4.081 (sys: 0.264 wall: 2.045)
Objective function.................: 0.298 (sys: 0.015 wall: 0.184)
Objective function gradient........: 0.184 (sys: 0.001 wall: 0.068)
Equality constraints...............: 0.793 (sys: 0.017 wall: 0.412)
Inequality constraints.............: 0.000 (sys: 0.000 wall: 0.000)
Equality constraint Jacobian.......: 0.733 (sys: 0.002 wall: 0.249)
Inequality constraint Jacobian.....: 0.000 (sys: 0.000 wall: 0.000)
Lagrangian Hessian.................: 2.073 (sys: 0.230 wall: 1.132)
EXIT: Optimal Solution Found.
julia --project=exa -t 4 exa.jl 2000000 35.18s user 4.06s system 111% cpu 35.155 total
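So perhaps there’s just little to be gained from multithreading the callback oracles in this example. Most of the time is spent by Ipopt doing other things: about 25 seconds inside Ipopt itself, compared with about 2 seconds in the NLP function evaluations.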
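To put a rough number on that, here is a back-of-the-envelope sketch using the two "Total seconds" lines from the log above. The variable names are made up for illustration, and the bound assumes the time spent inside Ipopt is unaffected by the backend choice.

```julia
# Wall-clock timings copied from the Ipopt summary above (seconds).
t_ipopt = 25.122      # "Total seconds in IPOPT (w/o function evaluations)"
t_callbacks = 2.045   # "Total seconds in NLP function evaluations"

total = t_ipopt + t_callbacks

# Fraction of the run spent in the callbacks that a backend could speed up.
callback_share = t_callbacks / total

# Upper bound on the overall speedup if the callbacks cost nothing at all.
max_speedup = total / t_ipopt

println("callback share of runtime: ", round(100 * callback_share; digits = 1), " %")
println("best-case overall speedup: ", round(max_speedup; digits = 3), "x")
```

With these numbers the callbacks account for well under a tenth of the run, so even an infinitely fast backend could shave only a few seconds off the total.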
As a side note: in general there’s no need to tag people directly.
Function Evaluations…: 6.156 (sys: 0.045 wall: 3.163)
CPU() seems to do something, but the effect is not favourable. Good thing I did not open an issue for this. Still, it would be nice to have an example in the docs where the use case is justified.
I’m sure they’d appreciate a PR to improve the docs. But it’s probably quite hard to find an example that is trivial enough to write as a tutorial, is meaningfully faster with CPU(), and still runs quickly enough on the CI machine when the docs are built.