AC Optimal Power Flow in Various Nonlinear Optimization Frameworks

ccoffrin · July 25, 2023, 4:25pm

When it comes to AC-OPF faster is always better and more useful but <1h is interesting and useful for “day-ahead” operations; <10 minutes is required for things like “real-time” operations.

Current state of the art for this case, as far as I know, is OPOMO at 303 seconds.

ChrisRackauckas · August 31, 2023, 12:07pm

I have a PR to get it onto the SciMLBenchmarks system so that way it can be auto-run on a standard machine through CI, making it easier for everyone to get involved:

I noticed Optimization.jl is using ForwardDiff.jl and no sparsity. These days, it has Enzyme with sparse Hessians. I presume that would make a major difference, and so after this merges @Vaibhavdixit02 should look into adding that.

odow · August 31, 2023, 6:57pm

From memory we tried the various sparsity and AD options and ran into memory issues. This was the fastest, but perhaps recent versions are better. for putting this in the benchmark suite so you have control over what options are used etc.

ChrisRackauckas · August 31, 2023, 7:15pm

Avik’s new and improved sparse interface which makes sparse reverse mode simpler only merged a week ago New High Level Interface for Sparse Jacobian Computation by avik-pal · Pull Request #253 · JuliaDiff/SparseDiffTools.jl · GitHub so I presume it wasn’t the latest stuff. Not all of this has made it to Optimization.jl’s high level AD system yet, but by making this be the benchmark we can do a benchmark-driven development to ensure what’s merged is efficient. I hate building in a vacuum.

ChrisRackauckas · September 13, 2023, 8:40pm

Update: the benchmark is now running on the SciMLBenchmarks! You can find it here:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/OptimizationFrameworks/optimal_powerflow/

Anyone can trigger the benchmark to run just by changing the code here:

github.com

SciML/SciMLBenchmarks.jl/blob/master/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd

---
title: Optimal Powerflow Nonlinear Optimization Benchmark
author: Chris Rackauckas
---

## Data Load and Setup Code

This is generic setup code usable for all solver setups. Basically removing some unnecessary untyped dictionaries
before getting to the benchmarks.

```julia
PRINT_LEVEL = 0

# This is a soft upper limit to the time of each optimization.
# If times go above this, they will halt early
MAX_CPU_TIME = 100.0

# Maximum number of variables in an optimization problem for the benchmark
# Anything with more variables is rejected and not run
# This is for testing. 100 is a good size for running a test of changes

This file has been truncated. show original

It makes a cute table at the end with the current results:

Things to note:

I know the table formatting can be improved. If anyone knows how to make documenter show the whole thing I’d be happy to have the help in formatting it.
There is a parameter SIZE_LIMIT which tells the benchmark the maximum number of variables to use in the benchmark files. This is set to 100 right now just to play around with formatting, but I’ll bump it up and do a bigger run. I did do a bigger run a bit back so I know it works, just when fixing table formatting it’s annoying to wait a day haha.
Optimization.jl code was horrendously allocating and type-unstable. A common issue in the code was using d["$i"] on a dictionary, instead of just using an integer. The dictionaries are still there, so there’s still some indirection that should be removed and the performance should increase, but at least it’s type-stable and non-allocating now. I’ll actually optimize the code in the near future, but the code now is at least a lot simpler than the original code and won’t interfere with AD tooling.
Optimization.jl is now using a sparse reverse mode, but it’s not optimized yet. This case is a case we’re using to optimize things. See:

which are low hanging fruit identified from this.

I added a lot of validation cases so if you change the code around, assertions will throw failures if you break something.
I couldn’t get the CASADI code to run, I’d like some help figuring that out.

brianguenter · September 14, 2023, 12:33am

This looks like a problem that FastDifferentiation.jl can solve.

It handles sparsity well and is likely to be faster than the alternatives for this type of problem. If you decide to try it I’ll be happy to help you get it working.

ChrisRackauckas · September 14, 2023, 10:33am

@brianguenter we have it setup to MTK trace already and it produces a branch-free code, so once the Symbolics integration is done this was going to be one of the first cases I wanted to test.

ccoffrin · January 12, 2024, 3:09am

Given the release of a new nonlinear modeling interface in JuMP v1.15, I thought it would be good to update rosseta-opf to use that new interface and check these results.

The JuMP results are largely unchanged with an arguable case for a small performance improvements at the largest scales. No notable performance regression is a good outcome for the new nonlinear modeling interface, which has been rebuilt from scratch for improved maintainability and added support for a variety of new features.

In updating to more recent versions of all related packages and Julia, Optimization.jl also saw a notable improvement, now solving up to 793 bus case for the first time. Nonconvex.jl also solved a 200 bus case for the first time. NLPModels encountered some new exceptions in derivative calculations on large cases.

By and large, there has been notable progress across the board!

Case	Vars	Cons	JuMP	NLPModels	NonConvex	Optim	Optimization
case3_lmbd	24	28	1.47e-02	4.82e-02	8.42e+00	INF.	2.62e+00
case5_pjm	44	53	1.78e-02	8.63e-02	1.76e+00	INF.	3.96e-01
case14_ieee	118	169	3.45e-01	3.18e-01	7.84e+01	N.D.	7.75e+00
case24_ieee_rts	266	315	3.69e-01	7.81e-01	2.38e+02	INF.	1.77e+01
case30_ieee	236	348	3.74e-01	7.84e-01	2.41e+02	N.D.	1.76e+01
case30_as	236	348	3.33e-01	6.68e-01	2.49e+02	INF.	1.77e+01
case39_epri	282	401	9.70e-02	6.81e-01	3.28e+02	N.D.	2.18e+01
case57_ieee	448	675	4.01e-01	1.15e+00	5.82e+02	N.D.	4.24e+01
case60_c	518	737	4.14e-01	1.61e+00	5.29e+02	N.D.	4.52e+01
case73_ieee_rts	824	987	4.45e-01	2.69e+00	1.22e+03	N.D.	9.43e+01
case89_pegase	1042	1649	6.33e-01	6.70e+00	3.96e+03	N.D.	2.23e+02
case118_ieee	1088	1539	6.35e-01	4.44e+00	3.05e+03	N.D.	1.79e+02
case162_ieee_dtc	1484	2313	6.39e-01	6.21e+00	N.D.	N.D.	3.95e+02
case179_goc	1468	2200	7.24e-01	9.49e+00	5.67e+03	N.D.	3.01e+02
case197_snem	1608	2397	6.84e-01	6.76e+00	N.D.	N.D.	4.29e+02
case200_activ	1456	2116	5.41e-01	5.29e+00	6.49e+03	N.D.	2.92e+02
case240_pserc	2558	3617	2.76e+00	4.90e+01	N.D.	N.D.	9.65e+02
case300_ieee	2382	3478	9.57e-01	1.23e+01	N.D.	N.D.	8.37e+02
case500_goc	4254	6097	1.40e+00	2.98e+01	N.D.	N.D.	3.11e+03
case588_sdet	4110	5979	1.34e+00	2.31e+01	N.D.	N.D.	2.58e+03
case793_goc	5432	7978	1.69e+00	4.00e+01	N.D.	N.D.	4.94e+03
case1354_pegase	11192	16646	4.89e+00	1.68e+02	N.D.	N.D.	N.D.
case1803_snem	15246	23172	8.23e+00	3.37e+02	N.D.	N.D.	N.D.
case1888_rte	14480	21494	1.98e+01	5.18e+02	N.D.	N.D.	N.D.
case1951_rte	15018	22075	9.86e+00	N.D.	N.D.	N.D.	N.D.
case2000_goc	19008	29432	7.82e+00	3.93e+02	N.D.	N.D.	N.D.
case2312_goc	17128	25716	7.27e+00	2.92e+02	N.D.	N.D.	N.D.
case2383wp_k	17004	25039	7.84e+00	2.68e+02	N.D.	N.D.	N.D.
case2736sp_k	19088	28356	7.03e+00	3.15e+02	N.D.	N.D.	N.D.
case2737sop_k	18988	28358	6.40e+00	N.D.	N.D.	N.D.	N.D.
case2742_goc	24540	38196	2.77e+01	1.16e+03	N.D.	N.D.	N.D.
case2746wp_k	19520	28446	6.81e+00	2.90e+02	N.D.	N.D.	N.D.
case2746wop_k	19582	28642	6.55e+00	2.94e+02	N.D.	N.D.	N.D.
case2848_rte	21822	32129	1.50e+01	6.11e+02	N.D.	N.D.	N.D.
case2853_sdet	23028	33154	1.08e+01	7.18e+02	N.D.	N.D.	N.D.
case2868_rte	22090	32393	1.66e+01	6.69e+02	N.D.	N.D.	N.D.
case2869_pegase	25086	37813	1.25e+01	8.14e+02	N.D.	N.D.	N.D.
case3012wp_k	21082	31029	9.96e+00	4.17e+02	N.D.	N.D.	N.D.
case3022_goc	23238	34990	1.16e+01	N.D.	N.D.	N.D.	N.D.
case3120sp_k	21608	32092	1.01e+01	4.35e+02	N.D.	N.D.	N.D.
case3375wp_k	24350	35876	1.17e+01	N.D.	N.D.	N.D.	N.D.
case3970_goc	35270	54428	1.67e+01	2.02e+03	N.D.	N.D.	N.D.
case4020_goc	36696	56957	2.22e+01	N.D.	N.D.	N.D.	N.D.
case4601_goc	38814	59596	2.36e+01	2.38e+03	N.D.	N.D.	N.D.
case4619_goc	42532	66289	2.10e+01	2.44e+03	N.D.	N.D.	N.D.
case4661_sdet	34758	51302	1.72e+01	1.50e+03	N.D.	N.D.	N.D.
case4837_goc	41398	64030	2.25e+01	2.25e+03	N.D.	N.D.	N.D.
case4917_goc	37872	56917	2.04e+01	N.D.	N.D.	N.D.	N.D.
case5658_epigrids	48552	74821	2.40e+01	3.37e+03	N.D.	N.D.	N.D.
case6468_rte	49734	75937	6.32e+01	4.10e+03	N.D.	N.D.	N.D.
case6470_rte	50482	75976	3.64e+01	3.56e+03	N.D.	N.D.	N.D.
case6495_rte	50426	76124	6.38e+01	4.39e+03	N.D.	N.D.	N.D.
case6515_rte	50546	76290	5.21e+01	4.19e+03	N.D.	N.D.	N.D.
case7336_epigrids	62116	95306	2.90e+01	5.42e+03	N.D.	N.D.	N.D.
case8387_pegase	78748	118702	5.03e+01	N.D.	N.D.	N.D.	N.D.
case9241_pegase	85568	130826	5.30e+01	N.D.	N.D.	N.D.	N.D.
case9591_goc	83572	130588	6.02e+01	N.D.	N.D.	N.D.	N.D.
case10000_goc	76804	112352	5.11e+01	N.D.	N.D.	N.D.	N.D.
case10192_epigrids	89850	139456	5.81e+01	N.D.	N.D.	N.D.	N.D.
case10480_goc	96750	150874	6.87e+01	N.D.	N.D.	N.D.	N.D.
case13659_pegase	117370	170588	6.82e+01	N.D.	N.D.	N.D.	N.D.
case19402_goc	179562	281733	1.49e+02	N.D.	N.D.	N.D.	N.D.
case20758_epigrids	179236	274918	1.04e+02	N.D.	N.D.	N.D.	N.D.
case24464_goc	203374	313641	1.27e+02	N.D.	N.D.	N.D.	N.D.
case30000_goc	208624	307752	2.49e+02	N.D.	N.D.	N.D.	N.D.
case78484_epigrids	674562	1039062	7.77e+02	N.D.	N.D.	N.D.	N.D.

Package Details

Julia v1.10.0
[54578032] ADNLPModels v0.7.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.0
[4076af6c] JuMP v1.18.1
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.7.8
[7f7a1694] Optimization v3.21.0
[fd9f6733] OptimizationMOI v0.3.0
[c36e90e8] PowerModels v0.20.1
[0c5d862f] Symbolics v5.15.1

Ipopt was run with the linear solver HSL ma27.

ChrisRackauckas · January 12, 2024, 8:15am

Your “results” make no sense. I can see from the reproducible continuous benchmarking machine that this differs by orders of magnitude against the code that is considered canonical here.

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/OptimizationFrameworks/optimal_powerflow/

case	vars	cons	optimization	optimization_modelbuild	optimization_wcompilation	optimization_cost	mtk	mtk_time_modelbuild	mtk_time_wcompilation	mtk_cost	jump	jump_modelbuild	jump_wcompilation	jump_cost	nlpmodels	nlpmodels_modelbuild	nlpmodels_wcompilation	nlpmodels_cost	optim	optim_modelbuild	optim_wcompilation	optim_cost
String	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
pglib_opf_case3_lmbd.m	24	28	0 .0190105	0.0434296	0.0209169	5812.64	0.0123753	0.392407	0.0142627	743.0	0.00895457	0.00250657	0.00930726	5812.64	0.0185664	0.0154271	0.0194362	5812.64	2.20767	0.000411777	2.05422	6273.63

Given that you’re two orders of magnitude away from the continuous reproducible version of the benchmarks, I don’t know if the numbers are just made up or wrong but it is clearly doesn’t pass the smell test.

odow · January 12, 2024, 8:26am

Let me take a look. @ccoffrin has been using rosetta-opf/optimization.jl at main · lanl-ansi/rosetta-opf · GitHub. It doesn’t include the changes made in the benchmark repo yet.

odow · January 12, 2024, 8:54am

FWIW, in the case73_ieee_rts, @ccoffrin report 9.43e+01, and and Optimal Powerflow Nonlinear Optimization Benchmark · The SciML Benchmarks reports 6.44e+01 (optimization_modelbuild + optimization), so your implementation is much better, but they’re not two orders of magnitude apart.

And perhaps the difference is just machines. JuMP’s time on @ccoffrin’s machine is 4.45e-01, and your benchmarks report 2.20e-01.

odow · January 12, 2024, 9:11am

See Add optimization based on SciMLBenchmarks by odow · Pull Request #41 · lanl-ansi/rosetta-opf · GitHub. I ran into ArgumentError: The passed automatic differentiation backend choice is not available. Please load the corresponding AD package SparseReverseDiff..

I also noticed that

it just looks like the case3 has some compilation overhead. The case5 takes 0.4 seconds which is probably the right ballpark.

ChrisRackauckas · January 12, 2024, 10:08am

The real benchmark shows compilation, run time, and mixed.

case	vars	cons	optimization	optimization_modelbuild	optimization_wcompilation	optimization_cost	mtk	mtk_time_modelbuild	mtk_time_wcompilation	mtk_cost	jump	jump_modelbuild	jump_wcompilation	jump_cost	nlpmodels	nlpmodels_modelbuild	nlpmodels_wcompilation	nlpmodels_cost	optim	optim_modelbuild	optim_wcompilation	optim_cost
String	Int64	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
pglib_opf_case3_lmbd.m	24	28	0 .0190105	0.0434296	0.0209169	5812.64	0.0123753	0.392407	0.0142627	743.0	0.00895457	0.00250657	0.00930726	5812.64	0.0185664	0.0154271	0.0194362	5812.64	2.20767	0.000411777	2.05422	6273.63

Notice it reports it all so that there’s no speculation, just timing. With this we can see that even with compilation it’s not close to the reported numbers.

Where is the CI script showing the details of @ccoffrin 's machine to see what it’s doing?

odow · January 12, 2024, 5:06pm

Ignore the case3 numbers and look at the larger sizes. I agree case3 is wrong.

ccoffrin · January 12, 2024, 7:00pm

Here are some initial followups to the latest discussion, which I’ll summarize as three broad topics.

How to reproduce these results?

I have commented on this from time to time in this thread but let me recap more explicitly here. The steps to reproduce these results are,

Checkout the code of the cli branch of the rosetta-opf repo
Download PGLib-OPF data sets
Run an experiment with the command julia --project=<rosetta-opf repo path> rosetta-opf-cli.jl -s <framework id> -f <pglib case path>.

The specific case of running optimization.jl from the rosetta-opf directory would look something like, julia --project=. rosetta-opf-cli.jl -s optimization -f data/pglib_opf_case118_ieee.m

The values reported in this table are the “time_total” parameter that is output by the rosetta-opf scripts via this rosetta-opf-cli.jl interface. This value includes both model building time and solve time.

For the network cases with say less than 1000 buses, this should be sufficient to get similar results. However, for the large cases it becomes important to change Ipopt’s linear solver from the default (MUMPS) to something more performant, like HSL. As noted, in this table I have used HSL ma27. Configuration details for that are discussed here.

What machines are these experiments run on?

This experiment is currently running on pretty dated hardware, but all jobs are run on identical machines, so they are comparable across each other. The specific machine specs are,

A two socket machine with two intel, broadwell E5-2695_v4 CPUs running at 2.1GHz and 125GB of memory.

I think it is reasonable to assume a 2x runtime improvement if one used more modern hardware. These systems are what I have easy access to at this time.

Each cell in this table is run for a maximum wall clock-limit of 2 hours, so the total amount of wall clock time to run the experiment can be quite substantial. I currently use a cluster of these stated computers to collect the data for this table in a reasonable amount of time.

Next steps for updating the Optimization.jl implementation (and other frameworks).

In general, I welcome and encourage modeling layer developers to make PRs to rosetta-opf to improve this example of how to best utilize their modeling system. I would not proclaim to be an expert in any of these modeling layers. Once a PR is merged into the main branch those updates will be reflected in the next iteration of this table.

I can see that @odow has started a PR on updating the optimization.jl implementation. Great! Any help in getting this working would be much appreciated. I can make another round of updates to this table once this is merged.

I’ll be out of the office for the holiday weekend but looking forward to continuing this discussion and getting the latest and greatest implementation of optimization.jl working.

ChrisRackauckas · January 12, 2024, 10:39pm

There are many problems with this statement. Sorry, this is getting really frustrating because we put days of work into building and fixing a reproducible benchmark for this and it is not conducive to a collaborative atmosphere that you keep throwing away the work that we are doing in helping with updating this benchmark. Let me spell it out in full.

Complete Detail of Issues In Your Benchmark Script That Were Already Fixed In Code That We Have Shared With You

We setup a fully automated version of this benchmark:

github.com

SciML/SciMLBenchmarks.jl/blob/master/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd

---
title: Optimal Powerflow Nonlinear Optimization Benchmark
author: Chris Rackauckas
---

## Data Load and Setup Code

This is generic setup code usable for all solver setups. Basically removing some unnecessary untyped dictionaries
before getting to the benchmarks.

```julia
PRINT_LEVEL = 0

# This is a soft upper limit to the time of each optimization.
# If times go above this, they will halt early
MAX_CPU_TIME = 100.0

# Maximum number of variables in an optimization problem for the benchmark
# Anything with more variables is rejected and not run
# This is for testing. 100 is a good size for running a test of changes

This file has been truncated. show original

Anyone who makes a PR to the system has it automatically run on the benchmarking server so that everyone can test their changes on the same computer. The results are then printed for everyone to see. That is what we shared:

https://docs.sciml.ai/SciMLBenchmarksOutput/stable/OptimizationFrameworks/optimal_powerflow/

Many improvements were made to the benchmark. One major thing is that there are validations for each implementation.

github.com

SciML/SciMLBenchmarks.jl/blob/master/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd#L591-L596


      
          ```julia
          @assert optimization_test_res[1] == test_obj
          ```
          
          ```julia
          @assert optimization_test_res[2] == test_cons

These assertions check that on random state trajectories that every implementation receives the same cost value and constraints values. The reason why we added these is because that was not the case for the original benchmarks you had: different implementations evaluated to different constraint values, indicating that some of the constraint functions were not correct. We fixed those in the benchmarks.

Also, the implementations for Optimization.jl, NLPmodels.jl, and Nonconvex.jl were needlessly unoptimized. What I mean is that the implementation has unnecessary allocations by constructing strings as dictionary accessors inside of the cost function (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub). That is a performance issue to any of the models which directly use the cost function definition rather than through a declarative DSL (i.e. everything except JuMP) and is against most Julia style conventions. This was fixed in the updated code and you can see that your benchmarking code does not match what we had sent you (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub, and similar lines for NLPmodels and Nonconvex.jl).

Additionally, the dictionaries that are being pulled from are themselves not type stable since they have differing element types. This is because some of the elements of the dictionaries are themselves dictionaries, while other elements are numbers and others are arrays. Reading data into unstructured dictionaries of course is not standard Julia code anyways and is against standard style conventions, and so this is not recommended. The code that we had sent you fixed this by reading the data into concrete structs (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub) which then guard against incorrectness (typos) and ensures type-stability for all of the algorithms.

We also looked into the automatic differentiation methods. Note that the ModelingToolkit method (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub) is known to be a hacky way to construct a symbolic form since it’s using the numerical code in order to specify the problem and then symbolically tracing it rather than simply building the symbolic form. In the fixed version of the benchmark, we split the two pieces. One version of the Optimization.jl benchmark is a purely Julia-code non-symbolic implementation that uses sparse reverse mode automatic differentiation (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub) and the other is a purely symbolic implementation of an OptimizationSystem, closer to the representation of JuMP (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub). These are kept separate in the final benchmarks because they are very different implementations with very different results.

In fact, these results are very different in cost because the MTK implementation uses a tearing algorithm to eliminate constraints, and in the end finds solutions that receive a lower cost than any of the other implementations. This is something we have been investigating and I know @Vaibhavdixit02 personally emailed you showing you all of these changes and asking you about this specific result.

Summaries of Issues with Your Benchmark Code

The code that you are running is very type-unstable and is unnecessarily slowing down most of the implementations, and it’s not clear that it’s correctly implementing the constraints for all of them.
The way that you are running the benchmarks is a non-reproducible “it runs on my computer” which we tried to eliminate by setting up an automated benchmarking service.
Shared all of these results both in this thread (AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas) and we have personally sent you emails to try and continue to improve these benchmarks, not just for Optimization.jl but also for NLPmodels, Nonconvex, and JuMP (we haven’t cared about the Optim.jl part, and the CASADI part just segfaults).
It seems that no matter how many times we have tried to contact you or share these results with you, we have been ignored. There does not seem to be an indication that what we have shared was read at all.

Extra Information

So I am very frustrated by statements like:

I think it is reasonable to assume a 2x runtime improvement if one used more modern hardware. These systems are what I have easy access to at this time.

because that is very clearly wrong due to fact that have been repeatedly brought up in this thread and that we have shared with you in emails, and that we have demonstrated in a fully reproducible way (Optimal Powerflow Nonlinear Optimization Benchmark · The SciML Benchmarks). No, the performance difference that you’re seeing is not just due to running on a different machine, it’s very clearly because you are still running the type-unstable versions from prior to September 2023 even after all of the messages that we have sent about this.

The reason why I pointed out case3 was because it’s the smallest case and thus it was the best case to indicate that something was clearly off with the benchmarks. The larger the system, the more time is spent in the linear algebra routine and thus the smaller the overall difference. To really highlight the difference due to implementation of the loss function rather than implementation of the optimization algorithm, the smallest case is the one that would be most dominated by objective function call cost rather than the cost of the AD algorithm or the linear solver choices. Therefore, this result was the biggest red flag in the list, not the one to ignore.

It being 2 orders of magnitude different was a very clear indicator that the code that you were running was not type-stable and had to be doing something egregiously unoptimized in the loss function, which is why I pointed that out. The differences in the large end are then a mix of that lack of code optimization mixed with known issues with the scaling of sparse reverse-mode AD via the current sparse reverse approach. This latter fact was documented earlier as point (4) in the earlier post AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas.

The fixed version of the code separates this and isolates down to simply be about the implementation details, which highlights what the next steps in the AD community needs to be along with symbolic code generation.

Holding out an olive branch

I say all of this because I just want us to be absolutely clear as to the current state of everything. We have been repeatedly trying to share this information with you since September 2023 (as everyone can see in the post above) and even emailed you personally sense. You posted a “new” version of the benchmark that ignored all of the issues we had pointed out, but not only that, ignored all of the fixes that we spent the time to apply, along with the infrustructure that we provided to make it easier to maintain these benchmarks going forward.

Mistakes are made, it happens. But it seems we do have your attention now.

Can we please work together on this?
Can our updates please not be ignored?
Can we please do this by updating the code on the SciMLBenchmarks jmd file so that both of us are running on the same computer?
Is there any update to the code you have done? We would be happy to help upstream that into the jmd file.

This is frustrating and takes more time out of both of us. Let’s work together and use this to improve all of the packages involved.

ccoffrin · January 24, 2024, 1:25am

Based on a discussion with @ChrisRackauckas, rosseta-opf has been updated with,

A second Optimization.jl implementation based on SciMLBenchmarks
Addition of CI for testing the correctness of the rosetta-opf implementations
Clarifications to the rosseta-opf README about the adaptation into the SciMLBenchmarks

A general consensus has been reached that rosseta-opf and SciMLBenchmarks are both reasonable implementations of an AC-OPF benchmark that have different objectives. In particular rosseta-opf focuses on simulating a common non-expert user experience in using NLP tooling without optimized code or data structures; this generally biases the performance results in favor of symbolic front-ends. On the other hand, SciMLBenchmarks uses type-stable high-performance data structures to focus on core implementation differences of the numerical backends. Therefore, it makes sense for each benchmark effort to continue on with these distinct objectives.

With these changes made, below is an updated results table.

I have removed the optim implementation in this iteration to make room for comparing the original Optimization.jl implementation to the SciMLBenchmarks implementation side-by-side. It was also requested to note that the results presented here, which I call “total runtime”, include a combination of (instance-specific compile time) + (model build time) + (solve time).

In this results table (and all pervious ones) there is an instance-specific compilation overhead in the Optimization.jl implementations of 3-5 seconds. This overhead is omitted in the case5_pjm line because this dataset has been used to warmup Julia’s JIT before collecting runtime data. This artifact will be removed in future versions of this table.

At @ChrisRackauckas direction we are expecting that the original Optimization.jl implementation will remain the primary one in rosetta-opf and evolve in time into an example of using ModelingToolkit/JuliaSimCompiler, which are designed for the rosseta-opf use case more-so than Optimization.jl.

Case	Vars	Cons	JuMP	NLPModels	NonConvex	Optimization	Optimization-SciMLBM
case3_lmbd	24	28	1.38e-02	3.90e-02	8.41e+00	2.52e+00	4.79e+00
case5_pjm	44	53	1.95e-02	7.61e-02	1.74e+00	3.97e-01	2.29e-01
case14_ieee	118	169	3.20e-01	3.13e-01	7.87e+01	7.75e+00	7.04e+00
case24_ieee_rts	266	315	3.60e-01	7.49e-01	2.26e+02	1.73e+01	1.17e+01
case30_ieee	236	348	3.73e-01	7.53e-01	2.42e+02	1.72e+01	1.23e+01
case30_as	236	348	3.31e-01	6.40e-01	2.47e+02	1.82e+01	1.16e+01
case39_epri	282	401	1.05e-01	5.94e-01	3.31e+02	2.17e+01	1.11e+01
case57_ieee	448	675	3.93e-01	1.06e+00	5.82e+02	4.33e+01	3.25e+01
case60_c	518	737	5.02e-01	1.48e+00	5.07e+02	4.61e+01	4.71e+01
case73_ieee_rts	824	987	5.08e-01	2.35e+00	1.27e+03	9.04e+01	8.60e+01
case89_pegase	1042	1649	6.60e-01	5.93e+00	3.71e+03	2.14e+02	2.96e+02
case118_ieee	1088	1539	6.39e-01	4.21e+00	2.91e+03	1.85e+02	2.10e+02
case162_ieee_dtc	1484	2313	6.43e-01	6.30e+00	N.D.	4.00e+02	6.01e+02
case179_goc	1468	2200	7.35e-01	8.77e+00	5.80e+03	2.96e+02	6.32e+02
case197_snem	1608	2397	7.04e-01	6.45e+00	N.D.	4.24e+02	5.81e+02
case200_activ	1456	2116	6.46e-01	5.00e+00	6.32e+03	3.12e+02	4.34e+02
case240_pserc	2558	3617	2.78e+00	4.78e+01	N.D.	9.57e+02	4.27e+03
case300_ieee	2382	3478	1.01e+00	1.20e+01	N.D.	8.46e+02	1.70e+03
case500_goc	4254	6097	1.53e+00	2.91e+01	N.D.	3.31e+03	N.D.
case588_sdet	4110	5979	1.37e+00	2.21e+01	N.D.	2.75e+03	N.D.
case793_goc	5432	7978	1.82e+00	3.74e+01	N.D.	4.88e+03	N.D.
case1354_pegase	11192	16646	5.14e+00	1.62e+02	N.D.	N.D.	N.D.
case1803_snem	15246	23172	8.36e+00	3.31e+02	N.D.	N.D.	N.D.
case1888_rte	14480	21494	2.06e+01	5.03e+02	N.D.	N.D.	N.D.
case1951_rte	15018	22075	9.85e+00	3.31e+02	N.D.	N.D.	N.D.
case2000_goc	19008	29432	7.87e+00	3.86e+02	N.D.	N.D.	N.D.
case2312_goc	17128	25716	7.26e+00	2.92e+02	N.D.	N.D.	N.D.
case2383wp_k	17004	25039	7.56e+00	2.65e+02	N.D.	N.D.	N.D.
case2736sp_k	19088	28356	7.14e+00	3.16e+02	N.D.	N.D.	N.D.
case2737sop_k	18988	28358	6.55e+00	2.75e+02	N.D.	N.D.	N.D.
case2742_goc	24540	38196	2.74e+01	1.09e+03	N.D.	N.D.	N.D.
case2746wp_k	19520	28446	6.67e+00	2.94e+02	N.D.	N.D.	N.D.
case2746wop_k	19582	28642	6.73e+00	3.01e+02	N.D.	N.D.	N.D.
case2848_rte	21822	32129	1.49e+01	N.D.	N.D.	N.D.	N.D.
case2853_sdet	23028	33154	1.03e+01	7.33e+02	N.D.	N.D.	N.D.
case2868_rte	22090	32393	1.64e+01	6.84e+02	N.D.	N.D.	N.D.
case2869_pegase	25086	37813	1.25e+01	7.86e+02	N.D.	N.D.	N.D.
case3012wp_k	21082	31029	1.04e+01	4.11e+02	N.D.	N.D.	N.D.
case3022_goc	23238	34990	1.14e+01	N.D.	N.D.	N.D.	N.D.
case3120sp_k	21608	32092	9.91e+00	4.21e+02	N.D.	N.D.	N.D.
case3375wp_k	24350	35876	1.21e+01	6.06e+02	N.D.	N.D.	N.D.
case3970_goc	35270	54428	1.66e+01	1.90e+03	N.D.	N.D.	N.D.
case4020_goc	36696	56957	2.20e+01	2.07e+03	N.D.	N.D.	N.D.
case4601_goc	38814	59596	2.45e+01	2.40e+03	N.D.	N.D.	N.D.
case4619_goc	42532	66289	2.15e+01	2.48e+03	N.D.	N.D.	N.D.
case4661_sdet	34758	51302	1.73e+01	N.D.	N.D.	N.D.	N.D.
case4837_goc	41398	64030	2.20e+01	2.19e+03	N.D.	N.D.	N.D.
case4917_goc	37872	56917	2.02e+01	2.10e+03	N.D.	N.D.	N.D.
case5658_epigrids	48552	74821	2.44e+01	3.38e+03	N.D.	N.D.	N.D.
case6468_rte	49734	75937	6.30e+01	4.18e+03	N.D.	N.D.	N.D.
case6470_rte	50482	75976	3.75e+01	3.38e+03	N.D.	N.D.	N.D.
case6495_rte	50426	76124	6.47e+01	4.43e+03	N.D.	N.D.	N.D.
case6515_rte	50546	76290	5.28e+01	3.98e+03	N.D.	N.D.	N.D.
case7336_epigrids	62116	95306	3.00e+01	5.38e+03	N.D.	N.D.	N.D.
case8387_pegase	78748	118702	4.93e+01	N.D.	N.D.	N.D.	N.D.
case9241_pegase	85568	130826	5.36e+01	N.D.	N.D.	N.D.	N.D.
case9591_goc	83572	130588	6.05e+01	N.D.	N.D.	N.D.	N.D.
case10000_goc	76804	112352	4.94e+01	N.D.	N.D.	N.D.	N.D.
case10192_epigrids	89850	139456	6.06e+01	N.D.	N.D.	N.D.	N.D.
case10480_goc	96750	150874	6.89e+01	N.D.	N.D.	N.D.	N.D.
case13659_pegase	117370	170588	6.63e+01	N.D.	N.D.	N.D.	N.D.
case19402_goc	179562	281733	1.48e+02	N.D.	N.D.	N.D.	N.D.
case20758_epigrids	179236	274918	9.99e+01	N.D.	N.D.	N.D.	N.D.
case24464_goc	203374	313641	1.27e+02	N.D.	N.D.	N.D.	N.D.
case30000_goc	208624	307752	2.43e+02	N.D.	N.D.	N.D.	N.D.
case78484_epigrids	674562	1039062	9.07e+02	N.D.	N.D.	N.D.	N.D.

Julia v1.10.0

Package versions (defaults)
[54578032] ADNLPModels v0.7.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.0
[4076af6c] JuMP v1.18.1
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.7.8
⌃ [7f7a1694] Optimization v3.21.0
⌃ [fd9f6733] OptimizationMOI v0.3.0
[c36e90e8] PowerModels v0.20.1
⌃ [0c5d862f] Symbolics v5.15.1

Note: Package updates to the Optimization dependencies yields a method error.

Package versions (Optimization-SciMLBM)
[2569d6c7] ConcreteStructs v0.2.3
[b6b21f68] Ipopt v1.6.0
[7f7a1694] Optimization v3.21.1
[fd9f6733] OptimizationMOI v0.3.4
[c36e90e8] PowerModels v0.20.1
[37e2e3b7] ReverseDiff v1.15.1
[47a9eef4] SparseDiffTools v2.15.0

Ipopt was configured to run with the linear solver HSL ma27.

ChrisRackauckas · January 24, 2024, 11:43am

Thanks @ccoffrin and @odow for diving deep with me into my concerns!

I think this is the first time that I’ve seen where we’ve split a single benchmark into two very different implementations that will continue to be maintained, but in this case it seems like it is just required. Interesting, but helpful case and something we may want to consider more down the line.

The fully elaborate on what @ccoffrin is saying, the rosseta-opf is really focused on a “naive real user” so to speak, where someone just takes a model definition from a file and slaps that into a loss function. What I was setting up with the SciMLBenchmarks is something that is a comparative solver benchmark, really trying to focus on how the different algorithms are performing and scaling in order to prioritize issues to solve in order to improve general performance.

Understanding the Goals of the Two Benchmarks

At face value, it might look like those are the same thing, but there’s some nuances. One major nuance is the data read. In the rosetta-opf benchmarks, the data is read into dictionaries and those dictionaries are used to construct the loss function. In the SciMLBenchmarks, those dictionaries are instead made into a type-inferred DataRepresentation template so that it can be used in a way that is type-stable.

The key here is that this difference only changes the timing on a subset of the methods, and when it does, it changes it in a very nonlinear way. It does not change the timing of JuMP or direct ModelingToolkit because symbolic interfaces do not directly use the cost function that the user writes down: they are declarative and use that definition to get a mathematical description and build their own type-stable function which embeds the numerical data gathered through tracing as literal constants in the constructed function. Meanwhile, the numerical front-ends (Optimization.jl, NLPmodels.jl, Nonconvex.jl, Optim.jl) all directly use the function that is defined, which has global non-constant dictionaries that are {Any,Any}, and thus any decent Julia programmer can look at that and see “wow, the performance is going to be pretty bad there!”.

What the SciMLBenchmarks versions thus did was take that representation to a type-inferred DataRepresentation to put the numerical front-ends on a more equal footing. But, it also ran all of the benchmarks in a single session. This means that the code for the DataRepresentation, which is a large datatype with lots of type parameters to infer, is compiled once during the validation phase and then is its compilation is not seen in the benchmark phase.

Figuring out the Differences in the Details

The question then becomes, what are you trying to measure? While the Dict{Any,Any} is going to pop out in any benchmark of Optimization.jl, it is a completely valid thing to want to benchmark the experience that such a user would get from a symbolic frontend vs a purely numerical frontend. It truly is the case that JuMP, ModelingToolkit, and all other symbolic interfaces are serving the user better in this kind of scenario because it’s ignoring the exact function the user provided and builds its own better function first, and that means that the user does not have to worry too much about the exact numerical performance details. This is a very valid concern, and this is what the rosetta-opf benchmark then chooses to measure: what if you have this user and they are trying to solve hard problems, what do they see on single runs?

However, that is similarly not as useful as a comparison between solvers if you are trying to match solver details. What I mean by that is, if what you’re trying to benchmark is how JuMP vs Optimization.jl are solving differently, then you want to make the cost and constraint function definitions as equal as possible to isolate any effect of the function building, and then run the same solver setups and see where you’re measuring differences. This is what the SciMLBenchmarks is thus geared towards by forcing the type stable representation so the two have effectively the same performance on the objective function, and any performance difference is thus attributed to differences in the solver setup (will elaborate on) since it’s using the same solver (IPOPT) in the two cases. This is thus helpful for the further development of the algorithms.

Now you might then think that you can take the SciMLBenchmarks version and stick it into rosetta-opf, but as you see in Optimization-SciMLBM, it’s effectively the worst of both worlds. The reason is that if you are doing individual runs, the reading into DataRepresentation requires inference to work a lot harder, and thus it takes about 3-5 seconds for the data reader to compile. SciMLBenchmarks again does not have this factor because all runs are done in the same session, so a specialized data reader is factored out. In theory this can be eliminated by changing PowerModels.jl to using a type template in its read, i.e. always generating the DataRepresentation, and precompiling a lot of functions with respect to that. But in practice, this is not any better for JuMP users, or any symbolic front-end for that matter, and so many users won’t do this. In fact, it might be even strictly worse for JuMP users because it would spend more time compiling and inferring objects that it then ignores since the function used to construct the cost does not need to be fast given that it’s then transformed symbolically. So what you see in the Optimization-SciMLBM column is it effectively spending most of its type compiling that DataRepresentation each time, which given the goals of the rosetta-opf is matching what the user would see.

What’s next for the Benchmarks?

So where do we go from here? This is where the discussion was to simply split the two along the lines of the two different goals trying to be achieved. The SciML crew are building out more to a pure ModelingToolkit symbolic frontend to JuMP, and we’ll be happy to see a benchmark really focused on symbolic frontends and so we’ll contribute that over to the rosetta-opf. It will be interesting to see at what point ModelingToolkit outperforms the naive Optimization.jl, and then the differences from JuMP. Meanwhile, we want to keep the SciMLBenchmarks really focused on the Optimization.jl vs JuMP numerical differences.

And what have we learned about the packages?

I think this is the most important piece. The current main contributors are very easy to identify:

In the Optimization-SciMLBM it’s the compilation of the DataRepresentation.
In the Optimization.jl on its small side, it gets very close to JuMP in terms of performance but there seems to be a performance difference in the tape-compiled ReverseDiff.jl vs the specialized reverse-mode in JuMP. This difference I think gets alleviated with some Enzyme improvements, and so hopefully in the near future on the SciMLBenchmarks we’ll see the Optimization.jl vs JuMP on the smaller problems be pretty close to 1-1 when that is cleared up.
On the larger side, the scaling of Optimization.jl, NLPModels, and Nonconvex is all pretty much dominated by the size of sparsity detection. The detection of sparsity from Julia code all uses the Symbolics.jl version which has worse performance than we’d like right now, allocating a bit too much and becoming the dominant factor. Using sparsity is such a huge improvement in these benchmarks that turning it off definitely hurts the scaling, but at a certain point in these benchmarks that auto-sparsity itself is 99% of the time. @odow got us a good benchmark case which highlights the hessian_sparsity performance Add OptimizationFrameworks/clnlbeam.jmd by odow · Pull Request #836 · SciML/SciMLBenchmarks.jl · GitHub and we’ll dive into that. There’s probably also some differences in the ReverseDiff tape-compiled vs Enzyme here too, but that won’t matter until this is handled.

So sans compile time and type-stability of the function, the main issues for getting other frameworks up to JuMP really come down to the fact that JuMP has its own specialized AD and that does really well on these functions + sparsity, while the general AD functionality is catching up but needs a better sparsity detection algorithm in order to really get there. We’ll keep banging away at that, and that can be tracked in the SciMLBenchmarks, but it won’t help the rosetta-opf benchmarks all that much because other than making the scaling improve, you’ll still see a large difference due to the type-instability. So on rosetta-opf we won’t expect to see any improvements from the Optimization.jl improvements until that part has a ModelingToolkit.jl frontend to get the type-stable function builds like JuMP.

Again @ccoffrin @odow thanks for taking the time to deep dive this.

tl;dr the numbers are very different, but they mean something completely different as well and we’ll just make sure to adequately document what rosetta-opf and SciMLBenchmarks are doing differently.

ccoffrin · February 4, 2024, 6:36pm

Thanks to recent contributions from @sshin23, a new rosetta-opf model has been added based on the ExaModels framework. As the detailed results below show, this implementation tends to be the fastest Julia AC-OPF implementation I have seen so far, specially at the largest scales. Amazing work @sshin23!

This table also includes updates to the NLPModels to use the ConcreteStructs that were proposed in the SciML Benchmarks and an improvement to the Julia JIT warmup procedure and other small maintenance updates.

Case	Vars	Cons	ExaModels	JuMP	NLPModels	NonConvex	Optimization
case3_lmbd	24	28	1.01e-01	1.34e-02	2.96e-02	7.82e+00	2.47e+00
case5_pjm	44	53	5.27e-02	6.56e-02	1.46e-01	1.78e+01	3.47e+00
case14_ieee	118	169	4.17e-02	5.57e-02	1.64e-01	7.69e+01	7.29e+00
case24_ieee_rts	266	315	6.72e-02	1.23e-01	5.72e-01	2.37e+02	1.84e+01
case30_ieee	236	348	4.66e-02	7.86e-02	4.76e-01	2.51e+02	1.75e+01
case30_as	236	348	6.00e-02	8.46e-02	4.98e-01	2.50e+02	1.65e+01
case39_epri	282	401	4.18e-02	1.12e-01	3.63e-01	3.34e+02	2.18e+01
case57_ieee	448	675	5.38e-02	1.03e-01	7.58e-01	5.52e+02	4.27e+01
case60_c	518	737	5.30e-02	1.33e-01	9.63e-01	5.23e+02	4.62e+01
case73_ieee_rts	824	987	9.99e-02	2.10e-01	1.70e+00	1.23e+03	8.95e+01
case89_pegase	1042	1649	1.21e-01	3.57e-01	3.84e+00	3.93e+03	2.19e+02
case118_ieee	1088	1539	1.17e-01	2.90e-01	2.63e+00	3.05e+03	1.81e+02
case162_ieee_dtc	1484	2313	1.53e-01	3.50e-01	4.10e+00	N.D.	3.84e+02
case179_goc	1468	2200	1.76e-01	4.69e-01	4.66e+00	5.77e+03	3.02e+02
case197_snem	1608	2397	1.25e-01	3.45e-01	4.20e+00	N.D.	4.17e+02
case200_activ	1456	2116	1.01e-01	2.78e-01	3.56e+00	6.41e+03	3.10e+02
case240_pserc	2558	3617	9.14e-01	2.78e+00	1.81e+01	N.D.	8.96e+02
case300_ieee	2382	3478	2.09e-01	5.85e-01	8.40e+00	N.D.	8.31e+02
case500_goc	4254	6097	4.28e-01	1.15e+00	1.82e+01	N.D.	3.07e+03
case588_sdet	4110	5979	3.49e-01	9.98e-01	1.47e+01	N.D.	2.55e+03
case793_goc	5432	7978	4.80e-01	1.43e+00	2.68e+01	N.D.	5.23e+03
case1354_pegase	11192	16646	1.23e+00	4.70e+00	1.08e+02	N.D.	N.D.
case1803_snem	15246	23172	2.11e+00	8.15e+00	2.02e+02	N.D.	N.D.
case1888_rte	14480	21494	4.44e+00	2.01e+01	2.94e+02	N.D.	N.D.
case1951_rte	15018	22075	2.44e+00	9.74e+00	1.99e+02	N.D.	N.D.
case2000_goc	19008	29432	2.05e+00	7.80e+00	3.26e+02	N.D.	N.D.
case2312_goc	17128	25716	1.83e+00	7.02e+00	2.37e+02	N.D.	N.D.
case2383wp_k	17004	25039	2.34e+00	8.40e+00	1.90e+02	N.D.	N.D.
case2736sp_k	19088	28356	1.83e+00	6.73e+00	2.60e+02	N.D.	N.D.
case2737sop_k	18988	28358	1.64e+00	5.85e+00	2.56e+02	N.D.	N.D.
case2742_goc	24540	38196	7.32e+00	2.74e+01	6.98e+02	N.D.	N.D.
case2746wp_k	19520	28446	1.88e+00	6.66e+00	2.47e+02	N.D.	N.D.
case2746wop_k	19582	28642	1.75e+00	6.38e+00	2.72e+02	N.D.	N.D.
case2848_rte	21822	32129	3.73e+00	1.86e+01	4.40e+02	N.D.	N.D.
case2853_sdet	23028	33154	2.68e+00	1.05e+01	5.28e+02	N.D.	N.D.
case2868_rte	22090	32393	4.23e+00	1.71e+01	4.76e+02	N.D.	N.D.
case2869_pegase	25086	37813	3.42e+00	1.33e+01	6.70e+02	N.D.	N.D.
case3012wp_k	21082	31029	2.67e+00	1.09e+01	3.44e+02	N.D.	N.D.
case3022_goc	23238	34990	2.82e+00	1.16e+01	5.54e+02	N.D.	N.D.
case3120sp_k	21608	32092	2.71e+00	1.05e+01	3.63e+02	N.D.	N.D.
case3375wp_k	24350	35876	3.23e+00	1.19e+01	5.15e+02	N.D.	N.D.
case3970_goc	35270	54428	6.51e+00	1.74e+01	1.63e+03	N.D.	N.D.
case4020_goc	36696	56957	8.87e+00	2.17e+01	1.68e+03	N.D.	N.D.
case4601_goc	38814	59596	7.64e+00	2.60e+01	1.96e+03	N.D.	N.D.
case4619_goc	42532	66289	7.91e+00	2.16e+01	2.22e+03	N.D.	N.D.
case4661_sdet	34758	51302	5.65e+00	1.76e+01	1.22e+03	N.D.	N.D.
case4837_goc	41398	64030	6.59e+00	2.26e+01	2.09e+03	N.D.	N.D.
case4917_goc	37872	56917	5.77e+00	2.01e+01	1.76e+03	N.D.	N.D.
case5658_epigrids	48552	74821	7.92e+00	2.46e+01	3.18e+03	N.D.	N.D.
case6468_rte	49734	75937	1.64e+01	6.33e+01	3.49e+03	N.D.	N.D.
case6470_rte	50482	75976	1.07e+01	3.61e+01	3.36e+03	N.D.	N.D.
case6495_rte	50426	76124	1.88e+01	6.27e+01	3.59e+03	N.D.	N.D.
case6515_rte	50546	76290	1.50e+01	5.32e+01	3.39e+03	N.D.	N.D.
case7336_epigrids	62116	95306	9.79e+00	3.00e+01	5.09e+03	N.D.	N.D.
case8387_pegase	78748	118702	1.56e+01	5.03e+01	N.D.	N.D.	N.D.
case9241_pegase	85568	130826	1.78e+01	5.34e+01	N.D.	N.D.	N.D.
case9591_goc	83572	130588	2.61e+01	6.06e+01	N.D.	N.D.	N.D.
case10000_goc	76804	112352	1.63e+01	5.00e+01	6.85e+03	N.D.	N.D.
case10192_epigrids	89850	139456	2.09e+01	5.90e+01	N.D.	N.D.	N.D.
case10480_goc	96750	150874	2.85e+01	6.84e+01	N.D.	N.D.	N.D.
case13659_pegase	117370	170588	2.25e+01	6.80e+01	N.D.	N.D.	N.D.
case19402_goc	179562	281733	7.36e+01	1.49e+02	N.D.	N.D.	N.D.
case20758_epigrids	179236	274918	3.71e+01	1.01e+02	N.D.	N.D.	N.D.
case24464_goc	203374	313641	5.00e+01	1.31e+02	N.D.	N.D.	N.D.
case30000_goc	208624	307752	9.94e+01	2.55e+02	N.D.	N.D.	N.D.
case78484_epigrids	674562	1039062	3.69e+02	7.86e+02	N.D.	N.D.	N.D.

Julia v1.10.0

[54578032] ADNLPModels v0.7.0
[2569d6c7] ConcreteStructs v0.2.3
[1037b233] ExaModels v0.5.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.1
[4076af6c] JuMP v1.19.0
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.9.1
[7f7a1694] Optimization v3.22.0
[fd9f6733] OptimizationMOI v0.3.4
[c36e90e8] PowerModels v0.21.0
[0c5d862f] Symbolics v5.16.1

Ipopt was configured to run with the linear solver HSL ma27.

sshin23 · February 4, 2024, 9:58pm

Thanks, @ccoffrin for an amazing project and for sharing detailed results.

Here’s a quick summary on how ExaModels works.

What is ExaModels?

ExaModels.jl is an algebraic modeling system embedded in Julia Language, which is tailored for what we call “SIMD abstraction”. By itself, ExaModels doesn’t have optimization solver capability. This means that you need to interface it with an optimization solver (e.g., Ipopt) to solve problems. ExaModels provide a user front-end to specify the nonlinear model and an automatic differentiation capability to evaluate first/second order derivatives in sparse forms. So, among the total solution time reported in this benchmark, which roughly consists of (1) data parsing time, (2) model creation time, (3) solver time, which can be again broken down to (3-1) derivative evaluation time, (3-2) linear solver time, and (3-3) solver internal time, ExaModels is responsible for (2) and (3-1). ExaModels is pretty efficient both in terms of model creation and derivative evaluation.

So, what is “SIMD abstraction?”

SIMD abstraction is the key idea behind the high performance of ExaModels for OPFs. When specifying the model, ExaModels always require the user to specify the model using iterators. Let’s take an example:

Let’s say that our constraint takes the following form:

function cons!(x,y)
    for i=1:length(x)-2
        y[i] =  3x[i+1]^3+2*x[i+2]-5+sin(x[i+1]-x[i+2])sin(x[i+1]+x[i+2])+4x[i+1]-x[i]exp(x[i]-x[i+1])-3
    end
end

Instead of taking the constraint information in this way,

constraint(model, cons!)

we take the constraint information as

con(x,i) = 3x[i+1]^3+2*x[i+2]-5+sin(x[i+1]-x[i+2])sin(x[i+1]+x[i+2])+4x[i+1]-x[i]exp(x[i]-x[i+1])-3
    
constraint(model, con(x,i) for i=1:length(x)-2)

This allows the model to process the data in the form of an iterator, which allows us to capture the repetitive patterns in the model. This repetitive pattern can be exploited when applying automatic differentiation (AD). In particular, instead of applying AD to the whole cons!, whose internal structure is obscure, we can apply AD to con and run for loop on top of it.

What does ExaModels do more to make AD faster?

An additional trick used in ExaModels is to assume that con has a simple structure, which is the case for OPFs, and when doing the AD, we create expression trees with concrete types. AD on this concrete expression tree can fully take advantage of julia’s JIT compilation, and highly optimized callback can be put together–zero type inference and memory allocations. The even nicer thing here is that we can make these callbacks as GPU kernels using KernelAbstractions.jl, and AD can be performed on different GPU accelerators. Some GPU results can be found in the ExaModels repository.

Limitations

Of course, it is important to note that this improved performance comes from assuming more structures and exploiting them. Clearly, ExaModels serves a narrower class of problems, compared to other modeling systems compared in this benchmark. The key requirement for ExaModels to shine is that the problem has (1) repeated pattern in objective/constraints (meaning that it can be specified as iterators) and (2) each pattern is relatively simple (meaning that it is only dependent on a handful of variables). Many large-scale problems in mathematical programming are like this; e.g., optimal power flow, PDE-constrained optimization, optimal control and parameter estimation problems formulated w/ direct transcription methods. But problems with more complex function structures, e.g., those embedding neural nets, DAE solvers, etc., are not suitable for modeling in this way.

Takeaways and suggestions

One takeaway here is that the more you exploit the structure, the better performance you get. So, we might want to specify how much of the structure is exploited in each modeling approach. Polar form AC OPF problems have several distinct structures, which include:

Objective functions and constraints are smooth nonlinear functions, whose 1st and 2nd order derivatives can be cheaply evaluated.
Constraint Jacobian and Lagrangian Hessian are extremely sparse, and their sparsity pattern is known.
The algebraic structure of the objective and constraint functions are simple and highly repetitive.

Thus, we might want to classify the compared frameworks based on the following criteria, so that one can see if we are doing apples to apples comparison.

Does the modeling system exploit second-order derivatives?
Does the modeling system exploit sparsity?
Does the modeling system exploit repeated patterns?

Also, to reduce the signal-to-noise ratio of the results, I’d suggest

Exclude data parsing time, as it doesn’t have much to do with the capabilities of optimization frameworks.
Separately report model creation time and solution time. Also, if possible, separately report derivative evaluation time, linear solver time, and solver internals.

A bit more detailed description of each configuration might be helpful—this may not be captured by ]pkg st. For example,

ExaModels, JuMP, ADNLPModels, Nonconvex, Optimization are configured with Ipopt and ma27.
For Optimization.jl and ADNLPModels.jl (I’d suggest renaming NLPModels as ADNLPModels, as NLPModels itself is simply a template and doesn’t have a modeling capability), I’d also suggest specifying the AD backends, as it seems that they can be configured with different AD backends.

Topic		Replies	Views
Yearly comparison of Optimization Libraries Optimization (Mathematical)	6	758	May 23, 2024
Extracting the Power Flow equations from PowerModels Optimization (Mathematical) optimization , power-flow	4	504	December 7, 2023
Ipopt solves large-scale nonlinear problems Optimization (Mathematical) nlp	18	1058	July 7, 2023
Juniper...Solver gets only local optimum point Optimization (Mathematical)	10	844	July 26, 2020
Survey of Non-Linear Optimization Modeling Layers in Julia Optimization (Mathematical) nonlinear	63	4255	February 4, 2024

AC Optimal Power Flow in Various Nonlinear Optimization Frameworks

Complete Detail of Issues In Your Benchmark Script That Were Already Fixed In Code That We Have Shared With You

Summaries of Issues with Your Benchmark Code

Extra Information

Holding out an olive branch

Understanding the Goals of the Two Benchmarks

Figuring out the Differences in the Details

What’s next for the Benchmarks?

And what have we learned about the packages?

What is ExaModels?

So, what is “SIMD abstraction?”

What does ExaModels do more to make AD faster?

Limitations

Takeaways and suggestions

Related topics