AC Optimal Power Flow in Various Nonlinear Optimization Frameworks

When it comes to AC-OPF, faster is always better and more useful, but <1 hour is interesting and useful for “day-ahead” operations; <10 minutes is required for things like “real-time” operations.

Current state of the art for this case, as far as I know, is OPOMO at 303 seconds.

I have a PR to get it onto the SciMLBenchmarks system so that it can be auto-run on a standard machine through CI, making it easier for everyone to get involved:

I noticed Optimization.jl is using ForwardDiff.jl and no sparsity. These days, it has Enzyme with sparse Hessians. I presume that would make a major difference, and so after this merges @Vaibhavdixit02 should look into adding that.
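
For reference, a minimal sketch of what selecting a sparse AD backend looks like on the Optimization.jl side (the objective and constraints here are toy stand-ins rather than the OPF functions, and the backend name assumes a recent Optimization.jl release):

using Optimization, OptimizationMOI, Ipopt, ReverseDiff
# depending on the Optimization.jl version, SparseDiffTools/Symbolics may also need to be loaded

obj(x, p) = sum(abs2, x)                  # toy objective
cons(res, x, p) = (res .= x .^ 2 .- 1)    # toy equality constraints, written in-place

x0 = fill(0.5, 4)
optf = OptimizationFunction(obj, Optimization.AutoSparseReverseDiff(); cons = cons)
prob = OptimizationProblem(optf, x0; lcons = zeros(4), ucons = zeros(4))
sol = solve(prob, Ipopt.Optimizer())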

From memory we tried the various sparsity and AD options and ran into memory issues. This was the fastest, but perhaps recent versions are better. :+1: for putting this in the benchmark suite so you have control over what options are used etc.

Avik’s new and improved sparse interface, which makes sparse reverse mode simpler, only merged a week ago (New High Level Interface for Sparse Jacobian Computation by avik-pal · Pull Request #253 · JuliaDiff/SparseDiffTools.jl · GitHub), so I presume that wasn’t the latest stuff. Not all of this has made it to Optimization.jl’s high-level AD system yet, but by making this the benchmark we can do benchmark-driven development to ensure what’s merged is efficient. I hate building in a vacuum.


Update: the benchmark is now running on the SciMLBenchmarks! You can find it here:

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/OptimizationFrameworks/optimal_powerflow/

Anyone can trigger the benchmark to run just by changing the code here:

It makes a cute table at the end with the current results:

Things to note:

  1. I know the table formatting can be improved. If anyone knows how to make Documenter show the whole thing, I’d be happy to have help formatting it.
  2. There is a parameter SIZE_LIMIT which tells the benchmark the maximum number of variables to use in the benchmark files. This is set to 100 right now just to play around with formatting, but I’ll bump it up and do a bigger run. I did do a bigger run a bit back so I know it works, just :person_shrugging: when fixing table formatting it’s annoying to wait a day haha.
  3. The Optimization.jl code was horrendously allocating and type-unstable. A common issue in the code was using d["$i"] on a dictionary instead of just using an integer key (see the sketch after this list). The dictionaries are still there, so there’s still some indirection that should be removed and the performance should increase, but at least the code is type-stable and non-allocating now. I’ll actually optimize the code in the near future, but it is at least a lot simpler than the original code and won’t interfere with AD tooling.
  4. Optimization.jl is now using sparse reverse mode, but it’s not optimized yet. This is a case we’re using to drive those optimizations. See:

which are low hanging fruit identified from this.

  5. I added a lot of validation cases, so if you change the code around, assertions will throw failures if you break something.
  6. I couldn’t get the CASADI code to run; I’d like some help figuring that out.
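
To illustrate the dictionary issue from point 3, here is a minimal sketch (hypothetical data, not the actual PowerModels structures):

# Interpolating a string key allocates on every call, and a Dict{String,Any} lookup is not type-stable:
d = Dict{String, Any}("1" => 0.5, "2" => 1.5)
cost_slow(d, i) = d["$i"]^2     # builds a new String each call and returns an Any

# Integer indices into a vector (or fields of a concrete struct) avoid both problems:
v = [0.5, 1.5]
cost_fast(v, i) = v[i]^2        # no allocation, inferred as Float64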

This looks like a problem that FastDifferentiation.jl can solve.

It handles sparsity well and is likely to be faster than the alternatives for this type of problem. If you decide to try it I’ll be happy to help you get it working.

@brianguenter we have it set up to MTK-trace already and it produces branch-free code, so once the Symbolics integration is done this was going to be one of the first cases I wanted to test.

Given the release of a new nonlinear modeling interface in JuMP v1.15, I thought it would be good to update rosetta-opf to use that new interface and check these results.

The JuMP results are largely unchanged, with an arguable case for a small performance improvement at the largest scales. No notable performance regression is a good outcome for the new nonlinear modeling interface, which has been rebuilt from scratch for improved maintainability and added support for a variety of new features.

In updating to more recent versions of all related packages and Julia, Optimization.jl also saw a notable improvement, now solving up to the 793-bus case for the first time. Nonconvex.jl also solved a 200-bus case for the first time. NLPModels encountered some new exceptions in derivative calculations on large cases.

By and large, there has been notable progress across the board!

Case Vars Cons JuMP NLPModels NonConvex Optim Optimization
(values are total runtime in seconds)
case3_lmbd 24 28 1.47e-02 4.82e-02 8.42e+00 INF. 2.62e+00
case5_pjm 44 53 1.78e-02 8.63e-02 1.76e+00 INF. 3.96e-01
case14_ieee 118 169 3.45e-01 3.18e-01 7.84e+01 N.D. 7.75e+00
case24_ieee_rts 266 315 3.69e-01 7.81e-01 2.38e+02 INF. 1.77e+01
case30_ieee 236 348 3.74e-01 7.84e-01 2.41e+02 N.D. 1.76e+01
case30_as 236 348 3.33e-01 6.68e-01 2.49e+02 INF. 1.77e+01
case39_epri 282 401 9.70e-02 6.81e-01 3.28e+02 N.D. 2.18e+01
case57_ieee 448 675 4.01e-01 1.15e+00 5.82e+02 N.D. 4.24e+01
case60_c 518 737 4.14e-01 1.61e+00 5.29e+02 N.D. 4.52e+01
case73_ieee_rts 824 987 4.45e-01 2.69e+00 1.22e+03 N.D. 9.43e+01
case89_pegase 1042 1649 6.33e-01 6.70e+00 3.96e+03 N.D. 2.23e+02
case118_ieee 1088 1539 6.35e-01 4.44e+00 3.05e+03 N.D. 1.79e+02
case162_ieee_dtc 1484 2313 6.39e-01 6.21e+00 N.D. N.D. 3.95e+02
case179_goc 1468 2200 7.24e-01 9.49e+00 5.67e+03 N.D. 3.01e+02
case197_snem 1608 2397 6.84e-01 6.76e+00 N.D. N.D. 4.29e+02
case200_activ 1456 2116 5.41e-01 5.29e+00 6.49e+03 N.D. 2.92e+02
case240_pserc 2558 3617 2.76e+00 4.90e+01 N.D. N.D. 9.65e+02
case300_ieee 2382 3478 9.57e-01 1.23e+01 N.D. N.D. 8.37e+02
case500_goc 4254 6097 1.40e+00 2.98e+01 N.D. N.D. 3.11e+03
case588_sdet 4110 5979 1.34e+00 2.31e+01 N.D. N.D. 2.58e+03
case793_goc 5432 7978 1.69e+00 4.00e+01 N.D. N.D. 4.94e+03
case1354_pegase 11192 16646 4.89e+00 1.68e+02 N.D. N.D. N.D.
case1803_snem 15246 23172 8.23e+00 3.37e+02 N.D. N.D. N.D.
case1888_rte 14480 21494 1.98e+01 5.18e+02 N.D. N.D. N.D.
case1951_rte 15018 22075 9.86e+00 N.D. N.D. N.D. N.D.
case2000_goc 19008 29432 7.82e+00 3.93e+02 N.D. N.D. N.D.
case2312_goc 17128 25716 7.27e+00 2.92e+02 N.D. N.D. N.D.
case2383wp_k 17004 25039 7.84e+00 2.68e+02 N.D. N.D. N.D.
case2736sp_k 19088 28356 7.03e+00 3.15e+02 N.D. N.D. N.D.
case2737sop_k 18988 28358 6.40e+00 N.D. N.D. N.D. N.D.
case2742_goc 24540 38196 2.77e+01 1.16e+03 N.D. N.D. N.D.
case2746wp_k 19520 28446 6.81e+00 2.90e+02 N.D. N.D. N.D.
case2746wop_k 19582 28642 6.55e+00 2.94e+02 N.D. N.D. N.D.
case2848_rte 21822 32129 1.50e+01 6.11e+02 N.D. N.D. N.D.
case2853_sdet 23028 33154 1.08e+01 7.18e+02 N.D. N.D. N.D.
case2868_rte 22090 32393 1.66e+01 6.69e+02 N.D. N.D. N.D.
case2869_pegase 25086 37813 1.25e+01 8.14e+02 N.D. N.D. N.D.
case3012wp_k 21082 31029 9.96e+00 4.17e+02 N.D. N.D. N.D.
case3022_goc 23238 34990 1.16e+01 N.D. N.D. N.D. N.D.
case3120sp_k 21608 32092 1.01e+01 4.35e+02 N.D. N.D. N.D.
case3375wp_k 24350 35876 1.17e+01 N.D. N.D. N.D. N.D.
case3970_goc 35270 54428 1.67e+01 2.02e+03 N.D. N.D. N.D.
case4020_goc 36696 56957 2.22e+01 N.D. N.D. N.D. N.D.
case4601_goc 38814 59596 2.36e+01 2.38e+03 N.D. N.D. N.D.
case4619_goc 42532 66289 2.10e+01 2.44e+03 N.D. N.D. N.D.
case4661_sdet 34758 51302 1.72e+01 1.50e+03 N.D. N.D. N.D.
case4837_goc 41398 64030 2.25e+01 2.25e+03 N.D. N.D. N.D.
case4917_goc 37872 56917 2.04e+01 N.D. N.D. N.D. N.D.
case5658_epigrids 48552 74821 2.40e+01 3.37e+03 N.D. N.D. N.D.
case6468_rte 49734 75937 6.32e+01 4.10e+03 N.D. N.D. N.D.
case6470_rte 50482 75976 3.64e+01 3.56e+03 N.D. N.D. N.D.
case6495_rte 50426 76124 6.38e+01 4.39e+03 N.D. N.D. N.D.
case6515_rte 50546 76290 5.21e+01 4.19e+03 N.D. N.D. N.D.
case7336_epigrids 62116 95306 2.90e+01 5.42e+03 N.D. N.D. N.D.
case8387_pegase 78748 118702 5.03e+01 N.D. N.D. N.D. N.D.
case9241_pegase 85568 130826 5.30e+01 N.D. N.D. N.D. N.D.
case9591_goc 83572 130588 6.02e+01 N.D. N.D. N.D. N.D.
case10000_goc 76804 112352 5.11e+01 N.D. N.D. N.D. N.D.
case10192_epigrids 89850 139456 5.81e+01 N.D. N.D. N.D. N.D.
case10480_goc 96750 150874 6.87e+01 N.D. N.D. N.D. N.D.
case13659_pegase 117370 170588 6.82e+01 N.D. N.D. N.D. N.D.
case19402_goc 179562 281733 1.49e+02 N.D. N.D. N.D. N.D.
case20758_epigrids 179236 274918 1.04e+02 N.D. N.D. N.D. N.D.
case24464_goc 203374 313641 1.27e+02 N.D. N.D. N.D. N.D.
case30000_goc 208624 307752 2.49e+02 N.D. N.D. N.D. N.D.
case78484_epigrids 674562 1039062 7.77e+02 N.D. N.D. N.D. N.D.

Package Details

Julia v1.10.0
[54578032] ADNLPModels v0.7.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.0
[4076af6c] JuMP v1.18.1
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.7.8
[7f7a1694] Optimization v3.21.0
[fd9f6733] OptimizationMOI v0.3.0
[c36e90e8] PowerModels v0.20.1
[0c5d862f] Symbolics v5.15.1

Ipopt was run with the linear solver HSL ma27.


Your “results” make no sense. I can see from the reproducible continuous benchmarking machine that this differs by orders of magnitude from the code that is considered canonical here.

https://docs.sciml.ai/SciMLBenchmarksOutput/dev/OptimizationFrameworks/optimal_powerflow/

case vars cons optimization optimization_modelbuild optimization_wcompilation optimization_cost mtk mtk_time_modelbuild mtk_time_wcompilation mtk_cost jump jump_modelbuild jump_wcompilation jump_cost nlpmodels nlpmodels_modelbuild nlpmodels_wcompilation nlpmodels_cost optim optim_modelbuild optim_wcompilation optim_cost
String Int64 Int64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64
pglib_opf_case3_lmbd.m 24 28 0.0190105 0.0434296 0.0209169 5812.64 0.0123753 0.392407 0.0142627 743.0 0.00895457 0.00250657 0.00930726 5812.64 0.0185664 0.0154271 0.0194362 5812.64 2.20767 0.000411777 2.05422 6273.63

Given that you’re two orders of magnitude away from the continuous reproducible version of the benchmarks, I don’t know if the numbers are just made up or wrong, but it clearly doesn’t pass the smell test.

Let me take a look. @ccoffrin has been using rosetta-opf/optimization.jl at main · lanl-ansi/rosetta-opf · GitHub. It doesn’t include the changes made in the benchmark repo yet.

FWIW, for case73_ieee_rts, @ccoffrin reports 9.43e+01, and Optimal Powerflow Nonlinear Optimization Benchmark · The SciML Benchmarks reports 6.44e+01 (optimization_modelbuild + optimization), so your implementation is much better, but they’re not two orders of magnitude apart.

And perhaps the difference is just machines. JuMP’s time on @ccoffrin’s machine is 4.45e-01, and your benchmarks report 2.20e-01.

See Add optimization based on SciMLBenchmarks by odow · Pull Request #41 · lanl-ansi/rosetta-opf · GitHub. I ran into ArgumentError: The passed automatic differentiation backend choice is not available. Please load the corresponding AD package SparseReverseDiff..

I also noticed that
[image]

It just looks like the case3 run has some compilation overhead. The case5 run takes 0.4 seconds, which is probably the right ballpark.

The real benchmark reports model build time, run time, and run time including compilation.

case vars cons optimization optimization_modelbuild optimization_wcompilation optimization_cost mtk mtk_time_modelbuild mtk_time_wcompilation mtk_cost jump jump_modelbuild jump_wcompilation jump_cost nlpmodels nlpmodels_modelbuild nlpmodels_wcompilation nlpmodels_cost optim optim_modelbuild optim_wcompilation optim_cost
String Int64 Int64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64 Float64
pglib_opf_case3_lmbd.m 24 28 0.0190105 0.0434296 0.0209169 5812.64 0.0123753 0.392407 0.0142627 743.0 0.00895457 0.00250657 0.00930726 5812.64 0.0185664 0.0154271 0.0194362 5812.64 2.20767 0.000411777 2.05422 6273.63

Notice it reports it all so that there’s no speculation, just timing. With this we can see that even with compilation it’s not close to the reported numbers.

Where is the CI script showing the details of @ccoffrin’s machine, so we can see what it’s doing?

Ignore the case3 numbers and look at the larger sizes. I agree case3 is wrong.

Here are some initial follow-ups to the latest discussion, which I’ll summarize as three broad topics.

How to reproduce these results?

I have commented on this from time to time in this thread but let me recap more explicitly here. The steps to reproduce these results are,

  1. Check out the code of the cli branch of the rosetta-opf repo
  2. Download PGLib-OPF data sets
  3. Run an experiment with the command julia --project=<rosetta-opf repo path> rosetta-opf-cli.jl -s <framework id> -f <pglib case path>.

The specific case of running optimization.jl from the rosetta-opf directory would look something like, julia --project=. rosetta-opf-cli.jl -s optimization -f data/pglib_opf_case118_ieee.m

The values reported in this table are the “time_total” parameter that is output by the rosetta-opf scripts via this rosetta-opf-cli.jl interface. This value includes both model building time and solve time.

For the network cases with, say, less than 1000 buses, this should be sufficient to get similar results. However, for the large cases it becomes important to change Ipopt’s linear solver from the default (MUMPS) to something more performant, like HSL. As noted, in this table I have used HSL ma27. Configuration details for that are discussed here.
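
For reference, a sketch of what that Ipopt configuration looks like through JuMP (assuming the HSL_jll library from the libHSL download is installed; obtaining HSL itself requires a separate registration):

using JuMP, Ipopt
import HSL_jll

model = Model(Ipopt.Optimizer)
set_attribute(model, "hsllib", HSL_jll.libhsl_path)   # point Ipopt at the HSL library
set_attribute(model, "linear_solver", "ma27")         # use ma27 instead of the default MUMPS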

What machines are these experiments run on?

This experiment is currently running on pretty dated hardware, but all jobs are run on identical machines, so they are comparable across each other. The specific machine specs are,

A two-socket machine with two Intel Broadwell E5-2695 v4 CPUs running at 2.1GHz and 125GB of memory.

I think it is reasonable to assume a 2x runtime improvement if one used more modern hardware. These systems are what I have easy access to at this time.

Each cell in this table is run with a maximum wall-clock limit of 2 hours, so the total amount of wall-clock time to run the experiment can be quite substantial. I currently use a cluster of these stated computers to collect the data for this table in a reasonable amount of time.

Next steps for updating the Optimization.jl implementation (and other frameworks).

In general, I welcome and encourage modeling layer developers to make PRs to rosetta-opf to improve this example of how to best utilize their modeling system. I would not claim to be an expert in any of these modeling layers. Once a PR is merged into the main branch, those updates will be reflected in the next iteration of this table.

I can see that @odow has started a PR on updating the optimization.jl implementation. Great! Any help in getting this working would be much appreciated. I can make another round of updates to this table once this is merged.

I’ll be out of the office for the holiday weekend but looking forward to continuing this discussion and getting the latest and greatest implementation of optimization.jl working.


There are many problems with this statement. Sorry, this is getting really frustrating because we put days of work into building and fixing a reproducible benchmark for this, and it is not conducive to a collaborative atmosphere that you keep throwing away the work we are doing to help update this benchmark. Let me spell it out in full.

Complete Detail of Issues In Your Benchmark Script That Were Already Fixed In Code That We Have Shared With You

We set up a fully automated version of this benchmark:

Anyone who makes a PR to the system has it automatically run on the benchmarking server so that everyone can test their changes on the same computer. The results are then printed for everyone to see. That is what we shared:

https://docs.sciml.ai/SciMLBenchmarksOutput/stable/OptimizationFrameworks/optimal_powerflow/

Many improvements were made to the benchmark. One major thing is that there are validations for each implementation.

These assertions check that, on random state trajectories, every implementation computes the same cost value and constraint values. The reason we added these is that this was not the case for the original benchmarks you had: different implementations evaluated to different constraint values, indicating that some of the constraint functions were not correct. We fixed those in the benchmarks.
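
Schematically, these validations look something like the following (hypothetical helper names; the actual assertions live in the benchmark’s jmd file):

x = rand(nvars)   # a random candidate state
@assert isapprox(cost_jump(x), cost_optimization(x); rtol = 1e-8)
@assert isapprox(constraints_jump(x), constraints_optimization(x); rtol = 1e-8)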

Also, the implementations for Optimization.jl, NLPmodels.jl, and Nonconvex.jl were needlessly unoptimized. What I mean is that the implementations have unnecessary allocations from constructing strings as dictionary accessors inside the cost function (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub). That is a performance issue for any of the models which directly use the cost function definition rather than going through a declarative DSL (i.e. everything except JuMP), and it is against most Julia style conventions. This was fixed in the updated code, and you can see that your benchmarking code does not match what we had sent you (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub, and similar lines for NLPmodels and Nonconvex.jl).

Additionally, the dictionaries that are being pulled from are themselves not type-stable since they have differing element types: some of the elements of the dictionaries are themselves dictionaries, while other elements are numbers and others are arrays. Reading data into unstructured dictionaries is not standard Julia code anyway and is against standard style conventions, so it is not recommended. The code that we had sent you fixed this by reading the data into concrete structs (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub), which then guard against incorrectness (typos) and ensure type-stability for all of the algorithms.
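
A minimal sketch of that pattern (hypothetical field names; the benchmark uses ConcreteStructs.jl for this):

using ConcreteStructs

# @concrete gives every field its own concrete type parameter, so downstream code stays type-stable
@concrete struct BusData
    pd      # active power demand
    qd      # reactive power demand
    vmin    # voltage magnitude lower bound
    vmax    # voltage magnitude upper bound
end

bus = BusData(1.0, 0.35, 0.9, 1.1)   # inferred as BusData{Float64, Float64, Float64, Float64}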

We also looked into the automatic differentiation methods. Note that the ModelingToolkit method (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub) is known to be a hacky way to construct a symbolic form since it’s using the numerical code in order to specify the problem and then symbolically tracing it rather than simply building the symbolic form. In the fixed version of the benchmark, we split the two pieces. One version of the Optimization.jl benchmark is a purely Julia-code non-symbolic implementation that uses sparse reverse mode automatic differentiation (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub) and the other is a purely symbolic implementation of an OptimizationSystem, closer to the representation of JuMP (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub). These are kept separate in the final benchmarks because they are very different implementations with very different results.

In fact, these results are very different in cost because the MTK implementation uses a tearing algorithm to eliminate constraints, and in the end finds solutions that receive a lower cost than any of the other implementations. This is something we have been investigating and I know @Vaibhavdixit02 personally emailed you showing you all of these changes and asking you about this specific result.

Summaries of Issues with Your Benchmark Code

  1. The code that you are running is very type-unstable and is unnecessarily slowing down most of the implementations, and it’s not clear that it’s correctly implementing the constraints for all of them.
  2. The way that you are running the benchmarks is a non-reproducible “it runs on my computer” setup, which we tried to eliminate by setting up an automated benchmarking service.
  3. We shared all of these results both in this thread (AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas) and in personal emails to you, trying to continue to improve these benchmarks, not just for Optimization.jl but also for NLPmodels, Nonconvex, and JuMP (we haven’t cared about the Optim.jl part, and the CASADI part just segfaults).
  4. It seems that no matter how many times we have tried to contact you or share these results with you, we have been ignored. There does not seem to be an indication that what we have shared was read at all.

Extra Information

So I am very frustrated by statements like:

I think it is reasonable to assume a 2x runtime improvement if one used more modern hardware. These systems are what I have easy access to at this time.

because that is very clearly wrong due to facts that have been repeatedly brought up in this thread, that we have shared with you in emails, and that we have demonstrated in a fully reproducible way (Optimal Powerflow Nonlinear Optimization Benchmark · The SciML Benchmarks). No, the performance difference that you’re seeing is not just due to running on a different machine; it’s very clearly because you are still running the type-unstable versions from prior to September 2023, even after all of the messages that we have sent about this.

The reason I pointed out case3 is that it’s the smallest case and thus the best case to indicate that something was clearly off with the benchmarks. The larger the system, the more time is spent in the linear algebra routine and thus the smaller the overall difference. To really highlight the difference due to the implementation of the loss function rather than the implementation of the optimization algorithm, the smallest case is the one that would be most dominated by objective function call cost rather than the cost of the AD algorithm or the linear solver choices. Therefore, this result was the biggest red flag in the list, not the one to ignore.

It being 2 orders of magnitude different was a very clear indicator that the code you were running was not type-stable and had to be doing something egregiously unoptimized in the loss function, which is why I pointed that out. The differences at the large end are then a mix of that lack of code optimization and known issues with the scaling of sparse reverse-mode AD via the current sparse reverse approach. This latter fact was documented earlier as point (4) in the earlier post AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas.

The fixed version of the code separates this out and isolates the comparison down to the implementation details, which highlights what the next steps for the AD community need to be, along with symbolic code generation.

Holding out an olive branch

I say all of this because I just want us to be absolutely clear as to the current state of everything. We have been repeatedly trying to share this information with you since September 2023 (as everyone can see in the post above) and have even emailed you personally since. You posted a “new” version of the benchmark that ignored all of the issues we had pointed out, and not only that, ignored all of the fixes that we spent the time to apply, along with the infrastructure that we provided to make it easier to maintain these benchmarks going forward.

Mistakes are made, it happens. But it seems we do have your attention now.

  • Can we please work together on this?
  • Can our updates please not be ignored?
  • Can we please do this by updating the code on the SciMLBenchmarks jmd file so that both of us are running on the same computer?
  • Is there any update to the code you have done? We would be happy to help upstream that into the jmd file.

This is frustrating and takes more time out of both of us. Let’s work together and use this to improve all of the packages involved.


Based on a discussion with @ChrisRackauckas, rosetta-opf has been updated with,

A general consensus has been reached that rosetta-opf and SciMLBenchmarks are both reasonable implementations of an AC-OPF benchmark that have different objectives. In particular, rosetta-opf focuses on simulating a common non-expert user experience in using NLP tooling, without optimized code or data structures; this generally biases the performance results in favor of symbolic front-ends. On the other hand, SciMLBenchmarks uses type-stable high-performance data structures to focus on core implementation differences of the numerical backends. Therefore, it makes sense for each benchmark effort to continue on with these distinct objectives.

With these changes made, below is an updated results table.

I have removed the optim implementation in this iteration to make room for comparing the original Optimization.jl implementation to the SciMLBenchmarks implementation side-by-side. It was also requested to note that the results presented here, which I call “total runtime”, include a combination of (instance-specific compile time) + (model build time) + (solve time).

In this results table (and all previous ones) there is an instance-specific compilation overhead in the Optimization.jl implementations of 3-5 seconds. This overhead is omitted in the case5_pjm line because this dataset has been used to warm up Julia’s JIT before collecting runtime data. This artifact will be removed in future versions of this table.

At @ChrisRackauckas’ direction, we are expecting that the original Optimization.jl implementation will remain the primary one in rosetta-opf and evolve in time into an example of using ModelingToolkit/JuliaSimCompiler, which are designed for the rosetta-opf use case more so than Optimization.jl.

Case Vars Cons JuMP NLPModels NonConvex Optimization Optimization-SciMLBM
(values are total runtime in seconds)
case3_lmbd 24 28 1.38e-02 3.90e-02 8.41e+00 2.52e+00 4.79e+00
case5_pjm 44 53 1.95e-02 7.61e-02 1.74e+00 3.97e-01 2.29e-01
case14_ieee 118 169 3.20e-01 3.13e-01 7.87e+01 7.75e+00 7.04e+00
case24_ieee_rts 266 315 3.60e-01 7.49e-01 2.26e+02 1.73e+01 1.17e+01
case30_ieee 236 348 3.73e-01 7.53e-01 2.42e+02 1.72e+01 1.23e+01
case30_as 236 348 3.31e-01 6.40e-01 2.47e+02 1.82e+01 1.16e+01
case39_epri 282 401 1.05e-01 5.94e-01 3.31e+02 2.17e+01 1.11e+01
case57_ieee 448 675 3.93e-01 1.06e+00 5.82e+02 4.33e+01 3.25e+01
case60_c 518 737 5.02e-01 1.48e+00 5.07e+02 4.61e+01 4.71e+01
case73_ieee_rts 824 987 5.08e-01 2.35e+00 1.27e+03 9.04e+01 8.60e+01
case89_pegase 1042 1649 6.60e-01 5.93e+00 3.71e+03 2.14e+02 2.96e+02
case118_ieee 1088 1539 6.39e-01 4.21e+00 2.91e+03 1.85e+02 2.10e+02
case162_ieee_dtc 1484 2313 6.43e-01 6.30e+00 N.D. 4.00e+02 6.01e+02
case179_goc 1468 2200 7.35e-01 8.77e+00 5.80e+03 2.96e+02 6.32e+02
case197_snem 1608 2397 7.04e-01 6.45e+00 N.D. 4.24e+02 5.81e+02
case200_activ 1456 2116 6.46e-01 5.00e+00 6.32e+03 3.12e+02 4.34e+02
case240_pserc 2558 3617 2.78e+00 4.78e+01 N.D. 9.57e+02 4.27e+03
case300_ieee 2382 3478 1.01e+00 1.20e+01 N.D. 8.46e+02 1.70e+03
case500_goc 4254 6097 1.53e+00 2.91e+01 N.D. 3.31e+03 N.D.
case588_sdet 4110 5979 1.37e+00 2.21e+01 N.D. 2.75e+03 N.D.
case793_goc 5432 7978 1.82e+00 3.74e+01 N.D. 4.88e+03 N.D.
case1354_pegase 11192 16646 5.14e+00 1.62e+02 N.D. N.D. N.D.
case1803_snem 15246 23172 8.36e+00 3.31e+02 N.D. N.D. N.D.
case1888_rte 14480 21494 2.06e+01 5.03e+02 N.D. N.D. N.D.
case1951_rte 15018 22075 9.85e+00 3.31e+02 N.D. N.D. N.D.
case2000_goc 19008 29432 7.87e+00 3.86e+02 N.D. N.D. N.D.
case2312_goc 17128 25716 7.26e+00 2.92e+02 N.D. N.D. N.D.
case2383wp_k 17004 25039 7.56e+00 2.65e+02 N.D. N.D. N.D.
case2736sp_k 19088 28356 7.14e+00 3.16e+02 N.D. N.D. N.D.
case2737sop_k 18988 28358 6.55e+00 2.75e+02 N.D. N.D. N.D.
case2742_goc 24540 38196 2.74e+01 1.09e+03 N.D. N.D. N.D.
case2746wp_k 19520 28446 6.67e+00 2.94e+02 N.D. N.D. N.D.
case2746wop_k 19582 28642 6.73e+00 3.01e+02 N.D. N.D. N.D.
case2848_rte 21822 32129 1.49e+01 N.D. N.D. N.D. N.D.
case2853_sdet 23028 33154 1.03e+01 7.33e+02 N.D. N.D. N.D.
case2868_rte 22090 32393 1.64e+01 6.84e+02 N.D. N.D. N.D.
case2869_pegase 25086 37813 1.25e+01 7.86e+02 N.D. N.D. N.D.
case3012wp_k 21082 31029 1.04e+01 4.11e+02 N.D. N.D. N.D.
case3022_goc 23238 34990 1.14e+01 N.D. N.D. N.D. N.D.
case3120sp_k 21608 32092 9.91e+00 4.21e+02 N.D. N.D. N.D.
case3375wp_k 24350 35876 1.21e+01 6.06e+02 N.D. N.D. N.D.
case3970_goc 35270 54428 1.66e+01 1.90e+03 N.D. N.D. N.D.
case4020_goc 36696 56957 2.20e+01 2.07e+03 N.D. N.D. N.D.
case4601_goc 38814 59596 2.45e+01 2.40e+03 N.D. N.D. N.D.
case4619_goc 42532 66289 2.15e+01 2.48e+03 N.D. N.D. N.D.
case4661_sdet 34758 51302 1.73e+01 N.D. N.D. N.D. N.D.
case4837_goc 41398 64030 2.20e+01 2.19e+03 N.D. N.D. N.D.
case4917_goc 37872 56917 2.02e+01 2.10e+03 N.D. N.D. N.D.
case5658_epigrids 48552 74821 2.44e+01 3.38e+03 N.D. N.D. N.D.
case6468_rte 49734 75937 6.30e+01 4.18e+03 N.D. N.D. N.D.
case6470_rte 50482 75976 3.75e+01 3.38e+03 N.D. N.D. N.D.
case6495_rte 50426 76124 6.47e+01 4.43e+03 N.D. N.D. N.D.
case6515_rte 50546 76290 5.28e+01 3.98e+03 N.D. N.D. N.D.
case7336_epigrids 62116 95306 3.00e+01 5.38e+03 N.D. N.D. N.D.
case8387_pegase 78748 118702 4.93e+01 N.D. N.D. N.D. N.D.
case9241_pegase 85568 130826 5.36e+01 N.D. N.D. N.D. N.D.
case9591_goc 83572 130588 6.05e+01 N.D. N.D. N.D. N.D.
case10000_goc 76804 112352 4.94e+01 N.D. N.D. N.D. N.D.
case10192_epigrids 89850 139456 6.06e+01 N.D. N.D. N.D. N.D.
case10480_goc 96750 150874 6.89e+01 N.D. N.D. N.D. N.D.
case13659_pegase 117370 170588 6.63e+01 N.D. N.D. N.D. N.D.
case19402_goc 179562 281733 1.48e+02 N.D. N.D. N.D. N.D.
case20758_epigrids 179236 274918 9.99e+01 N.D. N.D. N.D. N.D.
case24464_goc 203374 313641 1.27e+02 N.D. N.D. N.D. N.D.
case30000_goc 208624 307752 2.43e+02 N.D. N.D. N.D. N.D.
case78484_epigrids 674562 1039062 9.07e+02 N.D. N.D. N.D. N.D.

Julia v1.10.0

Package versions (defaults)
[54578032] ADNLPModels v0.7.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.0
[4076af6c] JuMP v1.18.1
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.7.8
⌃ [7f7a1694] Optimization v3.21.0
⌃ [fd9f6733] OptimizationMOI v0.3.0
[c36e90e8] PowerModels v0.20.1
⌃ [0c5d862f] Symbolics v5.15.1

Note: Package updates to the Optimization dependencies yield a method error.

Package versions (Optimization-SciMLBM)
[2569d6c7] ConcreteStructs v0.2.3
[b6b21f68] Ipopt v1.6.0
[7f7a1694] Optimization v3.21.1
[fd9f6733] OptimizationMOI v0.3.4
[c36e90e8] PowerModels v0.20.1
[37e2e3b7] ReverseDiff v1.15.1
[47a9eef4] SparseDiffTools v2.15.0

Ipopt was configured to run with the linear solver HSL ma27.


Thanks @ccoffrin and @odow for diving deep with me into my concerns!

I think this is the first time I’ve seen a single benchmark split into two very different implementations that will both continue to be maintained, but in this case it seems like it is just required. An interesting but helpful case, and something we may want to consider more down the line.

To fully elaborate on what @ccoffrin is saying, rosetta-opf is really focused on a “naive real user” so to speak, where someone just takes a model definition from a file and slaps that into a loss function. What I was setting up with the SciMLBenchmarks is a comparative solver benchmark, really trying to focus on how the different algorithms are performing and scaling in order to prioritize issues to solve to improve general performance.

Understanding the Goals of the Two Benchmarks

At face value, it might look like those are the same thing, but there are some nuances. One major nuance is the data read. In the rosetta-opf benchmarks, the data is read into dictionaries and those dictionaries are used to construct the loss function. In the SciMLBenchmarks, those dictionaries are instead made into a type-inferred DataRepresentation template so that the data can be used in a way that is type-stable.

The key here is that this difference only changes the timing on a subset of the methods, and when it does, it changes it in a very nonlinear way. It does not change the timing of JuMP or direct ModelingToolkit because symbolic interfaces do not directly use the cost function that the user writes down: they are declarative and use that definition to get a mathematical description and build their own type-stable function which embeds the numerical data gathered through tracing as literal constants in the constructed function. Meanwhile, the numerical front-ends (Optimization.jl, NLPmodels.jl, Nonconvex.jl, Optim.jl) all directly use the function that is defined, which has global non-constant dictionaries that are {Any,Any}, and thus any decent Julia programmer can look at that and see “wow, the performance is going to be pretty bad there!”.
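
To make the tracing point concrete, here is a small sketch (a hypothetical two-variable cost, with Symbolics.jl playing the role of the symbolic front-end):

using Symbolics

d = Dict{String, Any}("c" => 2.0)        # untyped data, as in the rosetta-opf dictionaries
cost(x) = d["c"] * x[1]^2 + x[2]^2       # the "user function" that reads from the Dict

@variables x[1:2]
ex = cost(collect(x))                     # tracing bakes the value 2.0 into the expression,
                                          # so the generated code never touches the Dict again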

What the SciMLBenchmarks versions thus did was take that representation to a type-inferred DataRepresentation to put the numerical front-ends on a more equal footing. But they also ran all of the benchmarks in a single session. This means that the code for the DataRepresentation, which is a large datatype with lots of type parameters to infer, is compiled once during the validation phase, and its compilation is then not seen in the benchmark phase.

Figuring out the Differences in the Details

The question then becomes: what are you trying to measure? While the Dict{Any,Any} is going to pop out in any benchmark of Optimization.jl, it is a completely valid thing to want to benchmark the experience that such a user would get from a symbolic frontend vs a purely numerical frontend. It truly is the case that JuMP, ModelingToolkit, and all other symbolic interfaces are serving the user better in this kind of scenario, because they ignore the exact function the user provided and build their own better function first, and that means the user does not have to worry too much about the exact numerical performance details. This is a very valid concern, and this is what the rosetta-opf benchmark then chooses to measure: if you have this user and they are trying to solve hard problems, what do they see on single runs?

However, that is similarly not as useful as a comparison between solvers if you are trying to match solver details. What I mean by that is: if what you’re trying to benchmark is how JuMP vs Optimization.jl solve differently, then you want to make the cost and constraint function definitions as equal as possible to isolate any effect of the function building, and then run the same solver setups and see where you’re measuring differences. This is what the SciMLBenchmarks is geared towards by forcing the type-stable representation, so the two have effectively the same performance on the objective function, and any performance difference is thus attributed to differences in the solver setup (elaborated on below) since it’s using the same solver (IPOPT) in the two cases. This is thus helpful for the further development of the algorithms.

Now you might then think that you can take the SciMLBenchmarks version and stick it into rosetta-opf, but as you see in Optimization-SciMLBM, it’s effectively the worst of both worlds. The reason is that if you are doing individual runs, the reading into DataRepresentation requires inference to work a lot harder, and thus it takes about 3-5 seconds for the data reader to compile. SciMLBenchmarks again does not have this factor because all runs are done in the same session, so a specialized data reader is factored out. In theory this can be eliminated by changing PowerModels.jl to use a type template in its read, i.e. always generating the DataRepresentation, and precompiling a lot of functions with respect to that. But in practice, this is not any better for JuMP users, or any symbolic front-end for that matter, and so many users won’t do this. In fact, it might be even strictly worse for JuMP users because it would spend more time compiling and inferring objects that it then ignores, since the function used to construct the cost does not need to be fast given that it’s then transformed symbolically. So what you see in the Optimization-SciMLBM column is it effectively spending most of its time compiling that DataRepresentation each time, which, given the goals of rosetta-opf, is matching what the user would see.

What’s next for the Benchmarks?

So where do we go from here? This is where the discussion was to simply split the two along the lines of the two different goals trying to be achieved. The SciML crew are building out more to a pure ModelingToolkit symbolic frontend to JuMP, and we’ll be happy to see a benchmark really focused on symbolic frontends and so we’ll contribute that over to the rosetta-opf. It will be interesting to see at what point ModelingToolkit outperforms the naive Optimization.jl, and then the differences from JuMP. Meanwhile, we want to keep the SciMLBenchmarks really focused on the Optimization.jl vs JuMP numerical differences.

And what have we learned about the packages?

I think this is the most important piece. The current main contributors are very easy to identify:

  1. In the Optimization-SciMLBM it’s the compilation of the DataRepresentation.
  2. For Optimization.jl on the small side, it gets very close to JuMP in terms of performance, but there seems to be a performance difference between the tape-compiled ReverseDiff.jl and the specialized reverse mode in JuMP. This difference I think gets alleviated with some Enzyme improvements, and so hopefully in the near future on the SciMLBenchmarks we’ll see Optimization.jl vs JuMP on the smaller problems be pretty close to 1-1 when that is cleared up.
  3. On the larger side, the scaling of Optimization.jl, NLPModels, and Nonconvex is pretty much dominated by the cost of sparsity detection. The detection of sparsity from Julia code all uses the Symbolics.jl version, which has worse performance than we’d like right now, allocating a bit too much and becoming the dominant factor. Using sparsity is such a huge improvement in these benchmarks that turning it off definitely hurts the scaling, but at a certain point in these benchmarks the auto-sparsity detection itself is 99% of the time (a small illustration of that call follows this list). @odow got us a good benchmark case which highlights the hessian_sparsity performance (Add OptimizationFrameworks/clnlbeam.jmd by odow · Pull Request #836 · SciML/SciMLBenchmarks.jl · GitHub) and we’ll dive into that. There are probably also some differences between the ReverseDiff tape-compiled mode and Enzyme here too, but that won’t matter until this is handled.
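
As a small illustration of the sparsity-detection call in question (a toy expression standing in for the OPF Lagrangian; Symbolics.hessian_sparsity is the routine whose performance the clnlbeam case highlights):

using Symbolics

@variables x[1:4]
xs = collect(x)
ex = sum(sin(xs[i]) * xs[i+1] for i in 1:3)    # toy stand-in for an OPF objective/Lagrangian
pattern = Symbolics.hessian_sparsity(ex, xs)    # sparse Bool matrix of structural nonzeros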

So sans compile time and type-stability of the function, the main issues for getting other frameworks up to JuMP really come down to the fact that JuMP has its own specialized AD and that does really well on these functions + sparsity, while the general AD functionality is catching up but needs a better sparsity detection algorithm in order to really get there. We’ll keep banging away at that, and that can be tracked in the SciMLBenchmarks, but it won’t help the rosetta-opf benchmarks all that much because other than making the scaling improve, you’ll still see a large difference due to the type-instability. So on rosetta-opf we won’t expect to see any improvements from the Optimization.jl improvements until that part has a ModelingToolkit.jl frontend to get the type-stable function builds like JuMP.

Again @ccoffrin @odow thanks for taking the time to deep dive this.

tl;dr the numbers are very different, but they mean something completely different as well and we’ll just make sure to adequately document what rosetta-opf and SciMLBenchmarks are doing differently.


Thanks to recent contributions from @sshin23, a new rosetta-opf model has been added based on the ExaModels framework. As the detailed results below show, this implementation tends to be the fastest Julia AC-OPF implementation I have seen so far, especially at the largest scales. Amazing work @sshin23!

This table also includes updates to the NLPModels implementation to use the ConcreteStructs approach that was proposed in the SciMLBenchmarks, an improvement to the Julia JIT warmup procedure, and other small maintenance updates.

Case Vars Cons ExaModels JuMP NLPModels NonConvex Optimization
(values are total runtime in seconds)
case3_lmbd 24 28 1.01e-01 1.34e-02 2.96e-02 7.82e+00 2.47e+00
case5_pjm 44 53 5.27e-02 6.56e-02 1.46e-01 1.78e+01 3.47e+00
case14_ieee 118 169 4.17e-02 5.57e-02 1.64e-01 7.69e+01 7.29e+00
case24_ieee_rts 266 315 6.72e-02 1.23e-01 5.72e-01 2.37e+02 1.84e+01
case30_ieee 236 348 4.66e-02 7.86e-02 4.76e-01 2.51e+02 1.75e+01
case30_as 236 348 6.00e-02 8.46e-02 4.98e-01 2.50e+02 1.65e+01
case39_epri 282 401 4.18e-02 1.12e-01 3.63e-01 3.34e+02 2.18e+01
case57_ieee 448 675 5.38e-02 1.03e-01 7.58e-01 5.52e+02 4.27e+01
case60_c 518 737 5.30e-02 1.33e-01 9.63e-01 5.23e+02 4.62e+01
case73_ieee_rts 824 987 9.99e-02 2.10e-01 1.70e+00 1.23e+03 8.95e+01
case89_pegase 1042 1649 1.21e-01 3.57e-01 3.84e+00 3.93e+03 2.19e+02
case118_ieee 1088 1539 1.17e-01 2.90e-01 2.63e+00 3.05e+03 1.81e+02
case162_ieee_dtc 1484 2313 1.53e-01 3.50e-01 4.10e+00 N.D. 3.84e+02
case179_goc 1468 2200 1.76e-01 4.69e-01 4.66e+00 5.77e+03 3.02e+02
case197_snem 1608 2397 1.25e-01 3.45e-01 4.20e+00 N.D. 4.17e+02
case200_activ 1456 2116 1.01e-01 2.78e-01 3.56e+00 6.41e+03 3.10e+02
case240_pserc 2558 3617 9.14e-01 2.78e+00 1.81e+01 N.D. 8.96e+02
case300_ieee 2382 3478 2.09e-01 5.85e-01 8.40e+00 N.D. 8.31e+02
case500_goc 4254 6097 4.28e-01 1.15e+00 1.82e+01 N.D. 3.07e+03
case588_sdet 4110 5979 3.49e-01 9.98e-01 1.47e+01 N.D. 2.55e+03
case793_goc 5432 7978 4.80e-01 1.43e+00 2.68e+01 N.D. 5.23e+03
case1354_pegase 11192 16646 1.23e+00 4.70e+00 1.08e+02 N.D. N.D.
case1803_snem 15246 23172 2.11e+00 8.15e+00 2.02e+02 N.D. N.D.
case1888_rte 14480 21494 4.44e+00 2.01e+01 2.94e+02 N.D. N.D.
case1951_rte 15018 22075 2.44e+00 9.74e+00 1.99e+02 N.D. N.D.
case2000_goc 19008 29432 2.05e+00 7.80e+00 3.26e+02 N.D. N.D.
case2312_goc 17128 25716 1.83e+00 7.02e+00 2.37e+02 N.D. N.D.
case2383wp_k 17004 25039 2.34e+00 8.40e+00 1.90e+02 N.D. N.D.
case2736sp_k 19088 28356 1.83e+00 6.73e+00 2.60e+02 N.D. N.D.
case2737sop_k 18988 28358 1.64e+00 5.85e+00 2.56e+02 N.D. N.D.
case2742_goc 24540 38196 7.32e+00 2.74e+01 6.98e+02 N.D. N.D.
case2746wp_k 19520 28446 1.88e+00 6.66e+00 2.47e+02 N.D. N.D.
case2746wop_k 19582 28642 1.75e+00 6.38e+00 2.72e+02 N.D. N.D.
case2848_rte 21822 32129 3.73e+00 1.86e+01 4.40e+02 N.D. N.D.
case2853_sdet 23028 33154 2.68e+00 1.05e+01 5.28e+02 N.D. N.D.
case2868_rte 22090 32393 4.23e+00 1.71e+01 4.76e+02 N.D. N.D.
case2869_pegase 25086 37813 3.42e+00 1.33e+01 6.70e+02 N.D. N.D.
case3012wp_k 21082 31029 2.67e+00 1.09e+01 3.44e+02 N.D. N.D.
case3022_goc 23238 34990 2.82e+00 1.16e+01 5.54e+02 N.D. N.D.
case3120sp_k 21608 32092 2.71e+00 1.05e+01 3.63e+02 N.D. N.D.
case3375wp_k 24350 35876 3.23e+00 1.19e+01 5.15e+02 N.D. N.D.
case3970_goc 35270 54428 6.51e+00 1.74e+01 1.63e+03 N.D. N.D.
case4020_goc 36696 56957 8.87e+00 2.17e+01 1.68e+03 N.D. N.D.
case4601_goc 38814 59596 7.64e+00 2.60e+01 1.96e+03 N.D. N.D.
case4619_goc 42532 66289 7.91e+00 2.16e+01 2.22e+03 N.D. N.D.
case4661_sdet 34758 51302 5.65e+00 1.76e+01 1.22e+03 N.D. N.D.
case4837_goc 41398 64030 6.59e+00 2.26e+01 2.09e+03 N.D. N.D.
case4917_goc 37872 56917 5.77e+00 2.01e+01 1.76e+03 N.D. N.D.
case5658_epigrids 48552 74821 7.92e+00 2.46e+01 3.18e+03 N.D. N.D.
case6468_rte 49734 75937 1.64e+01 6.33e+01 3.49e+03 N.D. N.D.
case6470_rte 50482 75976 1.07e+01 3.61e+01 3.36e+03 N.D. N.D.
case6495_rte 50426 76124 1.88e+01 6.27e+01 3.59e+03 N.D. N.D.
case6515_rte 50546 76290 1.50e+01 5.32e+01 3.39e+03 N.D. N.D.
case7336_epigrids 62116 95306 9.79e+00 3.00e+01 5.09e+03 N.D. N.D.
case8387_pegase 78748 118702 1.56e+01 5.03e+01 N.D. N.D. N.D.
case9241_pegase 85568 130826 1.78e+01 5.34e+01 N.D. N.D. N.D.
case9591_goc 83572 130588 2.61e+01 6.06e+01 N.D. N.D. N.D.
case10000_goc 76804 112352 1.63e+01 5.00e+01 6.85e+03 N.D. N.D.
case10192_epigrids 89850 139456 2.09e+01 5.90e+01 N.D. N.D. N.D.
case10480_goc 96750 150874 2.85e+01 6.84e+01 N.D. N.D. N.D.
case13659_pegase 117370 170588 2.25e+01 6.80e+01 N.D. N.D. N.D.
case19402_goc 179562 281733 7.36e+01 1.49e+02 N.D. N.D. N.D.
case20758_epigrids 179236 274918 3.71e+01 1.01e+02 N.D. N.D. N.D.
case24464_goc 203374 313641 5.00e+01 1.31e+02 N.D. N.D. N.D.
case30000_goc 208624 307752 9.94e+01 2.55e+02 N.D. N.D. N.D.
case78484_epigrids 674562 1039062 3.69e+02 7.86e+02 N.D. N.D. N.D.

Julia v1.10.0

[54578032] ADNLPModels v0.7.0
[2569d6c7] ConcreteStructs v0.2.3
[1037b233] ExaModels v0.5.0
[f6369f11] ForwardDiff v0.10.36
[b6b21f68] Ipopt v1.6.1
[4076af6c] JuMP v1.19.0
[961ee093] ModelingToolkit v8.75.0
[f4238b75] NLPModelsIpopt v0.10.1
[01bcebdf] Nonconvex v2.1.3
[bf347577] NonconvexIpopt v0.4.2
[429524aa] Optim v1.9.1
[7f7a1694] Optimization v3.22.0
[fd9f6733] OptimizationMOI v0.3.4
[c36e90e8] PowerModels v0.21.0
[0c5d862f] Symbolics v5.16.1

Ipopt was configured to run with the linear solver HSL ma27.


Thanks, @ccoffrin for an amazing project and for sharing detailed results.

Here’s a quick summary of how ExaModels works.

What is ExaModels?

ExaModels.jl is an algebraic modeling system embedded in the Julia language, which is tailored for what we call “SIMD abstraction”. By itself, ExaModels doesn’t have optimization solver capability. This means that you need to interface it with an optimization solver (e.g., Ipopt) to solve problems. ExaModels provides a user front-end to specify the nonlinear model and an automatic differentiation capability to evaluate first/second-order derivatives in sparse form. So, among the total solution time reported in this benchmark, which roughly consists of (1) data parsing time, (2) model creation time, and (3) solver time, which can again be broken down into (3-1) derivative evaluation time, (3-2) linear solver time, and (3-3) solver internal time, ExaModels is responsible for (2) and (3-1). ExaModels is pretty efficient both in terms of model creation and derivative evaluation.

So, what is “SIMD abstraction?”

SIMD abstraction is the key idea behind the high performance of ExaModels for OPFs. When specifying a model, ExaModels always requires the user to express it using iterators. Let’s take an example:

Let’s say that our constraint takes the following form:

function cons!(x, y)
    for i = 1:length(x)-2
        y[i] = 3*x[i+1]^3 + 2*x[i+2] - 5 + sin(x[i+1]-x[i+2])*sin(x[i+1]+x[i+2]) + 4*x[i+1] - x[i]*exp(x[i]-x[i+1]) - 3
    end
end

Instead of taking the constraint information in this way,

constraint(model, cons!)

we take the constraint information as

con(x, i) = 3*x[i+1]^3 + 2*x[i+2] - 5 + sin(x[i+1]-x[i+2])*sin(x[i+1]+x[i+2]) + 4*x[i+1] - x[i]*exp(x[i]-x[i+1]) - 3
    
constraint(model, con(x,i) for i=1:length(x)-2)

This allows the model to process the data in the form of an iterator, which allows us to capture the repetitive patterns in the model. This repetitive pattern can be exploited when applying automatic differentiation (AD). In particular, instead of applying AD to the whole cons!, whose internal structure is obscure, we can apply AD to con and run a for loop on top of it.
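
Putting the pieces together, a condensed sketch of an end-to-end ExaModels model using the con above (bounds, starts, and real data omitted; solved through NLPModelsIpopt since an ExaModel is an NLPModel):

using ExaModels, NLPModelsIpopt

N = 10
c = ExaCore()
x = variable(c, N)                         # N decision variables
constraint(c, con(x, i) for i = 1:N-2)     # the iterator form shown above
objective(c, (x[i] - 1)^2 for i = 1:N)     # a simple separable objective
m = ExaModel(c)
result = ipopt(m)                          # any NLPModels-compatible solver works here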

What more does ExaModels do to make AD faster?

An additional trick used in ExaModels is to assume that con has a simple structure, which is the case for OPFs, and when doing the AD, we create expression trees with concrete types. AD on this concrete expression tree can fully take advantage of Julia’s JIT compilation, and a highly optimized callback can be put together, with zero type inference and no memory allocations. The even nicer thing here is that we can turn these callbacks into GPU kernels using KernelAbstractions.jl, so AD can be performed on different GPU accelerators. Some GPU results can be found in the ExaModels repository.

Limitations

Of course, it is important to note that this improved performance comes from assuming more structure and exploiting it. Clearly, ExaModels serves a narrower class of problems than the other modeling systems compared in this benchmark. The key requirement for ExaModels to shine is that the problem has (1) repeated patterns in the objective/constraints (meaning they can be specified as iterators) and (2) each pattern is relatively simple (meaning it depends on only a handful of variables). Many large-scale problems in mathematical programming are like this; e.g., optimal power flow, PDE-constrained optimization, and optimal control and parameter estimation problems formulated with direct transcription methods. But problems with more complex function structures, e.g., those embedding neural nets, DAE solvers, etc., are not suitable for modeling in this way.

Takeaways and suggestions

One takeaway here is that the more you exploit the structure, the better performance you get. So, we might want to specify how much of the structure is exploited in each modeling approach. Polar form AC OPF problems have several distinct structures, which include:

  • Objective functions and constraints are smooth nonlinear functions, whose 1st and 2nd order derivatives can be cheaply evaluated.
  • Constraint Jacobian and Lagrangian Hessian are extremely sparse, and their sparsity pattern is known.
  • The algebraic structure of the objective and constraint functions is simple and highly repetitive.

Thus, we might want to classify the compared frameworks based on the following criteria, so that one can see if we are doing an apples-to-apples comparison.

  • Does the modeling system exploit second-order derivatives?
  • Does the modeling system exploit sparsity?
  • Does the modeling system exploit repeated patterns?

Also, to improve the signal-to-noise ratio of the results, I’d suggest:

  • Exclude data parsing time, as it doesn’t have much to do with the capabilities of optimization frameworks.
  • Separately report model creation time and solution time. Also, if possible, separately report derivative evaluation time, linear solver time, and solver internals.

A bit more detailed description of each configuration might be helpful; this may not be captured by ]pkg st. For example,

  • ExaModels, JuMP, ADNLPModels, Nonconvex, Optimization are configured with Ipopt and ma27.
  • For Optimization.jl and ADNLPModels.jl (I’d suggest renaming NLPModels as ADNLPModels, as NLPModels itself is simply a template and doesn’t have a modeling capability), I’d also suggest specifying the AD backends, as it seems that they can be configured with different AD backends.