There are many problems with this statement. I'm sorry, but this is getting really frustrating: we put days of work into building and fixing a reproducible benchmark for this, and it is not conducive to a collaborative atmosphere that you keep discarding the work we have been doing to help update this benchmark. Let me spell it out in full.

## Complete Detail of Issues In Your Benchmark Script That Were Already Fixed In Code That We Have Shared With You

We set up a fully automated version of this benchmark:

Anyone who makes a PR to the system has it automatically run on the benchmarking server, so that everyone can test their changes on the same computer. The results are then published for everyone to see. That is what we shared:

https://docs.sciml.ai/SciMLBenchmarksOutput/stable/OptimizationFrameworks/optimal_powerflow/

Many improvements were made to the benchmark. One major addition is a set of validations for each implementation.

These assertions check that, at randomly sampled states, every implementation evaluates to the same cost and constraint values. We added these because that was not the case in your original benchmarks: different implementations evaluated to different constraint values, indicating that some of the constraint functions were incorrect. We fixed those in the benchmarks.
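To make the idea concrete, here is a minimal sketch (not the actual benchmark code; the two implementations below are hypothetical stand-ins) of this kind of cross-implementation validation:

```julia
# Two hypothetical implementations of the same cost function, written
# differently, as stand-ins for two framework-specific implementations.
cost_impl_a(x) = sum(abs2, x) + 2.0 * x[1]
cost_impl_b(x) = x' * x + 2.0 * x[1]

# Evaluate every implementation at random states and assert agreement.
function validate(impls; n = 100, dim = 4)
    for _ in 1:n
        x = randn(dim)                      # random test state
        vals = [f(x) for f in impls]
        # every implementation must produce the same value (up to roundoff)
        @assert all(v -> isapprox(v, vals[1]; rtol = 1e-10), vals)
    end
    return true
end

validate([cost_impl_a, cost_impl_b])
```

The same pattern is applied to the constraint functions: evaluate all implementations at shared random points and assert that the values match.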

Also, the implementations for Optimization.jl, NLPModels.jl, and Nonconvex.jl were needlessly unoptimized: they allocate unnecessarily by constructing strings as dictionary accessors inside the cost function (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub). That is a performance issue for any model which uses the cost function definition directly rather than through a declarative DSL (i.e. everything except JuMP), and it goes against most Julia style conventions. This was fixed in the updated code, and you can see that your benchmarking code does not match what we sent you (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub, and similar lines for NLPModels.jl and Nonconvex.jl).
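A minimal illustration of the allocation issue (the data layout here is simplified, not the actual rosetta-opf structure): string-keyed dictionary lookups in the hot path hash and chase pointers on every call, whereas hoisting the lookups out once leaves a concretely typed closure over plain numbers.

```julia
# Simplified stand-in for the nested string-keyed data dictionary.
data = Dict("gen" => Dict("1" => Dict("cost" => [1.0, 2.0, 3.0])))

# Style that is slow in the hot loop: string-keyed lookups on every call.
cost_dict(data, pg) = data["gen"]["1"]["cost"][1] * pg^2 +
                      data["gen"]["1"]["cost"][2] * pg +
                      data["gen"]["1"]["cost"][3]

# Fixed style: hoist the lookups once into a concretely typed tuple,
# so the cost function itself touches no dictionaries.
c = Tuple(data["gen"]["1"]["cost"])         # (1.0, 2.0, 3.0)
cost_fast(pg) = c[1] * pg^2 + c[2] * pg + c[3]

@assert cost_dict(data, 0.5) == cost_fast(0.5)
```

For a solver that calls the objective directly (everything except JuMP), this difference is paid on every single function, gradient, and Hessian evaluation.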

Additionally, the dictionaries being read from are themselves type-unstable, since their elements have differing types: some elements are dictionaries, while others are numbers and others are arrays. Reading data into unstructured dictionaries is not standard Julia code in any case and goes against standard style conventions, so it is not recommended. The code we sent you fixed this by reading the data into concrete structs (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub), which guard against incorrectness (typos) and ensure type stability for all of the algorithms.
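Here is a small sketch of that difference (field names are illustrative, not the benchmark's actual structs): a heterogeneous `Dict` has value type `Any`, so every access is dynamically dispatched, while a concrete struct pins down every field type and turns key typos into immediate errors.

```julia
# Heterogeneous dictionary: value type is Any, so accesses are type-unstable.
raw = Dict{String, Any}("bus_count" => 3,
                        "loads"     => [1.0, 2.5, 0.7],
                        "meta"      => Dict("case" => "case3"))

# Concrete struct: every field has a fixed, known type.
struct PowerData
    bus_count::Int
    loads::Vector{Float64}
end

pd = PowerData(raw["bus_count"], raw["loads"])

# raw["loads"] is inferred as Any; pd.loads is a concrete Vector{Float64},
# so downstream code compiles to specialized, allocation-free accesses.
@assert typeof(pd.loads) === Vector{Float64}
# A typo such as pd.lods throws an error immediately, whereas a misspelled
# dictionary key is only caught if and when that branch happens to run.
```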

We also looked into the automatic differentiation methods. Note that the ModelingToolkit method (rosetta-opf/optimization.jl at cli · lanl-ansi/rosetta-opf · GitHub) is known to be a hacky way to construct a symbolic form, since it uses the numerical code to specify the problem and then symbolically traces it, rather than building the symbolic form directly. In the fixed version of the benchmark, we split the two pieces. One version of the Optimization.jl benchmark is a pure-Julia, non-symbolic implementation that uses sparse reverse-mode automatic differentiation (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub), and the other is a purely symbolic implementation as an OptimizationSystem, closer to the representation of JuMP (SciMLBenchmarks.jl/benchmarks/OptimizationFrameworks/optimal_powerflow.jmd at master · SciML/SciMLBenchmarks.jl · GitHub). These are kept separate in the final benchmarks because they are very different implementations with very different results.

In fact, these results differ substantially in cost because the MTK implementation uses a tearing algorithm to eliminate constraints and, in the end, finds solutions with a lower cost than any of the other implementations. This is something we have been investigating, and I know @Vaibhavdixit02 personally emailed you, showing you all of these changes and asking you about this specific result.

## Summaries of Issues with Your Benchmark Code

- The code you are running is very type-unstable, which unnecessarily slows down most of the implementations, and it is not clear that it implements the constraints correctly for all of them.
- The way you are running the benchmarks is a non-reproducible "it runs on my computer" setup, which we tried to eliminate by setting up an automated benchmarking service.
- We shared all of these results both in this thread (AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas) and in personal emails, trying to continue improving these benchmarks, not just for Optimization.jl but also for NLPModels.jl, Nonconvex.jl, and JuMP (we haven't focused on the Optim.jl part, and the CASADI part just segfaults).
- It seems that no matter how many times we have tried to contact you or share these results, we have been ignored. There is no indication that what we shared was read at all.

## Extra Information

So I am very frustrated by statements like:

> I think it is reasonable to assume a 2x runtime improvement if one used more modern hardware. These systems are what I have easy access to at this time.

because that is very clearly wrong, for reasons that have been repeatedly brought up in this thread, that we have shared with you in emails, and that we have demonstrated in a fully reproducible way (Optimal Powerflow Nonlinear Optimization Benchmark · The SciML Benchmarks). No, the performance difference you're seeing is not just due to running on a different machine; it's very clearly because you are still running the type-unstable versions from before September 2023, even after all of the messages we have sent about this.

The reason I pointed out case3 is that it's the smallest case, and thus the best one for showing that something was clearly off with the benchmarks. The larger the system, the more time is spent in the linear algebra routines, and thus the smaller the overall difference. To highlight the difference due to the implementation of the loss function, rather than the implementation of the optimization algorithm, the smallest case is the one most dominated by the cost of the objective function calls rather than the cost of the AD algorithm or the linear solver choices. Therefore, this result was the biggest red flag in the list, not the one to ignore.

It being two orders of magnitude different was a very clear indicator that the code you were running was not type-stable and had to be doing something egregiously unoptimized in the loss function, which is why I pointed it out. The differences at the large end are then a mix of that lack of code optimization and known issues with the scaling of sparse reverse-mode AD via the current sparse reverse approach. The latter was documented earlier as point (4) in the earlier post AC Optimal Power Flow in Various Nonlinear Optimization Frameworks - #67 by ChrisRackauckas.

The fixed version of the code separates these effects and isolates the comparison down to the implementation details, which highlights what the next steps for the AD community need to be, along with symbolic code generation.

## Holding out an olive branch

I say all of this because I just want us to be absolutely clear about the current state of everything. We have been repeatedly trying to share this information with you since September 2023 (as everyone can see in the post above) and have even emailed you personally since. You posted a "new" version of the benchmark that ignored all of the issues we had pointed out, and not only that, it ignored all of the fixes that we spent the time to apply, along with the infrastructure that we provided to make these benchmarks easier to maintain going forward.

Mistakes happen. But it seems we do have your attention now.

- Can we please work together on this?
- Can our updates please not be ignored?
- Can we please do this by updating the code on the SciMLBenchmarks jmd file so that both of us are running on the same computer?
- Is there any update to the code that you have made? We would be happy to help upstream it into the jmd file.

This is frustrating and takes time away from both of us. Let's work together and use this to improve all of the packages involved.