Excessive allocations in basic SciMLSensitivity example

This whole thread seems to have a lot of misunderstandings of reverse-mode AD. So a few things:

  1. In order to reverse, you need to store the forward pass, so reverse mode pretty much requires allocations (there are ways to reduce this, but they need a bit more work). The exception is BacksolveAdjoint, which avoids storing the forward pass but is numerically unstable. For this case you could turn on BacksolveAdjoint (see the first sketch below), but the tutorial won’t do that because it’s generally unsafe, and we generally don’t want to point people towards incorrect / unstable numerical methods.
  2. The `loss = sum(x->sum(y->abs2(y-1), x), sol.u)` form is mostly a Zygote thing, avoiding the construction of the full solution matrix for the reverse pass. Keeping everything as smaller arrays just helps Zygote a bit, though we can probably optimize that with a few views that don’t seem to inline (second sketch below).
  3. " I’m seeing a few non-const global variables being accessed by the methods of loss and the (non-const ) callback" these won’t matter because the majority of the time is going to be the adjoint pass
  4. “Does using StaticArrays help? (edit: ok I guess regardless of whether it helps here that has limited utility with more variables)”: it can help somewhat, but you still need to allocate the holder that stores the forward pass “indefinitely” for when the reverse pass hits, so unlike in forward mode you cannot just stack-allocate everything (third sketch below).
  5. “It looks like SciML intentionally uses a lot of runtime dispatch”: this is Zygote transforming dispatches into runtime dispatches. Most of the time should be spent in the adjoint pass, though, which acts as a function barrier, and that should force it to mostly be fine (fourth sketch below).
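
For point 1, here’s how you’d opt into BacksolveAdjoint if you really wanted to. A minimal sketch, assuming a standard Lotka-Volterra setup; the problem and parameter values are illustrative, not the tutorial’s exact code:

```julia
using OrdinaryDiffEq, SciMLSensitivity, Zygote

function lotka!(du, u, p, t)
    du[1] = p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

u0 = [1.0, 1.0]
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(lotka!, u0, (0.0, 10.0), p)

function loss(p)
    # Passing sensealg through solve selects the adjoint method Zygote uses.
    # BacksolveAdjoint skips storing the forward pass, which cuts allocations
    # but is the numerically unstable option described above.
    sol = solve(prob, Tsit5(); p = p, saveat = 0.1,
                sensealg = BacksolveAdjoint())
    sum(x -> sum(y -> abs2(y - 1), x), sol.u)
end

Zygote.gradient(loss, p)
```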
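
For point 2, the contrast is between a loss that materializes the whole solution matrix and one that iterates the per-timestep vectors in `sol.u`. Both compute the same number, assuming `sol` comes from a solve with `saveat` as above:

```julia
# Both losses give the same value; the second keeps Zygote working on the
# small per-timestep arrays in sol.u instead of one big solution matrix.
loss_matrix(sol)  = sum(abs2, Array(sol) .- 1)                 # builds the full matrix
loss_columns(sol) = sum(x -> sum(y -> abs2(y - 1), x), sol.u)  # iterates small arrays
```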
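
For point 4, a toy illustration of why StaticArrays don’t remove the allocation; the stepping rule here is a stand-in, not an actual solver step:

```julia
using StaticArrays

function forward_pass(u0::SVector, nsteps)
    tape = typeof(u0)[]        # the trajectory holder is a heap allocation
    u = u0                     # each state itself is stack-allocated
    for i in 1:nsteps
        u = u + 0.01 * u       # stand-in for a forward step
        push!(tape, u)         # stored "indefinitely" until the reverse pass reads it
    end
    tape
end

tape = forward_pass(SVector(1.0, 1.0), 100)
```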
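
And for point 5, what a function barrier buys you. A generic sketch of the mechanism, not Zygote’s actual internals:

```julia
# The Dict lookup returns Any, so the call to `kernel` is a runtime dispatch,
# but inside `kernel` everything is concretely typed. The dynamic-dispatch
# cost is paid once per call rather than once per operation.
function kernel(x::Vector{Float64})
    s = 0.0
    for xi in x
        s += xi^2
    end
    s
end

unstable_outer(container) = kernel(container[:data])

unstable_outer(Dict{Symbol,Any}(:data => rand(100)))
```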

With all of that said, I’m still in the middle of diagnosing a regression that happened and is highlighted in Bump autodiff tests by ChrisRackauckas · Pull Request #1275 · SciML/SciMLBenchmarks.jl · GitHub. If you’re interested in diving in, I can help point to some places to look, since calling the reverse-mode profile merely “difficult” to figure out would be an oversimplification.
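
If you want to take a stab at it, profiling the gradient call is the place to start. A rough sketch, reusing the `loss` and `p` from the first sketch above:

```julia
using Profile

Zygote.gradient(loss, p)       # warm up so compilation doesn't dominate the profile
Profile.clear()
@profile for i in 1:100
    Zygote.gradient(loss, p)
end
Profile.print(mincount = 50)   # most of the flat time should land in the adjoint pass
```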
