1 is probably easiest. Take the performance hit of non-mutation and go with it. As your problem size increases, if you’re using an implicit method, progressively more time will be spent in the factorization and matrix multiplications (which grows as O(n^3)) and thus the allocations won’t matter after awhile. The hard place of course are the “midsized” problems for which the asymptotic behavior is not a good enough reason to leave off performance tricks. Still, it’s what I’d recommend today.
For 2, you cannot just use ChainRules with Enzyme easily. That’s kind of the whole problem there: when the code gets down to the LLVM level where Enzyme acts, there’s no guarantee that your function calls even exist anymore. Those calls may have been inlined by Julia, and so there would be no way to intercept it in that case. Fancy tricks can probably be used to get a lot of cases (and it has to get fancy even just to support allocations in the first place, or any dynamism), but I wouldn’t expect an average user to hack on that at all. Instead, BLAS support should be coming soon (it’s actively being worked on by some of the Julia Lab students), and that should be most of what’s needed in most cases.