Hello,
I am finally switching to DifferentiationInterface for the tests of a package of mine, and I don't think I could be happier! It lets me quickly write tests in a coherent manner and try different backends. Amazing work!
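For context, this is roughly what I mean (a minimal sketch with a placeholder function `f`; my real tests use functions from the package):

```julia
using Test
using DifferentiationInterface
using ADTypes: AutoForwardDiff, AutoZygote, AutoMooncake
import ForwardDiff, Zygote, Mooncake

# Placeholder objective; the real tests use functions from my package.
f(x) = sum(abs2, x) / 2

backends = [
    AutoForwardDiff(),
    AutoZygote(),
    AutoMooncake(; config=nothing),
]

@testset "Gradients agree across backends" begin
    x = randn(10)
    analytic = x  # ∇f(x) = x for this placeholder objective
    for backend in backends
        g = DifferentiationInterface.gradient(f, backend, x)
        @test g ≈ analytic
    end
end
```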
Now, this finally let me try something that has been on my to-do list for a long time: Mooncake.
To my surprise (maybe not to the specialists?), it just worked out of the box for my packages. It also outperforms Zygote (with a single exception, a case where I had written some rrules myself based on ChainRules).
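The comparisons I am doing look roughly like the sketch below, not an actual benchmark from my code, just the shape of it, with a toy loss standing in for the real one:

```julia
using BenchmarkTools
using DifferentiationInterface
using ADTypes: AutoZygote, AutoMooncake
import Zygote, Mooncake

# Toy stand-in for the kind of loss I actually differentiate.
const W = randn(32, 32)
loss(x) = sum(abs2, tanh.(W * x))
x = randn(32)

for backend in (AutoZygote(), AutoMooncake(; config=nothing))
    @show backend
    @btime DifferentiationInterface.gradient($loss, $backend, $x)
end
```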
For my work, most of what I do is blend together neural networks (mostly simple MLPs built with Lux), SciML for ODEs and integrals, and Turing. I have been using all of these productively, with several papers by me, my students, and collaborators.
Given this success with Mooncake, I am considering continuing to support Zygote in my code while eventually switching to Mooncake as the primary AD backend for my work. However, there are a couple of caveats.
First, Mooncake is painfully slow the first time the gradient of a function is computed. This is not a problem when I am running my actual analyses (they last hours, so a few extra minutes on top are not a problem at all), but it is painful when I am doing development and push to GitHub, as the CI is now significantly slower. Is there any suggestion on how to improve that? It's not super important, but it would be nice if anyone had ideas.
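To illustrate what I mean (a toy function purely for illustration, my real functions are much heavier), the cost is entirely in the first call:

```julia
using DifferentiationInterface
using ADTypes: AutoMooncake
import Mooncake

f(x) = sum(abs2, sin.(x))
x = randn(100)
backend = AutoMooncake(; config=nothing)

# First call: pays the full cost of deriving and compiling the reverse pass.
@time DifferentiationInterface.gradient(f, backend, x)
# Second call with the same function and argument types: fast.
@time DifferentiationInterface.gradient(f, backend, x)
```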
Second, I find that the gradient preparation mechanism significantly improves the performance of Mooncake in my use cases. Is there any way to do something similar within a Turing model?
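By the preparation mechanism I mean something like this (a sketch, assuming a recent DifferentiationInterface version where the prep object is passed between the function and the backend):

```julia
using DifferentiationInterface
using ADTypes: AutoMooncake
import Mooncake

f(x) = sum(abs2, x)
x = randn(1_000)
backend = AutoMooncake(; config=nothing)

# Preparation happens once, outside the hot loop...
prep = prepare_gradient(f, backend, x)

# ...and is reused for every subsequent gradient evaluation.
g = DifferentiationInterface.gradient(f, prep, backend, x)
```

In Turing I currently just select the backend through the sampler's `adtype` keyword (e.g. `NUTS(; adtype=AutoMooncake(; config=nothing))`, if I am not mistaken), so I don't see an obvious place to hook in a preparation step.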
Thanks in advance!