Mooncake, Turing, and CI

Hello,
I am finally switching to DifferentiationInterface for the tests of a package of mine, and I don’t think I could be happier! It lets me quickly write tests in a coherent manner and try different backends. Amazing work!

Now, this finally let me try something that has been on my to-do list for a long time: Mooncake.
To my surprise (maybe not to the specialists’?), it just worked out of the box for my packages. It also outperforms Zygote (with a single exception, where I wrote some rrules myself based on ChainRules).
For my work, most of what I do is blend together neural networks (mostly simple MLPs based on Lux), SciML for ODEs and integrals, and Turing. I have been using all of these things fruitfully, with several papers by me, my students, and collaborators.

Given this success with Mooncake, I am considering continuing to support Zygote in my code but later switching to Mooncake as the primary AD backend for my work. However, there are a couple of caveats.
First, Mooncake is painfully slow the first time the gradient of a function is computed. This is not a problem when I am running my actual analyses (they last ~hours, so a few extra minutes on top are not a problem at all), but it is painful during development when I push to GitHub, as the CI is now significantly slower. Does anyone have a suggestion on how to improve that? It is not super important, but it would be nice.
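One partial mitigation I can think of (an assumption on my side, I haven’t benchmarked it with Mooncake specifically) is adding a PrecompileTools workload to the package, so that a representative gradient call is compiled when the package is precompiled rather than on the first call in every CI run. A minimal sketch, where `MyPkg` and `_workload_loss` are placeholder names:

```julia
module MyPkg  # sketch: this would live in your package's top-level module

using DifferentiationInterface
import Mooncake
using PrecompileTools

# Placeholder for a small but representative loss from your package.
_workload_loss(x) = sum(abs2, x)

@setup_workload begin
    x = rand(5)
    backend = AutoMooncake(; config=nothing)
    @compile_workload begin
        # Calls in this block are compiled (and, where possible, cached
        # in the package image) at precompile time.
        prep = prepare_gradient(_workload_loss, backend, x)
        gradient(_workload_loss, prep, backend, x)
    end
end

end # module
```

How much of Mooncake’s first-call cost this actually caches across sessions is something I’d want to verify, since not all generated code is cacheable.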
Second, I find that the gradient preparation mechanism significantly improves the performance of Mooncake in my use cases. Is there any way to do a similar thing within a Turing model?
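For context, here is the preparation pattern I mean, as I use it at the DI level (a minimal sketch; the function and input are placeholders):

```julia
using DifferentiationInterface
import Mooncake

f(x) = sum(abs2, x)                     # placeholder objective
backend = AutoMooncake(; config=nothing)
x = rand(10)

# Pay the setup/compilation cost once...
prep = prepare_gradient(f, backend, x)

# ...then reuse the preparation on every subsequent call.
grad = gradient(f, prep, backend, x)
```

The question is whether Turing does the equivalent of `prepare_gradient` internally, or whether the user can trigger it.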

Thanks in advance :slight_smile:

4 Likes

As you can guess, I also have that same issue with DifferentiationInterface’s CI, and I don’t have a great workaround. Here are some tricks I found to be helpful:

  • Don’t run tests on every Julia version for draft PRs, which lets you iterate more quickly (only run the full matrix once the PR is “ready for review”).
  • Use a lower compilation setting when running tests on draft PRs.
  • Split your CI workflow into several jobs (GitHub can run a couple of them in parallel). Of course that means compilation is repeated in each one, so it’s a trade-off.
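For the first trick, the draft-PR gate is a single `if:` condition on the test job. A sketch of a GitHub Actions workflow excerpt (job and matrix names are illustrative, not taken from any particular repo):

```yaml
# .github/workflows/CI.yml (excerpt)
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  test:
    # Skip the full test matrix while the PR is still a draft.
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    strategy:
      matrix:
        julia-version: ['1.10', '1']
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.julia-version }}
      - uses: julia-actions/julia-runtest@v1
```

Note that `ready_for_review` must be in the trigger types, otherwise marking the PR ready won’t re-run the skipped jobs.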

Indeed, you should never use Mooncake without preparation.
@penelopeysm would know best for this one, but I thought preparation was baked into Turing’s use of DI?
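For completeness, selecting Mooncake in Turing only requires the `adtype` keyword on the sampler; my (unconfirmed) understanding is that any DI preparation then happens inside Turing rather than in user code. A minimal sketch with a placeholder model:

```julia
using Turing
using ADTypes: AutoMooncake
import Mooncake

# Placeholder one-parameter model.
@model function demo(y)
    μ ~ Normal(0, 1)
    y ~ Normal(μ, 1)
end

# Ask the sampler to differentiate the log density with Mooncake.
chain = sample(demo(1.5), NUTS(; adtype=AutoMooncake(; config=nothing)), 1000)
```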

1 Like

Hi @gdalle , thanks for the prompt response!
Regarding the CI, I will definitely steal something from your workflows :slight_smile:
Regarding Turing, I haven’t tested yet whether the preparation mechanism is baked in or not, but from what you linked this seems to be the case. I’m just waiting for an answer from the Turing people. If it is indeed baked in, maybe a line could be added to the Turing docs to highlight it…?