I should preface this by saying that I am not very experienced with either PyTorch or juliatorch, and I apologize in advance for any misunderstandings.
We are developing a physics simulation ecosystem that can be differentiated using almost any of the AD backends in DifferentiationInterface.jl (our interface is mutating and uses scalar indexing on the CPU, so e.g. Zygote.jl doesn’t really work). Many of our collaborators/prospective users, however, are sticking with Python no matter what, given their current workflow setups, and will not use/learn Julia. No problem; we are ensuring that our Julia ecosystem is easily usable from within Python - much of this has been made super easy thanks to the juliacall package.
One of the most important things for our software is easy integration with PyTorch. It seems to me that the best way to handle this is to follow the steps here to plug our Julia functions into PyTorch's autograd framework: Extending PyTorch — PyTorch 2.8 documentation
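For reference, here is a minimal sketch of the pattern that guide describes: a custom `torch.autograd.Function` with explicit `forward` and `backward`. The function body here is a pure-Python stand-in (`f(x) = sum(x^2)`); in our setting, both methods would instead call into Julia via juliacall.

```python
import torch

# Minimal custom autograd Function in the style of the "Extending PyTorch"
# guide. The forward/backward bodies are pure-Python stand-ins; in our
# setting they would call into Julia via juliacall.
class SquareSum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # External code would see a detached tensor, not a tracked one.
        return (x.detach() ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Gradient of sum(x^2) is 2x, scaled by the incoming cotangent.
        return grad_output * 2 * x

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
SquareSum.apply(x).backward()
print(x.grad)  # tensor([2., 4., 6.])
```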
To my understanding, the juliatorch package does exactly this in a generic way, so that Julia functions can be differentiated using PyTorch's autograd as the “backend”. This may be what we end up using in the long term.
However, I imagine there could be substantial performance gains from computing the gradients directly in Julia and then passing them to PyTorch. What I would really like is to compute the gradients using DifferentiationInterface.jl. For example, in Julia I can then specify AutoReverseDiff(; compile=true) and get very fast reverse-mode gradients to pass to PyTorch. I worry that differentiating through PyTorch's tensors directly may not achieve similar performance, especially given our mutating interface.
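To make the idea concrete, here is a sketch of what that bridge might look like. The `julia_vjp` helper is hypothetical: it stands in for a pullback prepared on the Julia side with DifferentiationInterface.jl (the commented lines indicate where that setup would go); here it is a pure-Python function with the same shape, so the skeleton runs on its own.

```python
import numpy as np
import torch

# Hypothetical bridge: backward asks Julia for a vector-Jacobian product.
# In the real setup, something along these lines would prepare the pullback:
#   from juliacall import Main as jl
#   jl.seval("using DifferentiationInterface, ReverseDiff")
#   # ... prepare a pullback for f with AutoReverseDiff(; compile=true) ...
# Here julia_vjp is a pure-Python stand-in: for f(x) = sum(sin, x),
# the VJP is dy .* cos.(x).
def julia_vjp(x_np, dy_np):
    return dy_np * np.cos(x_np)

class SinSum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sin(x.detach()).sum()

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Hand plain arrays to the (stand-in) Julia side, wrap the result.
        g = julia_vjp(x.detach().numpy(), grad_output.detach().numpy())
        return torch.from_numpy(g).to(x.dtype)

x = torch.tensor([0.0, 1.0], requires_grad=True)
SinSum.apply(x).backward()
# x.grad should match cos.(x)
```

The key point is that only plain arrays cross the language boundary, so on the Julia side the pullback can be prepared once and reused, which is where the compiled-tape speedup would come from.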
If there indeed are major performance gains in doing this, then this tooling could find broad use in the wider community.