Your analysis of the differences between Julia and other ML ecosystems like Python's is correct.
To me, the main practical benefit is that you don’t have to duplicate libraries across every tensor framework. You don’t need both a torchdiffeq (for PyTorch) and a Diffrax (for JAX): you can have a single DifferentialEquations.jl into which the whole community pours its efforts, and then make minor adjustments to ensure compatibility with the various autodiff backends.
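To make that concrete, here is a minimal sketch of differentiating through an ODE solve with Zygote. The specific ODE, solver, and loss are just illustrative; the point is that loading SciMLSensitivity provides the autodiff rules so the reverse-mode backend can go straight through `solve`:

```julia
using OrdinaryDiffEq, SciMLSensitivity
import Zygote

# Toy exponential-growth ODE du/dt = p[1] * u (out-of-place form)
dynamics(u, p, t) = p[1] .* u

function loss(p)
    prob = ODEProblem(dynamics, [1.0], (0.0, 1.0), p)
    sol = solve(prob, Tsit5(); saveat=[1.0])
    sum(sol.u[end])  # scalar loss from the final state
end

# Reverse-mode gradient through the solver, with adjoint
# machinery supplied by SciMLSensitivity
Zygote.gradient(loss, [0.5])
```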
In this regard, projects like DifferentiationInterface.jl are (in my biased opinion) a key requirement for reaching the right level of abstraction and separating concerns between tensors and gradients.
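Roughly, the usage looks like this (a small sketch based on the current DifferentiationInterface.jl API; the toy function `f` is mine). The same `gradient` call works with any supported backend, selected via an ADTypes object:

```julia
using DifferentiationInterface
import ForwardDiff, Zygote  # load the backends you want to dispatch to

f(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]

# Identical call, different autodiff backends:
gradient(f, AutoForwardDiff(), x)  # forward mode
gradient(f, AutoZygote(), x)       # reverse mode
```

Library authors only have to target this one interface, and users pick whichever backend suits their problem.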
We discuss this a little in our JuliaCon 2024 tutorial on autodiff: