Completely anecdotally, I just ported a training loop at work from Python/PyTorch to Julia. Well, "ported" in the weakest sense: it's still PyCall-ing into PyTorch and the data loading, i.e. all the heavy operations. Still, it magically became 12% faster.
No, it doesn't make a whole lot of sense. I suspect that somewhere in the interaction between Julia and Python a copy of some array has to be made, which turns out to improve matters down the line.
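For what it's worth, the shape of such a hybrid loop is roughly the following. This is a minimal sketch with a placeholder model and synthetic data, not the actual work code; the only PyCall specifics it relies on are `pyimport`, property access on Python objects, and keyword-argument passing. The point is that all the tensor work still happens inside PyTorch, and Julia only orchestrates the loop.

```julia
using PyCall

torch = pyimport("torch")

# Throwaway model, optimizer and loss, built through the regular torch API.
model   = torch.nn.Linear(10, 1)
opt     = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for step in 1:100
    # Julia arrays cross the PyCall boundary here (as numpy arrays), and
    # torch.tensor copies them into tensors -- one place where an extra
    # copy can sneak in.
    x = torch.tensor(randn(Float32, 32, 10))
    y = torch.tensor(randn(Float32, 32, 1))

    opt.zero_grad()
    loss = loss_fn(model(x), y)   # forward pass runs entirely inside PyTorch
    loss.backward()               # as does the backward pass
    opt.step()
end
```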
(The actual plan is to port the data loading, which is custom and fairly complex, to Julia. That's where I'm expecting real gains.)