I’ve read a bunch of the blog posts on the Julia website (e.g. on machine learning and programming languages), but I remain unconvinced about what substantial benefits Julia provides over PyTorch.
For example, one section in the blog post positions Julia as something that offers the usability of PyTorch without the Python interpreter overhead. While that’s great for inference use cases, I think experience has shown that researchers don’t care about Python’s negligible overhead compared to the cost of actually writing the model.
Looking at the Flux.jl GitHub page, I don’t see many places where it differentiates itself from PyTorch. Much of the “unusual architectures” section seems to position Julia as an alternative to TensorFlow, but I don’t see the benefit compared to PyTorch. The example about the sigmoid is easily handled by any kind of fuser (like the PyTorch JIT).
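To be concrete about the fuser point: a minimal sketch of what I mean, assuming a recent PyTorch install. A hand-written sigmoid is just a chain of elementwise ops, and the TorchScript JIT can fuse such chains into a single kernel without the user leaving Python:

```python
import torch

@torch.jit.script
def my_sigmoid(x):
    # A "custom" sigmoid written as plain elementwise ops.
    # The TorchScript JIT sees exp/add/div as a fusable chain,
    # so on supported backends this compiles to one fused kernel.
    return 1.0 / (1.0 + torch.exp(-x))

x = torch.randn(8)
# Matches the built-in implementation numerically.
assert torch.allclose(my_sigmoid(x), torch.sigmoid(x), atol=1e-6)
```

So "you can write your own activation and it stays fast" doesn't, on its own, distinguish Julia from PyTorch-with-a-JIT.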
Being able to write new operators in Julia doesn’t particularly convince me either. To get solid performance you need to go beyond just writing naive loops in C++/CUDA; you need a system more like Halide/TVM/PlaidML.
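To illustrate why "just write the loops" isn’t enough, here’s a hypothetical sketch in NumPy (the name `naive_matmul` is mine): the triple-loop version is perfectly correct, but without the tiling, vectorization, and scheduling that Halide/TVM-style systems generate, it is orders of magnitude slower than the tuned BLAS call behind `@`:

```python
import numpy as np

def naive_matmul(a, b):
    # The "just write your loops" approach: correct, but with no
    # tiling, vectorization, or cache-aware scheduling -- the parts
    # a Halide/TVM-style compiler would generate for you.
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

a = np.random.rand(32, 32)
b = np.random.rand(32, 32)
# Same result as the tuned BLAS path, just vastly slower at scale.
assert np.allclose(naive_matmul(a, b), a @ b)
```

Writing this loop nest in Julia instead of C++ doesn’t change the picture: the hard part is the schedule, not the host language.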
I do see an advantage in writing Julia if you truly need fundamental data structures (trees/maps/whatever) embedded in the core of your machine learning system. However: (a) I don’t particularly see the need for that right now, and (b) I suspect any approach like that will face massive performance difficulties.
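For what it’s worth, the kind of thing I mean can already be written in PyTorch today. This is a hypothetical toy (the `TreeNode`/`TreeEncoder` names are mine): a recursive encoder over an arbitrary tree, where every node incurs per-node Python dispatch and a tiny kernel launch, which is exactly the performance difficulty I’d expect any language to hit:

```python
import torch
import torch.nn as nn

class TreeNode:
    def __init__(self, value, children=()):
        self.value = value        # feature vector for this node
        self.children = children  # tuple of TreeNode

class TreeEncoder(nn.Module):
    # Toy recursive encoder: folds each child's encoding into the
    # parent's with a shared linear layer. Control flow follows the
    # data structure, so every node costs a Python-level call and a
    # small kernel launch -- the overhead that dominates here.
    def __init__(self, dim):
        super().__init__()
        self.combine = nn.Linear(2 * dim, dim)

    def forward(self, node):
        h = node.value
        for child in node.children:
            h = torch.tanh(self.combine(torch.cat([h, self.forward(child)])))
        return h

dim = 4
enc = TreeEncoder(dim)
leaf = TreeNode(torch.randn(dim))
root = TreeNode(torch.randn(dim), (leaf, TreeNode(torch.randn(dim), (leaf,))))
assert enc(root).shape == (dim,)
```

The question is whether Julia makes this pattern meaningfully faster, or whether the per-node overhead just moves from the Python interpreter to somewhere else.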
I think that perhaps I don’t understand Julia’s pitch well enough, or that many of the posts I’ve been reading have focused on differentiating Julia from TensorFlow rather than PyTorch. Could someone help me out?
Originally posted on Slack - redirected here.
PS: To be clear, I have similar questions about Swift.