I was thinking about something related, though it's about PyTorch rather than numpy. As many folks know, PyTorch is implementing something beneath the Python interface that seems very similar to Julia, in order to enable JIT compilation; it's called TorchScript.
So what I was thinking is: why not use Julia directly? We already have pyjulia and Flux, the rest won't be hard, and this would benefit both the Julia and PyTorch communities in the following ways:
What torch will gain:
- Julia has an abundant array ecosystem, which includes StaticArrays, OffsetArray, NamedArray, etc. By giving PyTorch a Julia backend, people on the torch side would be able to use these features. In particular, NamedArray has been around in Julia for 4~5 years, and people in the torch community have recently found this idea quite useful: Tensor Considered Harmful
- I know the torch community has some people working on TPU support, but that's not done yet, right? With a Julia backend, torch people could use TPUs as well.
- It'll definitely be easier to implement new operators directly in Julia, which is more mature than TorchScript (or maybe that's just because I know Julia, but at least it's much easier than writing C++).
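To make the named-axis point a bit more concrete, here is a minimal pure-Python sketch of the idea. `NamedTensor` and its methods are hypothetical names made up for illustration; this is not the API of NamedArray or of any torch feature, just a toy showing what addressing axes by name instead of by position looks like.

```python
from math import prod


class NamedTensor:
    """Toy dense tensor whose axes are addressed by name (row-major storage).

    Hypothetical illustration only, not a real library API.
    """

    def __init__(self, data, names, shape):
        if len(names) != len(shape):
            raise ValueError("need exactly one name per axis")
        data = list(data)
        if len(data) != prod(shape):
            raise ValueError("data length does not match shape")
        self.data = data
        self.names = tuple(names)
        self.shape = tuple(shape)

    def size(self, name):
        """Length of the axis called `name`."""
        return self.shape[self.names.index(name)]

    def get(self, **index):
        """Look up one element by axis name, e.g. t.get(batch=1, channel=0)."""
        offset = 0
        for name, extent in zip(self.names, self.shape):
            i = index[name]
            if not 0 <= i < extent:
                raise IndexError(f"{name}={i} is out of range")
            offset = offset * extent + i
        return self.data[offset]

    def sum(self, name):
        """Reduce over the named axis and drop it from the result."""
        axis = self.names.index(name)
        outer = prod(self.shape[:axis])
        extent = self.shape[axis]
        inner = prod(self.shape[axis + 1:])
        out = [0] * (outer * inner)
        for o in range(outer):
            for k in range(extent):
                for j in range(inner):
                    out[o * inner + j] += self.data[(o * extent + k) * inner + j]
        return NamedTensor(
            out,
            self.names[:axis] + self.names[axis + 1:],
            self.shape[:axis] + self.shape[axis + 1:],
        )


t = NamedTensor(range(6), names=("batch", "channel"), shape=(2, 3))
per_batch = t.sum("channel")  # no need to remember that channel is axis 1
```

The caller never has to remember which position an axis sits at, which is the complaint about positional `dim=` arguments in a nutshell.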
What Julia will gain:
- PyTorch is another large open source project used by many people and companies, so I think this would bring more people to this community.
- In today's machine learning research community, a lot of new research is done in torch, building on previous work that was also done in torch. This would make those mature existing implementations just work for Julia: we could use them from the Julia side. This might be a bit ugly, but it would bring a lot of new models to Julia.
- Finally, as a machine learning researcher, I have to say that because other people are using Python, I sometimes have to write it too. A Julia backend for torch would make things smoother. At least for me, I had a painful time working on custom PyTorch tensors in C++, which might be just a few hundred lines in Julia.
What needs to be done on the Julia side:
- I think one of this year's GSoC projects is quite important, since it will make calling Julia from Python much easier (https://julialang.org/soc/projects/compiler.html)
- Conditional dependencies in Pkg. This is quite important for supporting different hardware. I think it wouldn't make sense to have users load `CuArrays` etc. with Require.jl from the Python side separately. Installing the package with `cuda=true` is more explicit and simpler. I have tried many times to get people to pay attention and start discussing the details of this, because it is a crucial feature not only for this project but for most deep learning projects.
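As a rough illustration of why an explicit flag is nicer than loading the GPU package separately, here is a small Python sketch of what the Python side could look like. Everything here is an assumption for illustration: `select_backend` and the module name `hypothetical_cuda_backend` are made up, and nothing like this exists in Pkg or pyjulia today.

```python
import importlib.util


def select_backend(cuda=False, gpu_module="hypothetical_cuda_backend"):
    """Return "cuda" or "cpu" based on an explicit user request.

    `gpu_module` is a made-up module name standing in for whatever package
    would provide GPU support. If CUDA is requested but that package is not
    installed, fail loudly instead of silently falling back to the CPU.
    """
    if not cuda:
        return "cpu"
    if importlib.util.find_spec(gpu_module) is None:
        raise ImportError(
            f"cuda=True was requested but {gpu_module!r} is not installed"
        )
    return "cuda"
```

With an explicit flag the failure shows up at setup time with a clear message, rather than as a missing method somewhere deep inside a training loop.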
Goal:
- A torch-compatible Python interface to the Julia side. It might need a custom row-wise (row-major) array, but that would be just a wrapper of `Array`. There may also be some other corner cases to handle to make it compatible with torch. (Note: this is about compatibility, not rewriting another torch. The tensor functionality and AD are already in Julia; the work is just to make it compatible with the PyTorch Python interface and ship it through conda/pip/Pkg.) It should be a Python frontend to a Julia AD/machine learning package (say Zygote + Flux).
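The row-wise wrapper can be sketched roughly like this. The observation is that a row-major buffer of shape (m, n) has exactly the same memory layout as a column-major array of size (n, m), so the wrapper only needs to reverse the index order and never copies. Both classes below are hypothetical stand-ins written in plain Python with 0-based indices; `ColumnMajorMatrix` just imitates how a Julia `Array` lays out its data.

```python
class ColumnMajorMatrix:
    """Stand-in for a 2-D Julia Array: flat storage, first index varies fastest."""

    def __init__(self, rows, cols, flat):
        flat = list(flat)
        if len(flat) != rows * cols:
            raise ValueError("buffer length does not match rows * cols")
        self.rows, self.cols, self.flat = rows, cols, flat

    def get(self, i, j):
        # Column-major offset (0-based here, unlike Julia's 1-based indexing).
        return self.flat[i + j * self.rows]


class RowMajorView:
    """Torch-style view of the same buffer: reverses the index order, no copy."""

    def __init__(self, parent):
        self.parent = parent
        # A column-major (n, m) parent reads as a row-major (m, n) matrix.
        self.shape = (parent.cols, parent.rows)

    def __getitem__(self, idx):
        i, j = idx
        return self.parent.get(j, i)

    def tolist(self):
        m, n = self.shape
        return [[self[i, j] for j in range(n)] for i in range(m)]


# The buffer a row-major library would use for [[1, 2, 3], [4, 5, 6]].
buffer = [1, 2, 3, 4, 5, 6]
julia_side = ColumnMajorMatrix(3, 2, buffer)  # same memory seen as 3x2 column-major
torch_side = RowMajorView(julia_side)         # seen as 2x3 row-major again
```

The same reversal generalizes to more dimensions, and operators that care about layout (matrix multiplication, for example) would still need to account for the swapped axes.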
I guess we could come up with a proof-of-concept package first (I'm working on several Julia packages at the moment, so I'd say I'll try this a bit later in the summer, maybe during JuliaCon).
I don't know if other people in this field feel a similar need.
I’ll post updates once I have some work on this.
(edited: this was for another topic)
But again, yeah, I agree: numpy folks did a great job, and in my experience, if you are just using what numpy has, it is as fast as Julia with MKL. But in Julia we have not only `Array`, but many, many custom arrays and custom algorithms with a unified interface. (Take NamedArray: you never find so many custom arrays in the Python world, because it's hard to do, etc.)