PyTorch and Julia

I was thinking about something related, though it concerns PyTorch rather than numpy. As many folks know, PyTorch is implementing something that seems very similar to Julia beneath its Python interface to enable JIT compilation; it is called TorchScript.

So what I was thinking is: why not use Julia directly? We already have pyjulia and Flux, and the rest shouldn’t be hard. This would benefit both the Julia and PyTorch communities in the following ways:

What torch will gain:

  • Julia has a rich array ecosystem, including StaticArrays, OffsetArray, NamedArray, etc. With a Julia backend for PyTorch, people on the torch side would be able to use these features. In particular, NamedArray has been around in Julia for 4~5 years, and people in the torch community have recently found the idea quite useful:

  • I know some people in the torch community are working on TPU support, but that’s not done yet, right? With a Julia backend, torch people could use TPUs as well.

  • It would definitely be easier to implement new operators directly in Julia, which is more mature than TorchScript (or maybe that’s just because I know Julia, but at least it’s much easier than writing C++).

What Julia will gain:

  • PyTorch is another large open source project used by many people and companies, so I think this would bring more people to the Julia community.
  • In today’s machine learning research community, a lot of new research is done in torch and builds on previous work that was also done in torch. This would make those mature existing implementations just work from the Julia side. It might be a bit ugly, but it would bring a lot of new models to Julia.
  • Finally, as a machine learning researcher, I have to say that because other people use Python, I sometimes have to write it too. A Julia backend for torch would make things smoother. At least for me, working on custom PyTorch tensors in C++ was painful, and the same thing might be just a few hundred lines in Julia.

What needs to be done on the Julia side:

  • I think one of this year’s GSoC projects is quite important, since it will make calling Julia from Python much easier (
  • Conditional dependencies in Pkg. This is quite important for supporting different hardware. I don’t think it makes sense to have users separately load CuArrays etc. with Require.jl from the Python side; installing the package with cuda=true is more explicit and simpler. I have tried many times to get people to pay attention and start discussing the details, because this is a crucial feature not only for this project but for most deep learning projects.


  • A torch-compatible Python interface to the Julia side. This might need a custom row-major array, but it would be just a wrapper around Array.
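To make the wrapper idea concrete: a Julia Array is stored column-major, while torch tensors are row-major by default. A row-major tensor of shape (2, 3) has the same memory layout as a column-major array of shape (3, 2), so a wrapper only needs to reverse the index order. Here is a minimal pure-Python sketch of that mapping. Both class names are hypothetical, a flat list stands in for the Julia Array's memory, and indices are 0-based as on the Python side.

```python
class ColumnMajorArray:
    """Stand-in for a Julia Array: flat storage, first index varies fastest."""

    def __init__(self, shape, data):
        self.shape = tuple(shape)
        self.data = list(data)

    def __getitem__(self, idx):
        offset, stride = 0, 1
        for i, n in zip(idx, self.shape):
            if not 0 <= i < n:
                raise IndexError(f"index {i} out of range for axis of size {n}")
            offset += i * stride
            stride *= n
        return self.data[offset]


class RowMajorView:
    """Hypothetical wrapper: row-major indexing over the same buffer, no copy."""

    def __init__(self, parent):
        self.parent = parent
        # The row-major shape is the column-major shape reversed.
        self.shape = tuple(reversed(parent.shape))

    def __getitem__(self, idx):
        # Reversing the index order is all the wrapper has to do.
        return self.parent[tuple(reversed(idx))]


parent = ColumnMajorArray((3, 2), range(6))  # memory: 0, 1, 2, 3, 4, 5
view = RowMajorView(parent)                  # seen from Python as shape (2, 3)
rows = [[view[r, c] for c in range(3)] for r in range(2)]
```

Under these assumptions `rows` is `[[0, 1, 2], [3, 4, 5]]`, which is how a row-major reader would interpret the same six values.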

There may be some other corner cases to handle to make it compatible with torch. Note that this is about compatibility, not about rewriting another torch: the tensor and AD functionality already exists in Julia, so the work is just to make it compatible with the PyTorch Python interface and ship it through conda/pip/Pkg. It should be a Python frontend to a Julia AD/machine learning package (say Zygote + Flux).

I guess we could come up with a proof-of-concept package first (well, I’m working on several Julia packages at the moment, so I’d say I’ll try this a bit later in the summer, maybe during JuliaCon).

I don’t know whether other people in this field feel a similar need.

I’ll post updates once I have some work on this.

(edited: this was for another topic)

But again, yeah, I agree: the numpy folks did a great job, and in my experience, if you are just using what numpy provides, it is as fast as Julia with MKL. But in Julia we have not only Array but many, many custom arrays and custom algorithms with a unified interface. (NamedArray is one example. You never find so many custom arrays in the Python world, because they are hard to write there.)


@Roger-luo I think your idea is very interesting (Indeed PyTorch came from Torch, a Lua library).

I really like this idea. I think there’s a lot to gain for the machine learning community with this approach.

So, your plan is to write (1) a TorchScript-to-Julia transpiler and (2) a runtime library required to execute transpiled TorchScript in Julia using Flux? Then, the Python users can invoke it via PyJulia and Julia users can directly load it?

In the case of this project, I think you can simply do this via (say) jltorch.install(cuda=True), like diffeqpy.install(), which simply invokes Pkg.add(["PyCall", "DifferentialEquations"]).
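A rough sketch of what such an installer could do, assuming a hypothetical jltorch package: pick the Julia package list from the `cuda` flag and build the `Pkg.add` call. The function name and the package list (Flux, Zygote, CuArrays, taken from names mentioned in this thread) are assumptions, not an existing API. The sketch only builds the Julia command string; a real installer would still have to hand it to Julia.

```python
def julia_install_command(cuda=False):
    """Build the Julia code a hypothetical jltorch.install() could run."""
    # Base packages are an assumption for this sketch.
    packages = ["PyCall", "Flux", "Zygote"]
    if cuda:
        # The GPU array package is only requested when the user opts in.
        packages.append("CuArrays")
    quoted = ", ".join(f'"{name}"' for name in packages)
    return f"using Pkg; Pkg.add([{quoted}])"


cpu_cmd = julia_install_command()
gpu_cmd = julia_install_command(cuda=True)
```

This keeps the hardware choice explicit at install time, which is the point of the `cuda=True` flag, without needing conditional dependencies in Pkg itself.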

I’d say the largest obstacle to improving Julia-Python interaction is precompilation cache handling. Here are some pointers for the changes required in Julia:

I’d also like to see the signal handler situation improved:

Thanks for your suggestions, this is very helpful.

I mean directly mapping part of the torch functions to Julia via pyjulia. There won’t be any TorchScript anymore. It would work like diffeqpy does, and yes, torch users could still use their old code, but the backend would change (by backend I mean ATen and Autograd in C++).
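A minimal sketch of that mapping, under loud assumptions: the frontend keeps torch-style function names and forwards each call to whatever backend object it was given. Both class names are hypothetical, and the backend here is a pure-Python stand-in operating on lists. In the actual proposal the backend would be Julia code reached through pyjulia.

```python
class ListBackend:
    """Pure-Python stand-in for the Julia side (an assumption of this sketch)."""

    def add(self, a, b):
        return [x + y for x, y in zip(a, b)]

    def dot(self, a, b):
        return sum(x * y for x, y in zip(a, b))


class TorchLikeFrontend:
    """Hypothetical frontend: torch-style names, calls forwarded to a backend."""

    def __init__(self, backend):
        self._backend = backend

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails,
        # i.e. for function names that live on the backend.
        backend = self.__dict__["_backend"]
        if not hasattr(backend, name):
            raise AttributeError(f"'{name}' is not mapped to the backend yet")
        return getattr(backend, name)


torch_like = TorchLikeFrontend(ListBackend())
total = torch_like.add([1, 2], [3, 4])
inner = torch_like.dot([1, 2], [3, 4])
```

User code written against the frontend names stays the same when the backend object is swapped, which is the "old code, new backend" property described above.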

I think that pyjulia is a really strategic package and that core Julia should ease the pain of using it. I know of some projects that are considering Julia for hot loops, but right now the setup of pyjulia is far from plug and play, and some people are reluctant to choose this approach. Hopefully this will get easier as development goes on! Anyway, right now it’s usable, and proofs of concept can be built to highlight the possibilities.

Personally I’d be very interested to see some development of the named tensor concept. I think the current approach to this is AxisArrays. But as I’ve looked over the most recent work on the package, it seems there’s a bit of collective discontent with parts of its current API (maybe I’m just projecting a little :wink:).

I see. Now I realized that’s what you said in your first post. My bad. Transpiling was the topic in the original thread so I misread your post.

I don’t use TorchScript so I don’t have the full picture. But is TorchScript only used by PyTorch experts? If TorchScript is something used also by ML researchers, and if your plan is to use Julia language instead of TorchScript, my concern is that they may not use this backend if they feel they don’t have time to learn a new programming language.

But I think using PyCall/PyJulia only for high-level Julia-Python interaction is a good approach in general. I just don’t know PyTorch enough to see how appealing it would be for PyTorch users.

I think the shortest path for making PyJulia easier to use is RFC: a "workaround" for the multi-project precompilation cache problem without long-term code debt with (The idea is to create a system image for PyJulia and let Julia automatically use precompilation cache dedicated to PyJulia.) Maybe you can help me in that thread :slight_smile:

I also added Julia option support recently (available only in master). You can use Julia(compiled_modules=False) to work around the precompilation cache problem. This makes PyJulia setup plug-and-play if you are OK with waiting for precompilation in each Python process (yeah, I know this is not practical for all purposes).

Well, TorchScript is kind of a new Python-like language as well, and it is not as mature as Julia yet (not everything you would intuitively write works), which again is because it’s hard to accelerate something like Python. I think it’s meant for people who want to write fast custom operations without writing C++. So it depends on what you need. Currently I don’t see many people using TorchScript, because people don’t want to learn another new language… (at least among my collaborators, who would rather write C++, though some of them know Julia already).

But Julia comes in as a nice option when someone really needs to extend the functionality and define a lot of customized tensors and data types (e.g. complex number tensors, dual number tensors, etc.). I don’t think this is doable in TorchScript yet, and it’s not straightforward in C++ (PyTorch has data type hooks now, but they are not mature enough yet). I think this would be the best option and would make this kind of user willing to learn a new language :-).

Well, it’s hard to get someone who is happy with Python to learn another language, but it’s always easy to persuade a multilingual person who already knows Python to learn one more.

So I guess it will depend on how fast and how mature we can make it.

I also noticed that diffeqpy can use numba with Julia. Does this still work today?

As TorchScript is a subset of Python, I thought you could also use it in good old define-by-run mode? If that’s the case, I suppose you can use all the mature Python tools, like a variety of debuggers? (Even though the TorchScript compiler may not be mature yet.)

I see. It makes sense now. Thanks a lot for explaining this.

Another concern of mine is the overhead of calling Julia functions from Python. I thought the PyTorch devs wanted to develop a JIT because even calling a C extension from Python had some overhead (but I’m not sure if this info is fresh and correct). But this can probably be improved once enough people are interested. There are already optimizations you can do, like pyfunction.

I don’t think there is much benefit in using numba with diffeqpy unless computing the derivatives is so computation-intensive that the overhead of calling a Python function from Julia is negligible and the code can benefit from numba (i.e., it is way faster than composing numpy functions). But @ChrisRackauckas may know other use cases.

I think a better approach would be to expose a Pythonic API for ModelingToolkit.jl or to create a SymPy-to-Julia transpiler specialized for diffeq. Having said that, it would be very interesting (at least in a purely curiosity-driven sense) if you could inline LLVM IR generated by Numba into a Julia function.

Yeah, that is the use case, and it should still work. If it’s a PDE or stochastic PDE discretization then this makes a lot of sense. If it’s a 3-ODE system then this overhead matters a lot. So it’s just a use case kind of thing. The asymptotically large problems are more algorithm driven (since implementation-wise everything is dominated by the sparse linear solver cost), and so diffeqr and diffeqpy are very effective interfaces for sharing our algorithms (since in this case, being algorithmically efficient matters a lot more than low-level efficiency). This makes the performance of the bridge use-case dependent, similar to NumPy though.
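A back-of-the-envelope way to see this is a toy cost model, where each call to the derivative function pays a fixed bridge overhead plus work proportional to the number of states. The numbers below are illustrative placeholders, not measurements, and the linear cost assumption is a simplification made only for this sketch.

```python
def bridge_overhead_fraction(n_states, per_call_overhead, cost_per_state):
    """Fraction of one derivative call spent crossing the language bridge.

    Toy model: fixed overhead per call plus work linear in the state count.
    """
    compute_time = n_states * cost_per_state
    return per_call_overhead / (per_call_overhead + compute_time)


# Assumed placeholder costs in seconds (not benchmarks).
OVERHEAD = 1e-6   # one bridge crossing
PER_STATE = 1e-9  # work per state variable

small = bridge_overhead_fraction(3, OVERHEAD, PER_STATE)          # 3-ODE system
large = bridge_overhead_fraction(1_000_000, OVERHEAD, PER_STATE)  # PDE discretization
```

With these placeholder costs the bridge accounts for over 99% of each call in the 3-ODE case and under 1% in the million-state case, which matches the qualitative picture: the overhead only matters when the per-call work is tiny.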

Yup, I think this is the direction to go. And once we are sufficiently advanced here, we could just have some kind of DSL file format where the differential equation can be written in a way that can be used from any of the bridges we make to other languages. The core of DiffEq will always be Julia, but this would then be a nice way to make all of the advances useful everywhere else.

I think this is a good place to clarify my stance on the “language wars”. DifferentialEquations.jl and its components are built in Julia because of the competitive advantage we get for being able to make full use of the compilation structures of the language. Not just speed but also a lot of features, like easily writing compilers like ModelingToolkit, full language AD systems like Zygote, tooling built with Cassette, etc. that just aren’t available elsewhere. However, our dedication is to the science and not necessarily the language. I plan to continue to make bridges to allow DiffEq to be used from other languages. Tooling like the ability to compile OrdinaryDiffEq.jl on Float64s to a binary for use from Python/R will go a long way to making this nicer. Julia’s development advantages also make it a great language for building libraries, and we should share that as much as possible. Power users will still come to Julia since there will always be more you can do when using it from Julia of course (some of the ML integration is an example, in cases not using adjoint sensitivity analysis). And honestly, these tools have been great Julia recruitment tools not just for power users.


This sounds like a great approach. But one thing I’ve been wondering about is that with a DSL you tend to lose benefits of the host language, like the debugger, and how to deal with that. This mirrors the difference in debuggability between the define-by-run and static graph approaches of NN frameworks. A DSL is an obvious way forward for readability and writability of complex mathematical models. It would be great if debuggability (and other benefits in Julia like @code_*) could be recovered. Maybe that’s as easy as recording and attaching a LineNumberNode when generating the function. I think this would benefit Python/R/… users, too. I don’t think it’s too crazy to launch the Julia debugger from the Python debugger.
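For the Python side, here is a sketch of the same trick using the standard `ast` module: compile the generated function with the DSL file name and shifted line numbers, so that tracebacks and debuggers point back at the DSL source rather than at generated code. This is a Python analogue of attaching a LineNumberNode, not how ModelingToolkit does it. The names `compile_rhs` and `error_location`, and the file `model.dsl`, are hypothetical.

```python
import ast
import traceback


def compile_rhs(expr_src, dsl_file, dsl_line):
    """Compile one DSL expression into f(u, p, t), keeping its DSL location."""
    tree = ast.parse(f"lambda u, p, t: {expr_src}", mode="eval")
    # Shift line numbers so line 1 of the snippet maps to dsl_line.
    ast.increment_lineno(tree, dsl_line - 1)
    code = compile(tree, filename=dsl_file, mode="eval")
    # Evaluate the lambda expression with no builtins exposed.
    return eval(code, {"__builtins__": {}})


def error_location(func, *args):
    """Return (filename, lineno) of the innermost frame if func(*args) raises."""
    try:
        func(*args)
    except Exception as exc:
        frame = traceback.extract_tb(exc.__traceback__)[-1]
        return frame.filename, frame.lineno
    return None


# Pretend this expression came from line 7 of a DSL file called model.dsl.
f = compile_rhs("p['a'] * u", "model.dsl", 7)
value = f(2.0, {"a": 3.0}, 0.0)
where = error_location(f, 1.0, {}, 0.0)  # missing parameter 'a' raises KeyError
```

Here the failing call is reported at `("model.dsl", 7)`, so the error points at the line the user wrote in the DSL.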

:clap: :clap: :clap:
