Context
Python has a large community[1]. However, many of its most-used numerical libraries – numpy, scipy, scikit-learn, pandas, pytorch, jax – are not written in Python. Instead, a frontend API is written in Python, hiding a high-performance backend which is usually C/C++.
Speaking from experience, the interface between Python and C/C++ is quite a pain to deal with during development, and each library does it a bit differently: compare the complex build systems of numpy, pandas, and pytorch. It can be quite hard for users to contribute to the low-level backends given the complexity and lack of standardization in this space. Furthermore, while these libraries are very polished due to strong industry support and dedicated core dev teams, smaller libraries that attempt to add C/C++ utilities can be a huge pain to install.
(I feel strongly enough about this point that I at one point illustrated it… )
Big picture
Where I am going with this is that I think there is a great opportunity here for Julia to become a go-to replacement for C/C++ backends inside Python libraries. Python is popular, and Julia is fast and high-level. And thanks to @cjdoris leading the development of PythonCall.jl, we also have a really nice new way to interface the two languages. Again, speaking from experience (with PySR), it’s much nicer to build a high-performance Python library using Julia as a backend than C/C++.
> But why not just have everyone build directly in Julia?
Because organizations with millions of lines of code are not going to instantly switch their entire stack. They need to transition gradually.
Sure, the goal of this might very well be to have everyone use Julia in the end! But there needs to be a bridge.
Now, how can we actually effect a culture shift in the Python ecosystem to prefer Julia as a backend? Well, momentum is a big factor. The more tools use Julia, the more maintainers will think about joining.
But where do we get this momentum? Well, some of this is already happening, for the simple reason that Julia and the Julia community are awesome (see, e.g., pydata/sparse, which now has a backend for Finch.jl via pip install sparse[finch]). This is great and I think we should try to accelerate it so it happens more and more, so that we all benefit from more users and support.
How do we do that?
Well, let’s do a thought experiment. Say that there is a Python developer who maintains a large library with ~100k lines of code. They have gotten their packaging system just right so that users across several different operating systems can install their library 99% of the time. Now, they also have some code in their library which is quite slow, and wish to speed it up. What would this person need in order to consider using PythonCall.jl? How can we make this process painless, and the switch as gradual as possible so as to not break things? How can we make it easier for them to see benefits from Julia code?
This is where I’m really interested in hearing other people’s ideas: how can we make it easier to gradually Julia-ize a Python library?
Basically how do we get to the API equivalent of the three-click rule for Julia-inside-Python?
Example
I do have some potential ideas on making this process easier but I’m looking for others (as well as a general discussion on simplifying Julia integration into large existing Python projects). Generally, I think that PythonCall.jl ought to have better integration with pyproject.toml (see this issue). I think the way a developer would consider adding an initial dependency on Julia is to have an optional install, like:
pip install mypkg[julia]
which installs the Julia-accelerated version of their library, mypkg. That way, Julia-based acceleration would be optional to their users, and not have an effect on existing ones.
Now, internal to the library, I am wondering what is the easiest possible way to integrate Julia. One idea is to have a type of branch like the following:
if juliapkg.is_installed():
    # Julia-accelerated version
else:
    # pure python version
This is probably the least invasive way to do things, so long as the pyproject interface is robust.
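To make the branch above a bit more concrete, here is a minimal sketch of what it could look like today. It does not rely on the `juliapkg.is_installed()` call proposed above; instead it assumes that checking whether the `juliacall` package is importable is a good enough proxy. The names `julia_available` and `sum_of_squares` are made up for illustration, and I have not benchmarked the Julia path.

```python
import importlib.util
from functools import lru_cache


def julia_available() -> bool:
    """Return True if the `juliacall` package can be found (without importing it)."""
    return importlib.util.find_spec("juliacall") is not None


@lru_cache(maxsize=None)
def _julia_sum_of_squares():
    # Imported lazily: importing juliacall starts a Julia runtime, which is slow,
    # so pure-Python users should never pay that cost.
    from juliacall import Main as jl

    # `seval` evaluates a string of Julia code; here it returns a Julia function.
    return jl.seval("x -> sum(xi -> xi^2, x)")


def sum_of_squares(values, use_julia=None):
    """Sum of squares, Julia-accelerated when available (hypothetical example)."""
    if use_julia is None:
        use_julia = julia_available()
    if use_julia:
        # Julia-accelerated version
        return _julia_sum_of_squares()(values)
    # pure python version
    return sum(v * v for v in values)
```

The `use_julia` flag is there so existing users (and the test suite) can force the pure-Python path, which keeps the switch gradual.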
In this way I do think that Mojo has the right idea in terms of Python integration. It’s also similar to Cython in that you would have .pyx files live alongside .py files.
One other idea is to have some standardized syntax using a Python decorator, like
@juliacall.pydef
def foo(x):
    return np.sum(x ** 2)

@juliacall.jldef(foo)
def foo_jl(x):
    return "x -> sum(xi -> xi^2, x)"
    # ^ Runs `seval` and caches the result
which could let the user define a “Julia version” of a Python function, with the two linked via the jldef call. Ideally this sort of thing would work well with Python objects via full implementations of interfaces in Base (which PythonCall already does a lot of). This could further be enhanced by a VSCode syntax highlighter for Julia-within-Python.
(Using strings for small simple kernels is actually similar to how numexpr works via strings, as well as some of the CUDA Python libraries)
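For what it's worth, the decorator idea can be prototyped in plain Python. The following is only a sketch under my own assumptions: `pydef` and `jldef` do not exist in juliacall, the `compiler` and `enabled` hooks are invented so the dispatch can be tried without Julia, and (unlike the example above) the function that provides the Julia source takes no arguments, to keep things small.

```python
import functools
import importlib.util


def _julia_installed():
    # Assumption: "juliacall is importable" means the Julia path can be used.
    return importlib.util.find_spec("juliacall") is not None


def _seval(source):
    # Lazy import so that Julia only starts when a Julia kernel is first called.
    from juliacall import Main as jl

    return jl.seval(source)


class pydef:
    """Wrap a Python function so a Julia version can be attached later (hypothetical)."""

    def __init__(self, func, compiler=_seval, enabled=_julia_installed):
        functools.update_wrapper(self, func)
        self.python_impl = func
        self.julia_source = None
        self._compiled = None
        self._compiler = compiler
        self._enabled = enabled

    def __call__(self, *args, **kwargs):
        if self.julia_source is not None and self._enabled():
            if self._compiled is None:
                # Compile once and cache the resulting callable.
                self._compiled = self._compiler(self.julia_source)
            return self._compiled(*args, **kwargs)
        return self.python_impl(*args, **kwargs)


def jldef(target):
    """Attach Julia source (returned by the decorated function) to `target`."""

    def register(source_func):
        target.julia_source = source_func()
        return target

    return register
```

With this, `@pydef` on `foo` followed by `@jldef(foo)` on a function returning `"x -> sum(xi -> xi^2, x)"` would route calls to Julia when it is available and fall back to the Python body otherwise.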
Perhaps another option is to have .jl files alongside .py files within a library’s source code (though it may be too large a jump for many Python devs). For example, those .jl files could automatically have access to the Python namespace within the library, if declared to juliapkg.
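A rough sketch of the ".jl files next to .py files" layout, assuming nothing smarter than "include every .jl file found in the package directory" (the automatic access to the Python namespace is not covered here). The helper names `find_julia_sources` and `load_julia_sources` are hypothetical; the only juliacall call used is `Main.seval`.

```python
from pathlib import Path


def find_julia_sources(package_dir):
    """Return the .jl files that sit directly inside `package_dir`, sorted by name."""
    return sorted(Path(package_dir).glob("*.jl"))


def load_julia_sources(package_dir):
    """Include every .jl file in `package_dir` into Julia's Main module."""
    # Lazy import so that pure-Python users never start a Julia runtime.
    from juliacall import Main as jl

    # Build a small Julia closure once, then pass each path in as a string.
    include = jl.seval("path -> Base.include(Main, path)")
    for path in find_julia_sources(package_dir):
        include(str(path))
    return jl
```

A library could call something like `load_julia_sources(Path(__file__).parent)` the first time a Julia-accelerated function is needed, rather than at import time.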
(And there are always ideas out there like @Lilith’s Python.jl!)
[1] Python is 1st among beginners and other, 3rd overall, and 4th among professional programmers: Technology | 2024 Stack Overflow Developer Survey. Also 1st on Tiobe: TIOBE Index - TIOBE.