Gradual Julia-ization of Python libraries

Context

Python has a large community[1]. However, many of its most-used numerical libraries – numpy, scipy, scikit-learn, pandas, pytorch, jax – are not written in Python. Instead, a frontend API is written in Python, hiding a high-performance backend which is usually C/C++.

Speaking from experience, the interface between Python and C/C++ is quite a pain to deal with for development, and each library does it a bit differently. Compare the complex build systems of numpy, pandas, and pytorch. It can be quite hard for users to contribute to the low-level backends given the complexity and lack of standardization in this space. Furthermore, while these libraries are very polished due to strong industry support and dedicated core dev teams, smaller libraries that attempt to add C/C++ utilities can be a huge pain to install.

(I feel strongly enough about this point that I at one point illustrated it… )

Big picture

Where I am going with this is that I think there is a great opportunity here for Julia to become a go-to replacement for C/C++ backends inside Python libraries. Python is popular, and Julia is fast and high-level. And thanks to @cjdoris leading the development of PythonCall.jl, we also have a new really nice way to interface the two languages. Again, speaking from experience (with PySR), it’s much nicer to build a high-performance Python library using Julia as a backend than C/C++.

> But why not just have everyone build directly in Julia?

Because organizations with millions of lines of code are not going to instantly switch their entire stack. They need to transition gradually.

Sure, the goal of this might very well be to have everyone use Julia in the end! But there needs to be a bridge.

Now, how can we actually effect a culture shift on the Python ecosystem to prefer Julia as a backend? Well, momentum is a big factor. More tools using Julia will cause more maintainers to think about joining.

But where do we get this momentum? Well, some of this is already happening, for the simple reason that Julia and the Julia community are awesome (see, e.g., pydata/sparse which now has a backend for Finch.jl – via pip install sparse[finch]). This is great and I think we should try to accelerate it so it happens more and more, so that we all benefit from more users and support.

How do we do that?

Well, let’s do a thought experiment. Say that there is a Python developer who maintains a large library with ~100k lines of code. They have gotten their packaging system just right, so that users across several different operating systems can install their library 99% of the time. Now, they also have some code in their library which is quite slow, and they wish to speed it up. What would this person need in order to consider using PythonCall.jl? How can we make the process painless, and the switch gradual enough that nothing breaks? How can we make it easier for them to see benefits from Julia code?

This is where I’m really interested in hearing other people’s ideas: how can we make it easier to gradually Julia-ize a Python library?

Basically how do we get to the API equivalent of the three-click rule for Julia-inside-Python?


Example

I do have some potential ideas on making this process easier but I’m looking for others (as well as a general discussion on simplifying Julia integration into large existing Python projects). Generally, I think that PythonCall.jl ought to have better integration with pyproject.toml (see this issue). I think the way a developer would consider adding an initial dependency on Julia is to have an optional install, like:

pip install mypkg[julia]

which installs the Julia-accelerated version of their library, mypkg. That way, Julia-based acceleration would be optional to their users, and not have an effect on existing ones.
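For instance, the extra could be declared in pyproject.toml as an optional-dependency group (a sketch; the package name and version pins are illustrative):

```toml
[project]
name = "mypkg"
version = "0.1.0"

[project.optional-dependencies]
# `pip install mypkg[julia]` pulls in the Julia bridge;
# plain `pip install mypkg` leaves it out entirely
julia = ["juliacall", "juliapkg"]
```

Existing users see no change at all; only those who opt in to the `[julia]` extra download the bridge packages.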

Now, internal to the library, I am wondering what is the easiest possible way to integrate Julia. One idea is to have a type of branch like the following:

if juliapkg.is_installed():
    # Julia-accelerated version
else:
    # pure python version

This is probably the least invasive way to do things, so long as the pyproject interface is robust.

In this way I do think that Mojo has the right idea in terms of Python integration. It’s also similar to Cython in that you would have .pyx files live alongside .py files.

One other idea is to have some standardized syntax using a Python decorator, like

@juliacall.pydef
def foo(x):
    return np.sum(x ** 2)

@juliacall.jldef(foo)
def foo_jl(x):
    return "x -> sum(xi -> xi^2, x)"
    # ^ Runs `seval` and caches the result

which could let the user define a “Julia version” of a Python function – which are linked via the jldef call. Ideally this sort of thing would work well with Python objects via full implementations of interfaces in Base (which PythonCall already does a lot of). This could further be enhanced by a VSCode syntax highlighter for Julia-within-Python.

(Using strings for small simple kernels is actually similar to how numexpr works via strings, as well as some of the CUDA Python libraries)

Perhaps another option is to have .jl files alongside .py files within a library’s source code (though it may be too large a jump for many Python devs). For example, those .jl files could automatically have access to the Python namespace within the library, if declared to juliapkg.

(And there are always ideas out there like @Lilith’s Python.jl!)


  1. Python is 1st among beginners and others, 3rd overall, and 4th among professional programmers: Technology | 2024 Stack Overflow Developer Survey. Also 1st on the TIOBE Index. ↩︎

27 Likes

I get the “just works” approach here, but not many people will like hidden installation-dependence like this, especially in an interactive session where imports can’t be undone. It’s more idiomatic to mirror the 2 versions in 2 modules, and the user can specify which they use (kind of like CuPy and NumPy, though the mirroring is more for easily leveraging NumPy functions’ type dispatch system to forward to the CuPy version). The import system, like Julia’s, lets you pick a function from either module or disambiguate both, which will be useful for comparisons.

If the Julia versions only cover a subset of any Python module, and the user wants to use the subset without learning what it is, then a fully independent module would be unfeasible. You can still let the user have their pick via a higher-order function that selects the non-default choice: run the Python version even where a branch would automatically attempt a Julia version, or vice versa. Either way I expect some metaprogramming designed to touch only the functions that have two versions, leaving the pure Python code alone.

The major problem for a user is that they likely do want Julia’s multimethods, and wrapping all the code in a Python unimethod makes that inconvenient. PythonCall and PyCall already have the more typical ways (eval variants, interpolated non-standard strings) of embedding code. Instead, this could bridge Python and Julia functions by name, the methods being evaluated into the Julia side separately.

1 Like

There is certainly precedent for the juliapkg.is_installed() approach though. There are some libraries that have JAX backends, and those backends only “activate” if the user has JAX installed. Another example is einops where the backends are only used depending on what array type you pass it (sort of like a hand-rolled multiple dispatch).
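The einops-style dispatch-on-array-type can be sketched in a few lines (the backend table here is illustrative, covering only builtins and numpy):

```python
import math

def tanh(x):
    # choose a backend from the argument's type, like einops does
    root = type(x).__module__.split(".")[0]
    if root == "builtins":
        # plain Python floats/ints
        return math.tanh(x)
    if root == "numpy":
        # numpy arrays, elementwise
        import numpy as np
        return np.tanh(x)
    raise TypeError(f"no backend registered for {root}")
```

A Julia backend would just be one more entry in this kind of table, activated only when the corresponding wrapper type shows up.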

As another example, numba is a fairly popular way to add fast jitted methods within a Python library, as such methods can just be written for numpy arrays:

from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]): 
        trace += np.tanh(a[i, i]) 
    return a + trace              

print(go_fast(x))

People seem to like this as they can just insert a fast numba kernel inside a large existing library, without needing to restructure everything. I wonder how we can enable something similar in convenience via juliacall.

1 Like

Once we merge the ability to build small binaries and libraries, wouldn’t it be simple to load and call Julia libs in Python just like any other library?

14 Likes

I think for the kind of use-case discussed above you might actually want the full Julia runtime. Python devs seem happy to include JIT compilation in Python libraries, like with Numba and JAX, without wanting to compile to an executable. Compiling and shipping standalone binaries would make the packaging more complicated here (most people doing Python stuff don’t need 0 latency to startup anyways; for that there’s C++/Rust). I think Julia precompilation makes startup time pretty good anyways.

I think the main bottleneck now is mostly about ease-of-integration and compatibility. e.g., (1) how do I add Julia packages directly to a pyproject.toml, (2) how can I make the use of Julia optional, (3) how can I very gradually add Julia-accelerated kernels without needing to replace everything at once, (4) how can I use PrecompileTools.jl for my Python<->Julia interface,…

6 Likes

I agree with many points, but stand by my initial thoughts because of a few differences in perspective. Note that unlike your pydef/jldef example, Numba’s @jit just compiles bytecode for a supported subset of Python/NumPy/SciPy; there’s no separate Numba source code. A more apt comparison is Numba’s @overload (previously @generated_jit) for reimplementing functions unsupported by nopython mode in terms of the supported ones. Most users are not encouraged to use @overload because 1) there’s no demand for writing an unused slow Python version from scratch, and 2) you need to test semantic equivalence and performance between the original and the manual reimplementation until, hopefully, Numba supports more of Python/NumPy/SciPy. By contrast, writing the same thing in a @jit or @vectorize function nearly guarantees semantic equivalence between the bytecode and the LLVM-compiled code. For the cases where it doesn’t, Numba offers the py_func attribute to run the original bytecode, and has deprecated @jit’s silent fallback to the CPython-based object mode; that fine separation of the two versions only affirms how different this approach is from backend switches.

Focusing on the jldef part, I’ll reemphasize that it’s really not good for multimethods. This is what it looks like in Numba:

@overload(select)
def ol_select(x):
    if isinstance(x, types.Float):
        def impl(x):
            return x + 1
        return impl
    elif isinstance(x, types.UnicodeType):
        def impl(x):
            return x + " the number one"
        return impl
    else:
        raise TypeError("Unsupported Type")

When we’re done recoiling in horror, we can easily recognize that the branch can be replaced with neater argument type annotations. But once the outer Python function is defined, we are stuck with those methods, and isolation to a Python function really hampers interactions with helper functions. With PyCall or PythonCall, there already is no need for this limitation, and there is already enough work in making sure type conversions to the Julia side and back to the Python side work correctly.

I will add that as important as small binaries are to libraries that need the efficiency of something like Rcpp or MATLAB’s MEX files, there are many good cases for hauling a compiler around for interactive code, even something as simple as making a NumPy ufunc.

viralbshah is right that if compilation latency and lugging around a JIT compiler is an issue, then developers will likely prefer small binaries and reduced runtimes over slow pure Python fallbacks and equivalency tests.

Sorry if I missed that, but it is still not clear to me why the Julia community should invest effort in this. Given that it is backend code, how do we

> benefit from more users and support

? The users of these libraries will either not care that it is Julia under the hood, or just curse at the language when the complicated toolchains lead to problems (which happens inevitably, regardless of the languages involved).

Given that one of the original motivations for creating Julia was solving the two-language problem, why make any special effort to become part of a two-language setup as a backend?

9 Likes

I think almost all of us on this forum can agree that it’d be best if everything could be easily done in Julia, but there are people making cool things in other languages for good reasons, making interop the next best thing. Julia is no stranger to wrapping standard-issue libraries, and conversely it’d be fantastic for Python/R/MATLAB developers to have another interactive and dynamically typed language as a feasible backend in more contexts, at least before getting into the weeds of learning a less used language and the less mature tooling.

And sure, it’d start as writing some Julia code for a few functions in a Python library. Then that library gets more and more Julia in the backend, derivative libraries are built on top of Julia from scratch, and they end up making Julia packages with Python wrappers. When something is proven more useful to people, people will want to use it more, and usable code is much better proof than hypothetical benefits.

7 Likes

I think calls like this are meant for folks interested in it; if the entire topic is not appealing, then it’s probably not for you. But the Julia community is not a monolith; there are members of the Julia community who are also python users, or who need to support python users, or who just want to reach more folks with their work.

Sometimes they will, since there is a user → developer pathway as (a small number of) users start to want to add features they need or fix a bug they ran into (and getting to Julia as a backend can be much more accessible than C/C++).

18 Likes

I mean, this is exactly why I am a member of the Julia community :sweat_smile: I started writing some Julia kernels for my pure-Python symbolic regression code, and then, after a while, just started writing Julia libraries and contributing to Julia instead :slight_smile:

20 Likes

Yeah this is a good point I hadn’t considered. Some Python libraries may want to use Julia for its ecosystem (there are several areas where only Julia libraries exist at the moment, especially in the scientific community) but may not be interested in shipping the Julia runtime to users… So, I can totally see this as another avenue to explore more.

And yeah I think juliac will certainly help with this.

Some other syntax ideas for making Julia easier to integrate gradually to a Python library:

Say we have a file in our library file1.py:

import numpy as np
from juliapkg import jldispatch

@jldispatch("file2.jl", function="foo2")
def foo(x):
    return np.sum(x ** 2)

where file2.jl would be in the same directory as file1.py, and have:

function foo2(x)
    return sum(xi -> xi^2, x)
end

Here, basically jldispatch(file, func_sym) could attach the Julia function foo2 to the Python function foo, via:

def jldispatch(file, function):
    def apply(f):
        if juliapkg.juliacall_installed():
            from juliacall import Main as jl
            jl.include(file)           # evaluate the .jl file into Main
            return jl.seval(function)  # look up the Julia function by name
        else:
            return f
    return apply

So this could let you easily attach Julia code to Python functions.

Perhaps this jldispatch could also take an extension argument to associate the juliapkg.is_installed() check with a particular pip extra. That way, only pip install mypkg[julia] would set up the Julia acceleration.

3 Likes

I really like the idea of incrementally being able to introduce Julia!

Another approach I don’t see mentioned here is building a shared library that acts like a Python module.

Let’s say we implemented a package in Julia, call it PythonLimitedAPI.jl, that wrapped Python’s limited C API. Running Clang.jl’s generator on Python.h with Py_LIMITED_API set would be a start.

Then it should be possible to write this code in pure Julia:

using PythonLimitedAPI # TODO: this does not exist

Base.@ccallable function hello_from_julia()::Cvoid
    println("Hello from Julia!")
end

const ExampleMethods = [
    PythonLimitedAPI.PyMethodDef(
        "hello", 
        hello_from_julia, 
        PythonLimitedAPI.METH_NOARGS, 
        "Returns a greeting from Julia."
    ),
    PythonLimitedAPI.PyMethodDef()  # Sentinel
]

const examplemodule = PythonLimitedAPI.PyModuleDef(
    PythonLimitedAPI.PyModuleDef_HEAD_INIT,
    "example_pyjulia_module",
    nothing,
    -1,
    ExampleMethods 
)

Base.@ccallable function PyInit_example_pyjulia_module()::Ptr{Cvoid}
    return PythonLimitedAPI.PyModule_Create(examplemodule)
end

and build a shared library like example_pyjulia_module.dll or libexample_pyjulia_module.so with PackageCompiler.

create_library(
    "ExamplePyjuliaModule", 
    "ExamplePyjuliaModuleCompiled";
    lib_name="libexample_pyjulia_module"
)

With that built, in Python, we’d be able to run:

import example_pyjulia_module
example_pyjulia_module.hello()

Code built against Py_LIMITED_API works across multiple versions of Python (>3.2), and the boilerplate can be automated with macros.

Imagine something like the following expands to the previous code:

@pymodule_export module ExamplePyjuliaModule

"""
Returns a greeting from Julia.
"""
@pydef_export function hello_from_julia()
    println("hello from Julia")
end

end

With PackageCompiler today, or with static compilation efforts in the future, this can be made into a .whl and pushed to PyPI, after which it can be added to any Python project with poetry add example_pyjulia_module or pip install example_pyjulia_module.

This is effectively what PyO3 does for Rust.

7 Likes

Definitely. Just adding a Rust dependency to some prominent Python packages caused significant pain for the packagers of those packages. Julia as a dependency, as-is, would be even more of a pain, I guess.

Furthermore, Julia is usually deployed with a dedicated fork of LLVM, instead of the usual LLVM. This could cause special trouble for Python packages which link to libllvm.

If some of these issues were alleviated, perhaps an effort like this could succeed in giving Julia a higher profile…

2 Likes

I think this is why, for any package that has broad use, a Julia dependency should be introduced gradually. So, e.g., installing it with pip install mypkg[julia] would add the Julia-accelerated kernels, while pip install mypkg would use the usual Python backend. This is the approach that pydata’s sparse package takes (which has a large number of dependent packages): you use pip install sparse[finch] to get the Julia (Finch.jl) backend, while the plain pip install sparse goes without it.

This is also why I feel like shipping the Julia runtime (a la PythonCall.jl) is likely a better option for the time being rather than relying on package compilation. The automatic Julia install is pretty nice and seems to not introduce new compatibility issues (at least for my userbase).

3 Likes

Would it? It seems Julia’s LLVM is explicitly vendored by changing the shared library name. I’m not sure about the symbols.

I guess one would have to try Julia and Numba in the same process…

2 Likes

Quite possibly I’m wrong or just outdated.

This PyTorch thread could be relevant here: Where we are headed and why it looks a lot like Julia (but not exactly like Julia) - compiler - PyTorch Developer Mailing List

2 Likes

LLVM symbol conflicts are, or at least have been, a problem. See e.g. loading pytorch causes invalid pointer crash on free() · Issue #973 · JuliaPy/PyCall.jl · GitHub. I have also had trouble a few times when PyCalling Numba code, although it has been some time since I needed to do that.

5 Likes

As someone who builds Julia code inside an organization that primarily uses Python, really good integration with Python is very important. I’m allowed to write things in Julia (because I’ve proven it to be much more effective at solving difficult problems) as long as the generic developers building higher-level solutions in Python can use them. Thankfully, we containerize our solutions so this is quite feasible. Moreover, thanks to @MilesCranmer, I had some solid working examples of decent Python integration strategies which allow me to make my work available to Python developers with fewer complaints. However, there is room to improve.

Whether we like it or not, Python is seen as the “safer” option to build out numerical solutions with. In some ways it’s safer (database/cloud/connectivity technologies tend to bend over backwards to support Python SDKs), but in many ways it’s more risky (mainly due to performance, but Julia now has much better libraries in some key strategic areas like SciML). For me, the first step to selling Julia is the ability to address some of the risks of Python while being able to keep the benefits (compatible with their existing Python systems and data sources). If I’m able to help people tie Julia code into Python systems to rapidly solve hard problems they can’t solve in Python, it’s a much easier sell. In my experience, the biggest turn off from Julia isn’t syntax or learning a new language (people seem to find it interesting and clean), it’s deployment, integration with existing Python codebases, and compatibility with third party SDKs. Anything that reduces these barriers makes it a lot easier for me to sell Julia as a transformative technology that’s worth the risk.

34 Likes