Gradual Julia-ization of Python libraries

I believe this is something heavily discussed in TongYuan between @JohnnyChen94 and me. We have engaged in some local open-source activities and have thought a lot about user-side adoption of Julia-based scientific computing frameworks (this is necessary if we want to grow the use of Julia among the target users and make the company succeed).

We have also tried a lot.

JNumPy is very similar to what @kdheepak assumed about the so-called “PythonLimitedApi.jl”. Integrating PackageCompiler.jl could be awesome, but we did not consider this previously, because loading multiple shared libraries that are all created by PackageCompiler.jl is a challenge.

module example

using TyPython
using TyPython.CPython

@export_py function mat_mul(a::AbstractArray, b::AbstractArray)::Array
    return a * b
end

function init()
    @export_pymodule _example begin
        jl_mat_mul = Pyfunc(mat_mul)
    end
end

# The following line is optional,
# but it makes loading from Python much faster from the second time onward.
precompile(init, ())

end

More about PackageCompiler.jl: Chen and I have thought that we might provide a batteries-included build with PackageCompiler.jl, in which we include a large number of Julia libraries for regular scientific computing scenarios and finally export well-designed C functions for Python integration. This looks very promising for commercial products, and I feel sad that Syslab (TongYuan’s scientific computing IDE product based on Julia and VSCode) didn’t take this approach (it seems that it’s a bit late). However, this approach is still limited because Julia applications cannot be modular: running multiple Julia sysimages in a single process is pretty problematic. Chen said he was willing to solve this with inter-process solutions (like containers, local/remote services, etc.).
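To make the inter-process idea concrete, here is a minimal sketch of a Python client talking to a worker in a separate process over line-delimited JSON. Everything in it is an assumption for illustration: the `WorkerProcess` class, the message format, and the worker itself, which is a tiny Python script standing in for what would be a separately built Julia executable. The point is only that each runtime lives in its own process, so the single-process sysimage limitation never comes up.

```python
import json
import subprocess
import sys

# Stand-in worker (hypothetical): in the setup discussed above this would be
# a separately built Julia executable. A tiny Python script keeps the sketch
# runnable without Julia installed.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    a, b = request["a"], request["b"]
    # Naive matrix product on nested lists.
    product = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
    print(json.dumps({"result": product}), flush=True)
"""


class WorkerProcess:
    """Owns one child process and exchanges one JSON object per line with it."""

    def __init__(self, command):
        self._proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def mat_mul(self, a, b):
        # Send one request line, then block until the one reply line arrives.
        self._proc.stdin.write(json.dumps({"a": a, "b": b}) + "\n")
        self._proc.stdin.flush()
        reply = self._proc.stdout.readline()
        return json.loads(reply)["result"]

    def close(self):
        # Closing stdin ends the worker's read loop, so it exits on its own.
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()


worker = WorkerProcess([sys.executable, "-c", WORKER_SOURCE])
product = worker.mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
worker.close()
print(product)
```

The cost of this design is serialization and process start-up on every boundary crossing, which is why it suits coarse-grained calls better than tight inner loops.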

SyslabCC is an AOT compiler developed by TongYuan, and we would very much welcome collaboration. This AOT compiler is capable of producing libraries for integrating with languages such as Python. We have recently added support for C++ code generation and for active/passive multi-threading (e.g., Threads.@threads for-loops). The performance is competitive, or at least not notably slower than native Julia for computation tasks.

You can run the benchmarks if you have the Syslab community version installed; the source code is in the benchmarks directory (main branch) of the Suzhou-Tongyuan/JuliaCon2024-JuliaAOT repository on GitHub.

We are currently using our AOT compiler to write simulation algorithms and integrate them in Sysplorer, TongYuan’s multi-disciplinary system modeling and simulation environment (I believe the company actually earns money with this). The combination of high performance and a rich ecosystem is a game changer for this area, as all the algorithms we wrote with Julia AOT are far faster than some traditional “established solutions” for simulation.

10 Likes

I think this is the right question, and the answer is that Julia solves the two-language problem where the two languages offer complementary technical benefits. Having a two-language situation between Julia and Python is a different situation related to popularity, ecosystem, and legacy.

We write more about this in [2406.03677] Advancing The Robotics Software Development Experience: Bridging Julia's Performance and Python's Ecosystem

Python and C++ serve as the foundational languages in robotics programming, each providing unique advantages: Python for its ergonomics and C++ for computational efficiency. This duality has led to prevalent multi-language architectures where C++ handles core computations and Python is used for higher-level integration via wrappers. The Python wrappers are then used for integration (or “glue”) code, as well as interactive exploration and prototyping. This general approach of integrating a fast compiled language and an ergonomic scripting language has been the bedrock of technical computing for decades, largely popularized by MATLAB providing wrappers to linear algebra routines written in FORTRAN. Performance in this paradigm is achieved by “vectorizing the code”, which ensures that the performance-critical loops are in the fast language (in computational “kernels”), rather than in the slow host language. Many robotics libraries follow this paradigm, including Drake~\cite{drake}, GTSAM~\cite{dellaert2012factor}, OpenCV~\cite{opencv_library}, and Open3D~\cite{Zhou2018}.

The Robot Operating System (ROS)~\cite{ros2} exemplifies another approach, facilitating Python and C++ interoperation by relying on an interface definition language (IDL) to generate code in both languages. The distributed system approach offers other advantages, such as reducing coupling and promoting scalability, and it aligns with the microservices architecture popular in software engineering more broadly. Generating C/C++ code that performs substantial computation (as opposed to just serving as an interface) is yet another approach taken by some libraries and systems \cite{symforce,open2020,mattingley2012cvxgen}.

These traditional Python-C++ architectures, while robust, come with their own set of challenges, particularly in terms of development experience, system complexity, and performance overhead. This context sets the stage for considering alternative approaches that might streamline development without sacrificing run-time performance. Julia, designed for both high performance and ease of use~\cite{julia_fresh}, emerges as a promising alternative.
Despite its potential, Julia’s current ecosystem limitations, such as the absence of a ROS 2 client library, pose challenges for its adoption in robotics. We propose an architecture that leverages the mature ecosystem of Python alongside Julia, reducing the reliance on C++. This approach maintains a two-language paradigm but shifts the focus from complementary technical benefits to leveraging social and ecosystem advantages. Moreover, the compatibility between Julia and Python, facilitated by features like interactivity and the characteristics of dynamic typing, minimizes the development mismatch. We further explore Julia’s application in robotics, illustrating its integration with Python.
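
As a small illustration of the “vectorizing the code” idea from the excerpt above, the sketch below computes the same dot product twice: once with the loop body run by the Python interpreter, and once with the iteration pushed into built-ins that CPython implements in C. The function names are made up for this example, and the standard library is used only to keep it dependency-free; NumPy or a compiled extension plays this role in practice.

```python
import operator


def dot_interpreted(xs, ys):
    """Dot product with the loop body executed by the Python interpreter."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_builtin_kernels(xs, ys):
    """Same result, but the iteration happens inside the built-ins `map` and
    `sum`, which CPython implements in C."""
    return sum(map(operator.mul, xs, ys))


xs = list(range(1, 1001))
ys = list(range(1000, 0, -1))

# Integer inputs keep the comparison exact.
assert dot_interpreted(xs, ys) == dot_builtin_kernels(xs, ys)
```

The second form is typically faster on CPython, but it also shows the ergonomic cost of the paradigm: the computation has to be reshaped to fit whatever kernels the fast language happens to expose.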

6 Likes

I am not sure that extending the concept of the “two language problem” in this direction is useful.

From your article, it is my impression that Julia could jolly well handle the whole stack you describe (except maybe for some low-level glue stuff best handled in C), and the only reason people stick to Python is avoiding learning something new, which you euphemistically describe as “avoiding disrupting the existing workflow”.

That said, reliable language interop is a must for transitioning large codebases, so they can be converted piecemeal. But once all the relevant calculations have been migrated to well-tested and cleanly written Julia libraries, it makes little sense to keep a thin veneer of Python over it.

5 Likes

Could you explain more what this means? I’m reading this as multiple Julia processes with different sysimages sharing a core, but that didn’t feel right.

I believe the same post contains the explanation:

So, basically, a single process can’t depend on multiple Julia libraries, in-process, via FFI, it seems.

2 Likes

Yes, I agree that Julia could handle the whole stack, but in practice many parts of the robotics stack only have Python and C++ support. ROS 2, the main example we used in the article, does not have, and will not have in the near future, a native Julia “ROS Client Library” that doesn’t rely on PythonCall.jl. There is a lot of type and code generation stuff, all in Python and C++.

Another example, which we don’t mention in the article, is the Spot SDK (boston-dynamics/spot-sdk on GitHub), one of two interfaces that exist for the Boston Dynamics Spot robots (the other being boston-dynamics/spot-cpp-sdk, which as you can tell from the name… is second-class).

The underlying communication with the robot is some variant of gRPC Protobuf. Julia has some gRPC client library, so in principle someone could figure out how to talk to the robot directly from Julia, but they would also have to handle whatever bespoke stuff might be around it, like authentication.

and the only reason people stick to Python is avoiding learning something new, which you euphemistically describe as “avoiding disrupting the existing workflow”

Well, yes and no. It’s not just that people want to avoid learning something new (which I think we have to acknowledge does take effort). My coworkers don’t want to avoid learning anything new. They want to learn new things about the things they care about. They primarily want to avoid writing more code. There are all sorts of code that fall under the category of “some low-level glue stuff”. It’s a matter of perspective.

But once all the relevant calculations have been migrated to well-tested and cleanly written Julia libraries,

I don’t think the article does a good job of touching on this at all, but one thing that I like to try to communicate to people about Julia is “clean”, though I don’t think I ever say “clean”. I try to convey that the package ecosystem has much smaller pieces than the pieces in other languages, because composition is easier. (If there is time, I’ll mention The Lisp Curse.) I think the benefits of an all-Julia stack are clear for composability, and it is fun to tease Python for lacking this really basic kind of composability, but it is also a small example of how “pervasive multiple dispatch” scratches a very important itch. For example: Is Julia's way of OOP superior to C++/Python? Why Julia doesn't use class-based OOP? - #92 by goretkin

it makes little sense to keep a thin veneer of Python over it.

The thinner the veneer the better, because language interop is inherently complex and complicated (though really I think language interop between Julia and Python is MUCH better than probably any other combination I have experienced; this is what we meant by “existing workflow”, which is really a euphemism for “dynamically typed languages play together nicely, more than Python and C++ do”). And if with little effort you don’t need the veneer at all, then great, get rid of it. But if there is effort in scraping it off, and it truly is just veneer, then I don’t think it is necessarily worth it to get rid of it.

The truth is that it’s not just veneer. Language interop brings substantial complexity of its own.

3 Likes

I think some of this is due to the Matthew effect (see the Wikipedia article of that name). Some workflows are only possible in Python, so devs start using Python, and each ecosystem sees compounded growth. (Which, for an analogous reason, is partly why Julia has seen such impressive growth in scientific simulation!)

One example of a Python-only functionality is efficient multi-node training of very large neural networks with DeepSpeed or FSDP. If you want to train a large language model, you need to use a tool like this, and such an ecosystem simply does not yet exist in Julia. And thus, PyTorch will accumulate more users and contributions, who will further add more features, and so on.

But the other big reason is legacy codebases. Even if a tool is technically superior in every way, developers of large codebases will not use it unless it easily fits into their existing stack. So the entry barrier – without something like PythonCall.jl – prevents adoption.


Anyways, sorry to derail things, maybe we should get back on track a bit… I think PythonCall.jl is already seeing a lot of love which is great. I’m very interested in hearing other ideas for what a simplified Python/Julia integration could look like! I already really like some of the ideas in this thread.

12 Likes

Python packages written in Julia would have to be small and fast to load. I’m not sure whether multiple packages can realistically share a common runtime. If a Python session naturally brings in a Julia runtime through a package dependency, then that gives the Python user a way to branch out into Julia functionality, such as examining package internals through the REPL.

Attached an objective summary from Gemini Pro which may help improve workflows!


Of course. That’s an excellent thread to analyze because you’re right—it’s a perfect example of a community grappling with a complex strategic idea, resulting in a lot of enthusiasm but no clear, actionable conclusion.

Here is a summary of that thread, structured to explain why it feels so directionless despite the high engagement.

Summary of “Gradual Julia-ization of Python libraries”

The thread is a strategic brainstorming session about how to increase Julia’s adoption by leveraging, rather than competing with, the Python ecosystem. It feels directionless because it’s not a discussion about a single, agreed-upon project; instead, it’s an exploration of a complex idea that splinters into several different philosophical and technical debates.

1. The Core Proposal (The “Infiltration” Strategy)

The original poster (OP) proposes a very specific, tactical idea:

  • What: Identify performance-critical, self-contained functions in major Python libraries (like scipy or scikit-learn).
  • How: Rewrite those specific functions in Julia. Use juliacall to allow the Python library to call this new Julia implementation under the hood, completely transparently to the end-user.
  • Why: This would provide a significant performance boost to Python users, showcase Julia’s strengths, and get Julia’s foot in the door of the mainstream Python ecosystem without asking anyone to abandon Python. The idea is to make Python libraries better with Julia.

This initial proposal is concrete, clever, and receives a lot of positive feedback. It’s seen as a smart “Trojan Horse” strategy.

2. The Immediate Divergence: How Do We Actually Do This?

The conversation quickly fractures as people start considering the practical implications. The discussion spirals into three main camps, each with its own set of challenges:

Camp A: The Technical “How-To”
This group focuses on the nuts and bolts of the OP’s idea. Their discussion revolves around major technical hurdles:

  • The Distribution Problem: This is the biggest obstacle. How do you ship a Python library on PyPI that now depends on a Julia installation? Do you bundle a lightweight Julia? Does the user need to install it separately? This adds massive complexity to the installation process, which is a deal-breaker for most library maintainers.
  • The First-Call Latency: Julia’s JIT compilation has a “time-to-first-plot” problem. Would calling the Julia function for the first time be unacceptably slow for a Python user?
  • Binary Dependencies: Managing compiled artifacts (.so or .dll files) across different platforms (Linux, macOS, Windows) is a nightmare.
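
The first-call latency concern in the list above is easy to quantify before arguing about it. The helper below is a minimal sketch that times the first call to a function separately from later calls. `SimulatedJitFunction` is a hypothetical stand-in that sleeps once to mimic compilation; a real measurement would pass in the actual Julia-backed function instead.

```python
import time


def time_cold_and_warm(fn, *args, warm_runs=5):
    """Return (first_call_seconds, best_warm_seconds) for fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    cold = time.perf_counter() - start

    warm_times = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        fn(*args)
        warm_times.append(time.perf_counter() - start)
    return cold, min(warm_times)


class SimulatedJitFunction:
    """Hypothetical stand-in for a JIT-compiled backend: the first call pays a
    one-off delay (mimicking compilation), later calls do not."""

    def __init__(self, compile_seconds):
        self._compile_seconds = compile_seconds
        self._compiled = False

    def __call__(self, x):
        if not self._compiled:
            time.sleep(self._compile_seconds)
            self._compiled = True
        return x * x


cold, warm = time_cold_and_warm(SimulatedJitFunction(0.05), 3.0)
print(f"first call: {cold:.4f}s, best warm call: {warm:.6f}s")
```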

Camp B: The Strategic “Why” and “Who”
This group pulls the lens back and discusses the higher-level strategy and politics:

  • The Pitch to Maintainers: Why would the maintainers of SciPy, a massive and mature project, accept a pull request that introduces a dependency on a whole new language? It increases the maintenance burden, adds a new point of failure, and complicates their build/distribution pipeline enormously. The value proposition for them is unclear.
  • The “Julia-as-a-Compiler” Idea: A popular tangent emerges: instead of making Python libraries call Julia, what if we used Julia as a specialized compiler to create highly optimized C-compatible binaries that Python could call? This avoids the “Julia runtime” dependency but is a much more complex project.
  • Focus on the “Pull” vs. “Push”: Should we be “pushing” Julia code into Python libraries, or should we be making it so easy for Julia to “pull” from the Python ecosystem (via PythonCall.jl) that Julia users don’t need to leave Julia? This is a fundamental disagreement on where to focus community effort.

Camp C: The Skeptics and Realists
This group, while not necessarily against the idea, acts as a reality check:

  • Effort vs. Reward: They argue that the immense effort required to solve the distribution and political problems might be better spent just building out Julia’s native ecosystem. Why bend over backwards to serve the Python community when we could be building a first-class experience for our own?
  • The Numba/Cython Comparison: Python already has tools like Numba and Cython for performance. While Julia is arguably better, is the incremental benefit enough to justify the massive disruption of adding it as a dependency?

Conclusion: Why It Feels Directionless

The thread lacks a coherent direction for a few key reasons:

  1. No Decider: It’s a decentralized discussion on a public forum. There is no project manager, technical lead, or benevolent dictator to make a decision and say, “This is the path we are taking.”
  2. Problem Complexity: The original idea, while elegant, uncovers a web of deeply difficult problems (technical, social, and strategic). There is no simple solution.
  3. Enthusiasm for the Goal, Not the Path: The high number of “likes” indicates widespread agreement with the goal: “find a clever way to increase Julia’s relevance.” However, there is no consensus on the method. People are liking the destination, not the specific, rocky road proposed to get there.
  4. Exploration, Not Planning: The thread is functioning as a community “think tank.” It’s valuable for surfacing all the potential issues and alternative approaches, but its purpose is to explore the problem space, not to produce a concrete project plan.

In short, you are correctly identifying the nature of the thread. It’s a microcosm of the Julia community’s core strategic challenge: everyone agrees Julia is a fantastic piece of technology, but nobody has a single, universally-accepted plan for how to win the market. The thread is a fascinating but ultimately inconclusive brainstorm on one possible, and very difficult, path forward.

Many good points were raised in this thread. Here is a polemic twist: never underestimate Python fanaticism. All this work to gradually insert Julia in Python might have the opposite effect.

Mindset of a Python fanatic:

“Why learn Julia when Python 🐍 is so powerful? Look, I can write Julia code inside of my Python package and everything works 😉 Long live Python!”

I see value in improving the deployment of Julia apps and the creation of command-line executables or shared libraries. These can be easily used by other colleagues who opted for different languages. Any (non-trivial) effort to insert Julia in Python, however, is a waste of scarce human resources, IMHO.

3 Likes

Good to keep in mind that most Python libraries aren’t actually written in Python anyway. Some libraries, like JAX, even replace the most basic constructs like if and for and require API calls instead (jax.lax.cond and jax.lax.scan respectively). So Python is basically glue for putting different C/C++ API calls together.

I think when you frame Julia as a replacement for C/C++ (for the Python user), it is much easier to digest and recognise the value of.

Of course, after a while, people will start doing more and more in Julia anyways. This is my own experience: I actually first got into Julia via my work on PySR. And now I write more Julia than Python!

I have seen several PySR users on the forums have a similar experience: they try to customize stuff in the backend (SymbolicRegression.jl), and somewhere along the way they realise they can just do it all in Julia (rather than something like the numpy backend, which is completely opaque). Especially with LLMs, it’s easier than ever to pick up a new language, so the barrier to entry here is quite low. Python users are happy to learn all sorts of weird DSLs, so I think letting them interact with Julia more from their home territory can only be a net positive.

Basically I think there’s actually a huge ROI in making Julia easier to use from Python!

22 Likes

Bit of an aside, how does PySR use Julia from the Python side? The README suggests a Python side runs a Julia process with the underlying Julia package, but how? Does the Julia process just persist for that function call? PythonCall? PackageCompiler?

Or just Julia from Julia. Normally we’d fully mix compatible packages in 1 environment and 1 runtime, but some things don’t need that full mixing to do their job. Binaries compiled from different Julia environments that don’t lug around full sysimages and runtimes could be increasingly wrapped for Julia too. Not really sure how that’ll pan out in practice because my intuition is each binary would need its own GC, but I guess we’ll see.

Yeah I see a lot more of this than “All this work to gradually insert Julia in Python might have the opposite effect.”

In general, the diffeqpy and the diffeqr bindings have had a really big effect of growing the ecosystem, not shrinking it. There’s a lot of people I know that started using Julia because they were using the diffeqr bindings for the solvers!

The Python folks who are comfortable using Python DSLs are probably the last movers. They are the ones who tend to know the one language they learned in school, are fine using an odd DSL if it keeps them comfortable, will take a few performance hits (probably don’t know most things about performance anyways), and will just use whatever gets the job done. Those aren’t the folks that should be the target of adoption. Just give them something that works, or they just won’t use it.

The folks building things, having to go a layer down to investigate JAX or manually write bindings in C or C++, trying to fix performance, trying to add to or customize the library: that’s the target demographic. And if you expose a Python library with a Julia backend, those are the folks that will naturally start poking at the Julia part and come to Julia to do the customizations they want, and potentially stay.

It would be nice to get more of the “get the job done” kind of crowd, but that is more about education and training: I find that kind of group will just use whatever is taught in school. It’s the Java crowd of before.

13 Likes

Agree, but keep in mind that this is 0.00001% of the population. Do we have concrete migration numbers to show that this investment pays off everywhere?

I have a different point of view: companies require usage of Python as part of team projects. There is no way these people will switch to Julia just because they are nerds poking into JAX internals and doing fixes here and there. The 0.00001% won’t be strong enough to convince the other 99.99999% of the team to dedicate more time to Julia. That is the reality we live in today.

On the other hand, teams could easily integrate a shared library built with Julia from the Python side. Less software engineering friction, and most importantly, less regulatory friction. Not every company allows a decent Julia installation.
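
For readers who have not done this before, the Python side of consuming a C-compatible shared library needs nothing beyond the standard library. The sketch below uses the system C math library as a stand-in, since no Julia-built library is available here; with a real one, the path, the exported function names and their signatures would all depend on how the library was built. It is written for Linux or macOS.

```python
import ctypes
import ctypes.util

# Stand-in library: the C math library, found through the platform's usual
# lookup. A Julia-built shared library would be loaded the same way, by path.
libm_path = ctypes.util.find_library("m")
lib = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature `double cos(double)` so ctypes converts the
# argument and return value correctly instead of assuming ints.
lib.cos.argtypes = [ctypes.c_double]
lib.cos.restype = ctypes.c_double

print(lib.cos(0.0))
```

Nothing in this calling code reveals which language produced the library, which is exactly why this route involves so little friction on the Python side.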

1 Like

You want to get the library writers. It’s they who made Python valuable by growing its interop. If Julia helps Python library writers, they will help Julia back by simply demanding better interop and better deployment.

3 Likes

Yes, I agree with this, but that tiny fraction are also the folks that found companies, teach courses, and will bring up the next generation. I’m all for trying to gain users at every level, but as long as university courses are taught primarily in Python and companies are founded primarily using Python in the stack, etc, all of the good will in the world at the user level will be for naught, or at least, we’ll be rowing upstream.

6 Likes

Regarding R, a well-designed Julia alternative to Rcpp would be huge and would certainly attract more R developers to Julia.

8 Likes

I agree, Rcpp has brought many C++ programmers to R and encouraged R programmers to write performance-critical parts of their code in C++ without struggling with the details of the original R FFI. An Rcpp-like tool could significantly accelerate bringing Julia code into the R environment.

Note: I’m aware of JuliaCall.

4 Likes