Polars: Our Big Missed Opportunity

Why was the Polars library for Python developed in Rust instead of Julia? Why did we miss this great opportunity to make our beloved language famous and win the sympathy of the huge Python community?

I propose to popularize the use of Julia by improving/modernizing the most famous Python libraries
It is sad to recognize that Julia adoption has been stagnant for years. And it bothers me to see that DeepSeek, ChatGPT, Claude, Llama, etc. are mostly developed in Python when, if they migrated their code to Julia, they could save hundreds of millions of dollars in electricity and reduce the carbon footprint.
Today Python is wonderful; it is the second-best tool for everything. We should not compete with Python; we should partner with the Python community. Let’s translate the most popular Python libraries into Julia and make them easy to compile for use from Python and to install with pip install, and I promise you that Julia will soon be widely adopted. There are millions of Python users who do not develop high-performance libraries because they do not know how to program in C or C++. If the libraries were written in Julia instead, a greater number of Python library users could participate in their development, and even help with debugging. With how active and enthusiastic the Python community is, there will undoubtedly be a growing number of people willing to participate in the development and improvement of the most used libraries.

One thing leads to another
We would achieve rapid adoption of Julia from the very beginning, since many library developers would learn Julia to MODERNIZE the Python libraries that are currently written in C and C++. Julia can bring multithreading to those libraries in a very simple way. The era of single-threading is already dead. Let’s be the protagonists of change.
Over time, many users will realize that there is no point in compiling libraries for Python when they can use Julia directly. But for that to happen, the Julia community must grow, and so must the number of libraries. This is where I see a virtuous loop.

If we don’t do it, Rust will do it
It was obvious that Pandas, one of the most popular Python libraries, needed multithreading, and Rust saw the opportunity to modernize it and made Polars. Why didn’t we do it, since Julia is the best tool for that and we also have much more in common with Python? Python inspired Julia’s design. We resemble the best of Python and R. We could use AI to help with much of the work of modernizing Python’s most used libraries.

Same for R libraries
I feel like Julia is seeing more adoption in the R community than in the Python community. Very well, let’s partner up and modernize their libraries, maybe the Tidyverse (although it is already quite modern). Let’s not compete; let’s seek to partner. Let’s help and we will be rewarded. Why? Because we are better and easier, and Julia was designed as the replacement for R, Python, Matlab, etc.

Build it and they will come
According to ChatGPT: “A study by the University of Massachusetts estimated that training a single large Transformer model in Python can consume up to 284 tons of CO₂. If Julia were to achieve a 30-50% improvement in efficiency, companies like OpenAI could save hundreds of millions of dollars in electricity and infrastructure costs.”
Why don’t we translate Llama into Julia? We could use an AI to do 80% of the work, and we would correct and finish the 20% where the AI fails. I’m not saying it’s easy, but it would launch Julia to fame.

Disclaimer
I’m not a systems engineer, nor a senior developer. I’m just a Julia enthusiast who sees it from the perspective of an R user. I’m an economist. Surely there are many technical barriers in my proposal. I invite you to comment below. If we are going to sit back and hand this marketing opportunity to Rust, let’s at least have a debate.

If you got this far
This post is inspired by this other post, which I recommend reading: [Gradual Julia-ization of Python libraries]

THANK YOU VERY MUCH!
PS: (I have nothing against Rust, I just love Julia!)

Aside from the social aspect (the magical “we” needs to be populated by actual people who have the motivation and time to embark on such a task), I do wonder whether Julia would really be a better tool than Rust for this if there are no other reasons to choose Julia specifically (Julia has its own libraries for most things addressed by Polars).

Welcome to the Julia community! :wave:

Saw this was your first forum post and wanted to say hey first! Thanks for raising these points; I don’t have much to say, but here are some thoughts from my own biased view.

Economics is pretty fantastic! I’ve been working within JuliaHealth to make a Julia IPUMS.jl package to work with IPUMS census microdata across the globe. Would you like to work with us on that? I have an active interest in health economics and those sorts of interactions.

I am not either, but I think you have a great eye for finding problems! And even though that might not be my title, it’s always an adventure to dive right into those problems and try something out to solve them.

I know folks would love to have you involved as part of this “we” @Patricio_R; if you want to get more involved, I know several folks here would be happy to give some thoughts/pointers. :smiley:

~ tcp :deciduous_tree:

Because the developers wanted it that way; it’s in the name. Why are you highlighting this one library when there are other data table libraries that have beaten it in some benchmarks?

There are many things that Rust or other languages are objectively better at. Not having to fire up a giant Julia process on top of another primary language’s runtime is one of them.

Don’t base anything on AI hallucinations.

It cost tens of millions of dollars to train. LLMs are still the domain of big tech companies.

No, Rust didn’t do anything. Ritchie Vink did, and then many others joined in. One community effort (and strong virtue!) that Rustaceans have built and fostered is the “are we ______ yet” meme and rallying point for working groups; Ritchie even talks about “are we DataFrame yet” in that announcement post. This is what arewemachinelearningyet.com looked like around the time of Ritchie’s start. That too takes work, and that work is done by people :slight_smile:

Similarly here — we’re just people doing things! I encourage you to get connected with the Julia machine learning working group.

Finally, I’d caution you against assuming that others aren’t doing smart things or that there’s obvious performance left on the table just because they’re not using our favorite language.

There are some Julia packages that other programming languages can only dream of. They just need a little more publicity (hopefully on the julia-lang blog) to get the recognition they deserve.

“Why was the Polars library for Python developed in Rust instead of Julia?”

Compile-to-native languages like Rust have some really big advantages if you’re building a library for Python users:

  1. Installation complexity and weight. For Rust, you can just compile standalone binaries for the target architectures and distribute those alongside a minimal Python wrapper. With Julia, you have to instruct users to get Julia up and running on their machine, ask them to wrap their mind around how Julia environments and package management intersect with the already-complex environment and package management considerations in Python, etc. Such challenges are readily surmountable for Julia users, but extra installation steps are a serious impediment when you’re talking about mass adoption.
  2. Startup time. Python packages like Polars are fast out of the box because the user’s code is executed using compiled libraries. With Julia, you get different latencies depending on whether the Julia process needs to start up, whether you hit methods that need to be JIT compiled, etc. Again, this is OK for Julia users who are used to it, but it’s hard to explain to folks who just want to use the library and don’t really care about the implementation language.
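
To make the startup-time point concrete, here is a minimal sketch of how first-call latency differs from warm-call latency. It does not call Julia at all: `make_lazy_compiled` is a hypothetical stand-in that simulates a one-time compilation delay with `time.sleep`, and the 0.2 second figure is an arbitrary assumption, not a measurement of any real library.

```python
import time


def make_lazy_compiled(fn, compile_seconds):
    """Hypothetical stand-in for a JIT-compiled method.

    The first call pays a one-time simulated 'compilation' delay;
    later calls reuse the cached function and skip the delay.
    """
    cache = {}

    def wrapper(*args):
        if "fn" not in cache:
            time.sleep(compile_seconds)  # simulated compile cost (assumption)
            cache["fn"] = fn
        return cache["fn"](*args)

    return wrapper


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


total = make_lazy_compiled(sum, compile_seconds=0.2)
data = list(range(1000))

first_result, first_latency = time_call(total, data)  # pays the simulated compile
warm_result, warm_latency = time_call(total, data)    # reuses the cached function

print(f"first call: {first_latency:.3f}s, warm call: {warm_latency:.6f}s")
```

A library shipped as precompiled binaries behaves like the warm call from the very first use, which is the advantage described in the list above.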

“... they could save hundreds of millions of dollars in electricity and reduce the carbon footprint.”

When it comes to deep learning, all the heavy lifting is happening on GPUs (or TPUs) using specialized, optimized code. Python is just doing the orchestration. So I’m not seeing where the dramatic energy savings would come from.
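
A rough way to check this reasoning is Amdahl's law: if only the orchestration part of a job gets faster, the overall saving is capped by the share of runtime that orchestration takes. The sketch below uses a hypothetical helper, `overall_saving`, and the 2% orchestration share is an illustrative assumption, not a measured figure for any real training run.

```python
def overall_saving(orchestration_fraction, orchestration_speedup):
    """Fraction of total runtime saved when only orchestration gets faster.

    Amdahl's law: the GPU share (1 - f) is unchanged, and the
    orchestration share f shrinks to f / s.
    """
    if not 0.0 <= orchestration_fraction <= 1.0:
        raise ValueError("orchestration_fraction must be between 0 and 1")
    if orchestration_speedup <= 0:
        raise ValueError("orchestration_speedup must be positive")
    new_total = (1.0 - orchestration_fraction) + (
        orchestration_fraction / orchestration_speedup
    )
    return 1.0 - new_total


# Assumed numbers: orchestration is 2% of runtime and becomes 10x faster.
print(f"{overall_saving(0.02, 10):.1%}")  # -> 1.8%
```

Under these assumed numbers, even an infinitely fast orchestration layer could not save more than 2% of the total, which is far from a 30-50% improvement.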

Will the picture change significantly when Julia 1.12 is released, with the capability to compile type-stable code into standalone small binaries? Or is the ecosystem of type-stable packages too limited at this point? I understand that questions about the future cannot be answered with certainty, but I’d be interested to know what the vision is.

AFAIK trimming is still experimental in v1.12, which is now in feature freeze :man_shrugging: