Polars: Our Big Missed Opportunity

Why was the Polars library for Python developed in Rust instead of Julia? Why did we miss this great opportunity to make our beloved language famous and win over the huge Python community?

I propose popularizing Julia by improving and modernizing the most popular Python libraries
It is sad to see that Julia adoption has been stagnant for years. And it bothers me that DeepSeek, ChatGPT, Claude, Llama, etc. are mostly developed in Python, when migrating their code to Julia could save hundreds of millions of dollars in electricity and reduce their carbon footprint.
Python today is wonderful: it is the second-best tool for everything. We should not compete with Python; we should partner with the Python community. Let's translate the most popular Python libraries into Julia, make it easy to compile them for use from Python and install them with pip, and I promise you that Julia will soon be widely adopted. There are millions of Python users who do not develop high-performance libraries because they do not know how to program in C or C++. If those libraries were written in Julia instead, far more of their users could participate in their development, or at least help with debugging. With how active and enthusiastic the Python community is, there would undoubtedly be a growing number of people willing to help develop and improve the most-used libraries.

One thing leads to another
We would achieve rapid adoption of Julia from the very beginning, since many library developers would learn Julia in order to modernize the Python libraries that are currently written in C and C++. Julia can bring multithreading to Python in a very simple way (see the sketch below). The era of single-threading is dead. Let's be the protagonists of change.
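As a minimal illustration, here is roughly what threaded code looks like in Julia (the function and workload are made up for this example; start Julia with something like `julia --threads=4`):

```julia
# Split the work into one task per thread and combine the partial results.
function parallel_sum_of_squares(xs)
    chunks = Iterators.partition(xs, cld(length(xs), Threads.nthreads()))
    tasks = [Threads.@spawn sum(abs2, chunk) for chunk in chunks]
    return sum(fetch, tasks)
end

parallel_sum_of_squares(rand(1_000_000))
```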
Over time, many users will realize that there is no point in compiling libraries for Python when they can use Julia directly. But for that to happen, the Julia community, and its number of libraries, must grow. This is where I see a virtuous cycle.

If we don’t do it, Rust will do it
It was obvious that Pandas, one of the most popular Python libraries, needed multithreading, and Rust saw the opportunity to modernize it and made Polars. Why didn't we do it, since Julia is the best tool for that and we also have much more in common with Python? Python inspired Julia's design; we resemble the best of Python and R. We could use AI to help with much of the work of modernizing Python's most-used libraries.

Same for R libraries
I feel like Julia is seeing more adoption in the R community than in the Python community. Very well: let's partner up and modernize their libraries, maybe the Tidyverse (although it is already quite modern). Let's not compete; let's seek to partner. Let's help, and we will be rewarded. Why? Because we are better and easier, and Julia was designed as a replacement for R, Python, MATLAB, etc.

Build it and they will come
According to ChatGPT: “A study by the University of Massachusetts estimated that training a single large Transformer model in Python can consume up to 284 tons of CO₂. If Julia were to achieve a 30-50% improvement in efficiency, companies like OpenAI could save hundreds of millions of dollars in electricity and infrastructure costs.”
Why don't we translate Llama into Julia? We could use AI to do 80% of the work, and we would correct and finish the 20% that the AI gets wrong. I'm not saying it's easy, but it would launch Julia to fame.

Disclaimer
I'm not a systems engineer, nor a senior developer. I'm just a Julia enthusiast who sees this from the perspective of an R user. I'm an economist. Surely there are many technical barriers in my proposal; I invite you to comment below. If we are going to sit back and cede the ground to Rust's marketing strategy, let's at least have a debate.

If you got this far
This post is inspired by this other post, which I recommend reading: [Gradual Julia-ization of Python libraries]

THANK YOU VERY MUCH!
PS: (I have nothing against Rust, I just love Julia!)


Aside from the social aspect (i.e., the magical "we" needs to be populated by actual people who have the motivation and time to embark on such a task), I do wonder whether Julia would really be a better tool than Rust for this, if there are no other reasons to choose Julia specifically (which has its own libraries for most of the things addressed by Polars).


Welcome to the Julia community! :wave:

Saw this was your first forum post and wanted to say hey first! Thanks for raising these points; I don't have much to say, but here are some thoughts from my own biased view:

Economics is pretty fantastic! I've been working within JuliaHealth on IPUMS.jl, a Julia package for working with IPUMS census microdata from across the globe. Would you like to work with us on that? I have an active interest in health economics and those sorts of interactions.

I am not either, but I think you have a great eye for finding problems! And even though that might not be my title, it's always an adventure to dive right into those problems and try something out to solve them.

I know folks would love to have you involved as part of this “we” @Patricio_R; if you want to get more involved, I know several folks here would be happy to give some thoughts/pointers. :smiley:

~ tcp :deciduous_tree:


Because the developers wanted it that way; it's in the name. Why are you highlighting this one library when there are other data-table libraries that have beaten it in some benchmarks?

There are many things that Rust or other languages are objectively better at. Not having to fire up a giant Julia process on top of another primary language’s runtime is one of them.

Don’t base anything on AI hallucinations.

It cost tens of millions of dollars to train. LLMs are still the domain of big tech companies.


No, Rust didn't do anything. Ritchie Vink did and then many others joined in. One community effort (and strong virtue!) that Rustaceans have built and fostered is the "are we ______ yet" meme and rallying point for working groups — Ritchie even talks about "are we DataFrame yet" in that announcement post. This is what arewemachinelearningyet.com looked like around the time of Ritchie's start. That too takes work — work that's done by people :slight_smile:

Similarly here — we’re just people doing things! I encourage you to get connected with the Julia machine learning working group.

Finally, I’d caution you against assuming that others aren’t doing smart things or that there’s obvious performance left on the table just because they’re not using our favorite language.


There are some Julia packages that other programming languages can only dream of. They just need a little more publicity (hopefully on the julia-lang blog) to get the recognition they deserve.

Why was the Polars library for Python developed in Rust instead of Julia?

Compile-to-native languages like Rust have some really big advantages if you’re building a library for Python users:

  1. Installation complexity and weight. For Rust, you can just compile standalone binaries for the target architectures and distribute those alongside a minimal Python wrapper. With Julia, you have to instruct users to get Julia up and running on their machine, ask them to wrap their mind around how Julia environments and package management intersect with the already-complex environment and package management considerations in Python, etc. Such challenges are readily surmountable for Julia users, but extra installation steps are a serious impediment when you’re talking about mass adoption.
  2. Startup time. Python packages like Polars are fast out of the box because the user’s code is executed using compiled libraries. With Julia, you get different latencies depending on whether the Julia process needs to start up, whether you hit methods that need to be JIT compiled, etc. (see the small sketch after this list). Again, this is OK for Julia users who are used to it, but it’s hard to explain to folks who just want to use the library and don’t really care about the implementation language.
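To make the second point concrete, here is a tiny illustrative sketch of JIT-compilation latency (the function is made up; timings are machine-dependent and only indicative):

```julia
# The first call to a method pays the compilation cost; later calls reuse it.
f(x) = sum(abs2, x)

data = rand(10^6)
@time f(data)   # first call: includes JIT compilation time
@time f(data)   # second call: runs the already-compiled method
```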

they could save hundreds of millions of dollars in electricity and reduce the carbon footprint.

When it comes to deep learning, all the heavy lifting is happening on GPUs (or TPUs) using specialized, optimized code. Python is just doing the orchestration. So I’m not seeing where the dramatic energy savings would come from.


Will the picture change significantly when Julia 1.12 is released, with the capability to compile type-stable code into standalone small binaries? Or is the ecosystem of type-stable packages too limited at this point? I understand that questions about the future cannot be answered with certainty, but I’d be interested to know what the vision is.


AFAIK trimming is still experimental in v1.12, which is now in feature freeze :man_shrugging:


Welcome to the Julia community!

You can thank the virus and Ritchie Vink wanting to do it in Rust (people usually make software for selfish reasons in their favorite language):

the coronavirus has been in our country for a year, which means I have been sitting at home for a very long time. At the start of the pandemic, I had a few pet projects in Rust under my belt and I noticed that the “are we DataFrame yet”, wasn’t anywhere near my satisfaction. So I wondered if I could make a minimalistic crate that solved a specific use case of mine. But boy, did that get out of hand.

It's unclear to me when Python support was added; Rust and Python seem to be the main supported languages, the only ones with prominent install info. But I see R and JavaScript mentioned in the docs too.

Julia, meaning DataFrames.jl, used to compete with Polars on the H2O.ai benchmark, but was usually a bit behind. InMemoryDatasets.jl is a newer addition (and Tidier.jl builds on the former).
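For reference, a grouped aggregation in DataFrames.jl looks roughly like this (a toy example for illustration, not a benchmark):

```julia
using DataFrames, Statistics

df = DataFrame(group = ["a", "b", "a", "b"], value = [1.0, 2.0, 3.0, 4.0])
combine(groupby(df, :group), :value => mean => :mean_value)
```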

That H2O.ai benchmark moved here:
https://duckdblabs.github.io/db-benchmark/

From the top link:

5.2.2 Expensive locking

…

5.2.2 Lock-free hashing

Instead of the before mentioned approaches, Polars uses a lock-free hashing algorithm. This approach does do more work than the previous Expensive locking approach

Yes, single-threading is dead, but it's unclear that we are better at multi-threading than Rust; the reverse is likely true (note that we are good at parallelism, maybe as good or better(?), and that is strictly not the same thing, nor the same as concurrency). Rust and Pony are race-free languages, the latter seemingly an even better language, but Julia and e.g. Go aren't race-free. Which means we need to be careful with locking, and such multi-threaded code is hard to do right (see the sketch below).
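As a tiny illustration of what can go wrong, and of the tools Julia does provide (the results in the comments are what I would expect, not guaranteed on every run):

```julia
# Data race: unsynchronized concurrent read-modify-writes can lose updates.
racy = Ref(0)
Threads.@threads for _ in 1:1_000_000
    racy[] += 1
end

# Correct: an atomic counter serializes the increments.
counter = Threads.Atomic{Int}(0)
Threads.@threads for _ in 1:1_000_000
    Threads.atomic_add!(counter, 1)
end

racy[], counter[]   # racy[] is typically < 1_000_000; counter[] is exactly 1_000_000
```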

It's perfectly fine for us to rely on Rust in whatever areas we like, at least for Polars: either depend on it directly, or go through its Python wrapper (does it have a better or more useful API?) by using PythonCall.jl, as sketched below.
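Something like this should work (a rough sketch; it assumes polars is installed in the Python environment that PythonCall.jl/CondaPkg.jl is linked to):

```julia
using PythonCall

pl = pyimport("polars")

# Build a small Polars DataFrame from Julia data and run a query on it.
df = pl.DataFrame(pydict(Dict(
    "a" => pylist([1, 2, 3]),
    "b" => pylist([4.0, 5.0, 6.0]),
)))

df.select(pl.col("b").sum())
```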

Another reason, already mentioned, is that Python users like precompiled code (usually C; some like C better than Rust, since Rust limits which platforms you can target, e.g. see the cryptography library and its movement to Rust, which FreeBSD people objected to, if I recall).

Relying on Julia would limit platforms even more? We do support FreeBSD, i.e. the language itself, but NOT all of the ecosystem, e.g. juliacall/PythonCall.jl only works there through hacks (because of Mamba/Conda).

I still support:

I think we can learn from Pony (deny capabilities) and Rust and become excellent for multi-threaded code with some language changes, or even, as a proof of concept, with some package; we already have something like that very recently: GitHub - MilesCranmer/BorrowChecker.jl: A borrow checker for Julia

juliac is in 1.12, and if we had had it sooner, then it might have helped with adoption from other languages.

Just to be clear, Rust is not race-free with respect to general race conditions, only to a subset.

FYI, I think Polars is a phenomenal library and I use it more than any other tool in my professional life.


Sorry, yes, I meant to be completely specific. Julia (like e.g. C, Go, or Python) doesn't try to be race-free automatically in any way (it only provides macros/tools for doing so manually, and that is error-prone). Pony does, and is probably best at it: it eliminates data races (no language can do more?), not race conditions, since that is impossible. And it doesn't need unsafe regions, unlike Rust. And it's still fast and has a GC.

https://www.ponylang.io/faq/about-pony/#data-race

Does Pony really prevent data races?

So, this question usually comes in many different forms. And the question usually arises from a misunderstanding of the difference between a “data race” and a “race condition”.

Pony prevents data races. It can’t stop you from writing race conditions into your program.

To learn more about the differences between race conditions and data races, check out “Race Condition vs. Data Race” by John Regehr.

I post this since it may be helpful, and note that googling for "ponylang" works better; otherwise it's not easily googlable… My first two tries gave me a kids' game, MY PONY MY LITTLE RACE - Play Online for Free!, and more adult links: https://www.ponyracingauthority.co.uk/wp-content/uploads/2022/01/Race-Types-Conditions-2022.pdf :slight_smile:

Comparisons to Other Languages

How is Pony different than Erlang/Elixir?

The answer is deep and complicated. Fortunately, Scott Fritchie went to a great deal of trouble answering it in his talk The wide world of almost-actors: comparing the Pony to BEAM languages.

Are pony actors lightweight like Elixir/Erlang’s actors, or Go’s goroutines?

Yes! In Pony, the overhead of an empty actor on a 64-bit system is roughly 240 bytes – depending on your system’s size_t and alignment.
[…]
Relatively, Elixir/Erlang actors use ~5x more memory and goroutines use ~8x more memory, but critically Elixir/Erlang and Go handle memory far differently than Pony. The memory management approach that is “best” is project-dependent – Pony offers one more option you can consider for your particular needs.
