I love microbenchmarks.
Once in a while on this forum, some Julian discovers a GitHub repo with a microbenchmark that compares the speed of different programming languages. Typically, Julia does poorly compared to C or Rust. The horror!
Fortunately, Julians from all over the world quickly pour in to right the wrong, and make a PR to improve Julia’s timings. After which typically follows a long discussion in the thread, where people try to squeeze the last microseconds out of the benchmark using advanced techniques like LoopVectorization, unsafe code, or what have you.
And I get that, I do! Getting Julia into the number one spot in the benchmarks is fun because it feels like winning. It’s also a fun puzzle to figure out how to squeeze out the microseconds, and on top of that, you learn more about Julia and low-level computing when you study how to push your code to the limits.
But none of those things is why I love microbenchmarks: I love them because they pull the wool from your eyes and reveal just how fast your favorite package or programming language really is, after you’re done with all the excuses.
For sure, simply looking at the numbers in the microbenchmark isn’t all that informative. It’s not the numbers themselves that matter, it’s the numbers when you pay close attention to what actually takes time. To show you what I mean, let me give some examples:
In 2019, a paper was published announcing the Seq programming language - a high-performance language intended for bioinformatics. The paper included microbenchmarks showing that Seq handily beat BioJulia. Oh no! As authors of BioJulia, Sabrina Ward and I had to look into this, and indeed, we were able to write a faster Julia implementation of the microbenchmark and beat Seq. Hooray! Job done?
No! The Seq authors were fundamentally right that idiomatic Seq beat idiomatic BioJulia. The fact that it was possible for us to beat Seq using Julia was beside the point. The fact is, we believed that BioJulia was fast, and someone showed us that it wasn’t.
We eventually subjected BioSequences.jl to a performance review after the challenge, which we wouldn’t have done if it wasn’t for the Seq microbenchmark. You can read more about this in a BioJulia blogpost
In 2020, famous bioinformatician Heng Li wrote a blog post about fast, high level languages, and included a microbenchmark where Julia featured, and, as usual, was beaten by other compiled languages.
One of the benchmarks consists of parsing a 1.1 GB FASTQ file, where FASTX.jl’s FASTQ parser got absolutely destroyed by Rust’s
needletail. Unbelievably, Rust manages to parse the file in 600 ms(!), whereas FASTX takes 10.6 seconds (2.35s when disregarding compilation). That’s nearly 20 times longer!
Again, I was shocked - I thought FASTX had a really fast parser - I thought we would be hard to beat. How could needletail be four times faster, even disregarding compilation time?
That rabbit hole took me deep into the implementation of how to write fast parsers. I eventually ended up being the maintainer of FASTX.jl and Automa.jl, and rewriting large parts of those packages.
Here too, the benchmark gave me a rude awakening: Two years later, after excruciating optimization and months of work in my free time, FASTX still takes 2.2 seconds to parse the file, although latency improvements in Julia mean the total time has been reduced from 10.6 to 3.2 seconds.
Digging deep into the microbenchmarks revealed that it was possible to close a large part of the gap between Rust and Julia by writing a kernel in x86 assembly, but that Julia unfortunately does not provide a reliable way to write platform-specific code, whereas Rust does.
There are more examples of these kinds of lessons, of course.
One time, I found my Julia implementation got trounced by Rust in a simple Advent of Code microbenchmark, because SubArray remarkably didn’t have an efficient iterate method.
Another time, someone (I can’t remember who) found a microbenchmark where Julia decompressed gzipped files slower than other languages. It turned out that Zlib_jll was compiled with the wrong flags.
You get the idea.
Regrettably, when I read threads on microbenchmarks on this forum, I mostly just see people trying to get Julia to number one. Rarely do people ask: Why isn’t Julia already number one in the benchmark? Why is it that whenever outsiders to Julia write a microbenchmark, Julia usually ends up near the middle or the bottom?
Next time you see Julia perform not-so-well in a microbenchmark, I encourage you to compare the Julia implementation to one of the fastest in the benchmark. If the implementations are not significantly different, see if you can track down where the inefficiency of Julia comes from, and if this can be addressed in Julia itself.
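To make that concrete, here is a minimal sketch of what such an investigation can look like (assuming BenchmarkTools.jl is installed; the two functions are hypothetical stand-ins for an “idiomatic” and a “hand-rolled” implementation, not code from any of the benchmarks above):

```julia
using BenchmarkTools

# Idiomatic version: sum each row of a matrix through a view (a SubArray).
sum_views(m) = sum(sum(view(m, i, :)) for i in axes(m, 1))

# Hand-rolled version: plain nested loops, column-major order, no views.
function sum_manual(m)
    s = zero(eltype(m))
    @inbounds for j in axes(m, 2), i in axes(m, 1)
        s += m[i, j]
    end
    return s
end

m = rand(1000, 1000)
@btime sum_views($m)
@btime sum_manual($m)
```

If the idiomatic version turns out much slower, tools like `@code_warntype`, `@code_llvm`, or a profiler can often reveal whether the cost lies in your own code or somewhere in Base or a package - and the latter case is exactly the kind of inefficiency worth reporting or fixing upstream.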
It’s the same optimization challenge, but the stakes are higher. Now, you don’t just get to make a PR to put Julia at the top of some list in a guy’s repo, you get to actually make Julia, the language, faster. Who knows - maybe your efforts will be what makes some functionality in Base run lightning fast, to the benefit of thousands of other Julia users.