Blog post: How to optimise Julia code: A practical guide

Hello all!

I made a new blog post on how to optimise Julia code. Any comments are welcome!


Thank you for this interesting and informative post. FYI, there is a very small typo in the data locality section, where you define the struct

struct FooArray

In context, it should be bs::Vector{UInt16}.
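For readers following along, here is a minimal sketch of what the corrected struct-of-arrays layout might look like. The `as` field and its element type are my own invention for illustration; only `bs::Vector{UInt16}` comes from the thread:

```julia
# Hypothetical struct-of-arrays layout (field `as` is made up for illustration).
# Keeping one contiguous Vector per field is what gives the data locality
# discussed in the post, as opposed to a Vector of structs.
struct FooArray
    as::Vector{Float64}
    bs::Vector{UInt16}   # the corrected field type: Vector{UInt16}, not UInt16
end

# Summing over `bs` walks a single contiguous array, which is cache-friendly.
sum_bs(foo::FooArray) = sum(foo.bs)

foo = FooArray([1.0, 2.0], UInt16[3, 4])
sum_bs(foo)  # 7
```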

I have a question about data locality principles but I will make a separate thread and @ you rather than clutter up this one.


This is fantastic! Thank you for writing it.

Great to see this topic getting some love. Thanks for your time and energy.

I’m surprised you recommend reviewing algorithms as the 3rd item on your list. It can easily contain the lowest-hanging fruit: if you can go from O(N^3) to O(N log N) with a change to one top-level function, making your code type stable might be unnecessary. Of course, YMMV.
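To make the lowest-hanging-fruit point concrete, here is a toy illustration of my own (not from the post), in a simpler setting: a one-function edit swaps a quadratic duplicate check for a sort-based one and changes the complexity class.

```julia
# Naive O(n^2): compare every pair of elements.
function has_dup_quadratic(xs)
    for i in 1:length(xs), j in i+1:length(xs)
        xs[i] == xs[j] && return true
    end
    return false
end

# O(n log n): sort once; any duplicates must then be adjacent.
function has_dup_sorted(xs)
    ys = sort(xs)
    return any(ys[i] == ys[i+1] for i in 1:length(ys)-1)
end

has_dup_quadratic([3, 1, 3])  # true
has_dup_sorted([3, 1, 2])     # false
```

Both versions give the same answers; only the scaling differs, which is exactly the kind of win that no amount of micro-optimisation of the quadratic version can match.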


I am wondering if you would consider cross-posting a copy of your article over to the official Julia forum. It is meant for Julia-specific long-form content such as this, has built-in SEO for original links, and would give the post more traction, not to mention be of great benefit to the community. Thanks!


I wonder if this editorial choice can be explained by the desire to avoid scope creep. Improving your algorithm is a great way to optimize code in any language, but the subject at hand is how to optimize Julia code. For people like me who have a high-level understanding of algorithms and computational complexity but are just bad at coding, this kind of tutorial is a wonderful resource.


The reason choosing the right algorithms is number 3 is that the first two are more important.

The first one is type stability. I consider this as much an aspect of code quality as of performance. If possible, one should type-stabilize one's code. And if one isn’t sure whether the code is as type stable as it ought to be, one has no business optimising it further.
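As a minimal sketch of the kind of fix meant here (the function names are invented for illustration):

```julia
# Unstable: `acc` starts as an Int but becomes a Float64 inside the loop,
# so inference sees Union{Int64, Float64} and the compiler must handle both.
function mean_unstable(xs)
    acc = 0
    for x in xs
        acc += x
    end
    return acc / length(xs)
end

# Stable: initialise the accumulator with the element type of `xs`,
# so `acc` keeps one concrete type throughout the loop.
function mean_stable(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    return acc / length(xs)
end

# `@code_warntype mean_unstable([1.0, 2.0])` highlights the Union in red;
# the stable version infers a concrete type for every variable.
```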

The second one is profiling. It doesn’t matter that some part of your code is O(n^3) if your function call spends 0.01% of its runtime there. Making it O(n log(n)) or whatever will make no practical difference and may only introduce new issues. Only optimise what matters.
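A small sketch of that workflow using the stdlib `Profile` module (the two stage functions are invented stand-ins): profile first, then decide what is worth optimising.

```julia
using Profile

# Two hypothetical stages; only one of them should dominate the runtime.
cheap(n)     = sum(1:n)
expensive(n) = sum(sqrt(i) for i in 1:n)

function pipeline(n)
    a = cheap(n)
    b = expensive(n)
    return a + b
end

pipeline(10)                    # warm up so compilation isn't profiled
@profile pipeline(1_000_000)
Profile.print()                 # shows where the samples actually land
```

If essentially all samples fall inside `expensive`, improving `cheap` is wasted effort, however inefficient it looks on paper.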


A good place to start is to look for vectorisation. If you believe the code should vectorise, scan the generated assembly for vector instructions, which in x86 assembly usually begin with “vp”.
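A quick sketch of how one might do that scan programmatically. Whether "vp" instructions actually appear depends on your CPU and Julia version, so treat the last line as illustrative rather than guaranteed:

```julia
# Sum a vector of Int32s; with @simd this loop should vectorise on x86.
function sumints(xs::Vector{Int32})
    s = Int32(0)
    @inbounds @simd for i in eachindex(xs)
        s += xs[i]
    end
    return s
end

# Capture the native assembly as a string and look for packed-integer
# ("vp"-prefixed) instructions.
asm = sprint(code_native, sumints, (Vector{Int32},))
occursin("vp", asm)  # expected true on an AVX-capable x86 machine
```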

You only do integer operations?

Also, Cthulhu.jl is excellent. It is much better than @code_warntype, @code_typed, @code_native, and @code_llvm. You are grossly mischaracterizing it IMO.


Though I believe I agree with the spirit of your point, I think this goes a bit far.

Ways I agree:

1.) I’m open to the possibility that for a newcomer to Julia, the emphasis should be more on type stability.

2.) There are absolutely cases where it makes more sense to start with type stability, but I think that should be judged on a case-by-case basis.

Rather than having a fixed order, I’d argue one should look for the lowest-hanging fruit and track the effects of the changes empirically. If it would take 5 minutes to switch algorithms and 60 minutes to improve type stability, why not try the 5-minute fix and see if it helps enough that you don’t have to bother optimizing further? :person_shrugging:

It doesn’t matter that some part of your code is O(n^3) if your function call spends 0.01% of its runtime there.

Agreed, but sometimes formal profiling of the code would take longer than just switching algorithms and seeing if it helps or not.

Great post!

As an anecdote, I will say that understanding type stability (and consequences of multiple dispatch more generally) was for me the key thing I was missing to make all the pieces fall into place when I was new to the language, and the main difference maker between loving the language and leaving in frustration. At this point it’s in the first week of my “intro to computation for Earth sciences” course I think. So happy to see it emphasized here!

I’ll also echo Chris that Cthulhu.jl is actually not nearly as scary as the name might suggest – I only started using it in the past month or so, but I now default to it pretty much every time over @code_warntype / @code_typed / @code_llvm / @code_native
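For anyone curious, a minimal sketch of how a Cthulhu session starts. The `@descend` call is interactive, so it is left as a comment here, and the unstable function is invented for illustration:

```julia
# A deliberately type-unstable function to inspect: it returns an Int for
# positive Int input, and a Float64 otherwise.
unstable(x) = x > 0 ? x : 0.0

unstable(1)    # 1
unstable(-1)   # 0.0

# In the REPL you would then run:
#   using Cthulhu
#   @descend unstable(1)
# and step through the typed IR interactively, descending into callees,
# with the Union{Int64, Float64} return type flagged along the way.
```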


When debugging performance issues in Makie I often have the problem that there are many type instabilities, but they can neither be removed nor do they necessarily matter for performance. They do, however, make using tools like JET or Cthulhu harder, because those give up at every dynamic dispatch. I’d actually need a dynamic debugger in conjunction with these tools, but so far the debuggers we have were either much too slow, crashed, or were difficult to understand and erratic in the way they jumped around the code when stepping.


I am not sure about this — with tooling like

I find profiling very convenient.

In any case, I agree both with you and @jakobnissen to some extent: algorithmic improvements are great if you can obtain them, but that’s not always possible and sometimes requires a bit of creativity. And, of course, it is difficult to write a concise general guide about doing this.

OTOH, fixing up type stability problems and profiling is a reasonably mechanical process that is worth learning about.

A small comment about the post: I find the built-in memory-allocation profiling in Julia impractical, and always end up resorting to allocation analysis with


Ditto, but I haven’t tried the new memory profiler yet. Have you?


What’s the new profiler?


It’s in Julia 1.8
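For reference, a small sketch of how the Julia 1.8 allocation profiler (`Profile.Allocs`) can be driven from the REPL; typically one would then visualise the results with a tool like PProf.jl. The allocating function is invented for illustration:

```julia
using Profile

# A function that deliberately allocates many small vectors.
makevecs(n) = [zeros(10) for _ in 1:n]

makevecs(10)                               # warm up so compilation isn't recorded
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 makevecs(1_000)
results = Profile.Allocs.fetch()
length(results.allocs)                     # number of recorded allocations
```

With `sample_rate=1` every allocation is recorded; lower rates sample a fraction of them, which keeps the overhead down on allocation-heavy code.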


TIL, awesome! Sounds like a big upgrade over the old version that wrote a zillion files you had to find and read and clean up.


My understanding is it was an awesome contribution by RelationalAI: Allocation profiler by vilterp · Pull Request #42768 · JuliaLang/julia · GitHub. They’ve also got other really interesting PRs like Heap snapshot by vilterp · Pull Request #42286 · JuliaLang/julia · GitHub.


Woah, awesome! Hope that gets merged soon too!

May I ask why, in the Use multiple threads section of the blog post, the package Folds.jl and the broader JuliaFolds ecosystem are recommended, but not also the popular LoopVectorization.jl package and the JuliaSIMD organization? Thank you, from a newbie.


Huh, I hadn’t noticed that at first. I can certainly recommend LoopVectorization.jl from my own experience! I guess at first LoopVectorization was only single-threaded, but @tturbo (which multithreads via Polyester.jl) is IMHO one of the easiest ways to get really performant multithreading on (reorderable) loops.
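For instance, a sketch of `@tturbo` on a simple reorderable loop (this assumes LoopVectorization.jl is installed; the `axpy!` name is just a conventional label for this operation):

```julia
using LoopVectorization

# y[i] = a * x[i] + y[i], written as an explicit loop that @tturbo can
# both SIMD-vectorise and multithread (via Polyester.jl).
function axpy!(y, a, x)
    @tturbo for i in eachindex(x)
        y[i] = a * x[i] + y[i]
    end
    return y
end

y = zeros(5); x = ones(5)
axpy!(y, 2.0, x)   # y is now [2.0, 2.0, 2.0, 2.0, 2.0]
```

The loop iterations are independent and can be reordered freely, which is exactly the property `@tturbo` relies on.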