Hello everyone
I came across a set of tips for optimising C++ raytracing code written by Chris Wyman and wondered how much of it applies to Julia. In total there are 27 tips which I’ll share below along with my thoughts on them. What I would like to gain from this post is to ask to fill in the numerous gaps in my knowledge about many of the optimisations as they relate to lower level or core components of the Julia language or computer architecture in general.
My ultimate goal is to make a blog/Forem post about some more subtle, uncommon, or counterintuitive (coming from other languages) optimisations that one can make to Julia code. The bonus goal of this post is to compare how these optimisations apply to Python. A lot of newer programmers and college students (myself included) love to make code go fast and I think it would be fun way to introduce them to Julia: favourably comparing something they are familiar with, namely Python and/or C++, to something that most wouldn’t have considered otherwise.
The original pdf can be found here but I will summarise (sometimes copying sentences directly, obviously I wouldn’t do that for the article) the points without further ado:
Some general rules that apply to all languages:
-
Remember Ahmdal’s Law: \text{speedup} := \frac{\text{time}_\text{old}}{\text{time}_\text{new}} = \frac{1}{(1-\text{func}_\text{cost}) + \text{func}_\text{cost}/\text{func}_\text{speedup}}
- Clearly this still applies in Julia as it’s just a good thing to keep in mind before foolishly trying to optimise a piece of code that is rarely called.
- “Make the common case fast and the rare case correct”.
-
Code for correctness first, then optimize!
- Write for correctness, then if you know the function will be called frequently perform obvious optimisations then find and remove bottlenecks. (paraphrasing the original)
- People I know who write very efficient code say they spend at least twice as long optimizing code as they spend writing code.
Some more specific tips:
-
Jumps/branches are expensive. Minimize their use whenever possible.
- In C++ function calls require two jumps and some stack memory manipulation so Chris advises to use iteration instead of recursion as well as inlining short functions.
- One comment that immediately struck me as potentially suboptimal in Julia is the advice that
for(i=0; i<100; i++) DoSomething();
should be changed into
DoSomething(){for(i=0; i<100; i++){ ... }}
.
In Julia, I would prefer the first as dispatch happens at the function barrier, and so you could run into some suboptimal code generation or runtime slowdown doing the second way if you weren’t actively thinking about the type of variables in the loop. - Another piece of advice is the use of switch statements to replace
if...elseif...elseif
chains as these also require jumps, whereas a switch can be optimised into a table lookup with a single jump. - What I’m not sure about is how many jumps does a function call have in Julia? Is iteration still preferred (for performance) over recursion when you keep everything type stable? Does
@inline
behave exactly the same as in C++? What about the comment on switch statements? I’ve seen chains of ternary operators recommended before and I think it looks pretty, but is it optimal?
-
Think about the order of array indices.
- Here’s one that almost everyone using Julia knows (and for good reason). Obviously, the advice for C++ is the opposite to that of Julia as C++ is row-major.
-
Think about instruction-level-parallelism.
- Modern CPUs can simultaneously execute a bunch of instructions so we should encourage parallelism by making blocks of code (between jumps) have enough independent instructions to allow the CPU to be fully utilized.
- One suggestion Chris has is to think about loop unrolling, but I’ve seen this discussed before in reference to Julia and I’m pretty sure it doesn’t make such a huge difference given how unroll-happy and SIMD-aggressive Clang/LLVM is. We have higher level tools like
@turbo
to do a lot of this kind of stuff but I don’t know enough about it to say whether there’s ever a use case for manual unrolling.
-
Avoid/reduce the number of local variables.
- In C++, local variables are stored on the stack unless there are few enough that they can be stored in registers which speeds up memory accesses and avoids overhead of setting up a stack frame.
- The advice given encourages the use of global variables to avoid using too many local ones, which appears to go directly goes against the first rule of Julian law. However, if we are already inside a locally isolated scope, is there reason to avoid allocating local variables in - for example - a function defined in this scope?
-
Reduce the number of function parameters
- Paraphrasing: “Same reason as above, these are also stored on the stack.”
(From here on out my knowledge gets really spotty as I’ve never really learned about memory or used C/C++ etc.)
-
Pass structures by reference, not by value
- Here’s where we definitely reach the end of my low-ish level knowledge. Pointers/
Ref
s definitely don’t work the same way in C++ and Julia so I’d appreciate as much help on this one as possible.
- Here’s where we definitely reach the end of my low-ish level knowledge. Pointers/
-
If you do not need a return value from a function, do not define one
- Definitely not the Julian way: functions are everywhere because of dispatch and many exist purely to mutate state of pre-existing variables.
- If there’s any deeper truth to this that I’m missing or some additional context that would be useful please let me know!
-
Try to avoid casting where possible
- “Integer and floating point instructions often operate on different registers, so a cast requires a copy.”
- “Shorter integer types (char and short) still require the use of a full-sized register, and they need to be padded to 32/64-bits and then converted back to the smaller size before storing back in memory. (However, this cost must be weighed against the additional memory cost of a larger data type.)”
- This is incredibly low level but not fundamentally language specific. Still applies to Julia but probably not worth thinking about as the benefit of dispatching on the best type outweighs the cost of a copy.
-
Be very careful when declaring C++ object variables.
- “Use initialization instead of assignment (Color c(black); is faster than Color c; c = black;).”
- I’ve never seen someone do this in Julia. Possibly because it’s dumb but certainly in part because structs are immutable by default so initialisation in this way is not even an option.
-
Make default class constructors as lightweight as possible
- Applies directly to Julia as well. Make commonly used initialisation as lightweight as possible.
-
Use shift operations instead of integer multiplication and division, where possible.
- It probably makes more sense to tell the compiler what you want to do and let it optimise the code for the specific architecture.
- Benchmarking I find that
2 << 3
is 11% faster than2 ^ 4
.
-
Be careful using table-lookup functions
- Chris states that, while precomputed value lookups are encouraged by many, the memory lookup is often more expensive than just recomputing the value.
- This is just a general piece of programming advice but applies more to fast languages than slower ones.
-
For most classes, use the operators
+=
,-=
,*=
, and/=
, instead of the operators+
,-
,*
, and/
.- Certainly applies to mutable structs and structs that contain mutable objects, however sometimes things that live entirely on the stack can be optimised away and are never initialised in the first place. If someone has any more in depth or specific knowledge in this area I’d appreciate it!
-
For basic data types, use the operators
+
,-
,*
, and/
instead of the operators+=
,-=
,*=
, and/=
.- Using
+=
assumes mutability which is generally something to be avoided unless you’re working with large arrays where it doesn’t matter much.
- Using
-
Delay declaring local variables
- Chris advises that variables should only be declared on branches where they are necessary, but I’m sceptical about this line specifically. I’m not sure if branches are handled by LLVM or only the machine itself, but am I correct in saying that the more similar branches are, the more likely the computer is to be able to evaluate both and not have to wait to see which branch is reached? I probably just said something dumb but please educate me!
-
For objects, use the prefix operator (++obj) instead of the postfix operator (obj++).
- We don’t have these in Julia so no problemo. The justification Chris gives for C++ is that copies of objects are made (and destroyed) when calling a postfix, but prefix operators don’t need a temporary copy.
-
Be careful using templates
- In Julia, users are encouraged to use packages for what they are trying to do rather than implement it themselves (finite difference methods for differentiation, for example) and then raise any issues with the packages when they discover something suboptimal. This is in part because most Julia packages are written entirely in Julia and I absolutely love this.
- Unfortunately, most Julia packages are written by academics with specific use-cases. Specifically, this means that code is sometimes hard to read even though it’s written in a high-level language. This has been discussed to death so I probably won’t write much about it for the article.
- Focussing on the positives, Julia makes it easy to read source code to understand exactly what a certain function is doing and lets you easily extend the language itself to optimise (almost) anything to your use case.
-
Avoid dynamic memory allocation during computation.
- This is another commonly given piece of advice in Julia. Prefer immutability over mutability unless it is unavoidable, in which case embrace it entirely and do as much as possible in-place.
- The fact that stack allocations can sometimes disappear entirely is even more reason to avoid dynamic memory allocations in Julia.
-
Find and utilize information about your system’s memory cache
- “If a data structure fits in a single cache line, only a single fetch from main memory is required to process the entire class”
- This is really cool stuff but I’ve never thought about it in Julia. How lightweight are structs in Julia? That is to say, how much overhead is there when allocating a struct compared to allocating its contents individually? Can I reliably predict how many bytes a struct will be?
-
Avoid unnecessary data initialization.
- “If you must initialize a large chunk of memory, consider using
memset()
.” - From what I know,
memset
is a fundamental part of C++ optimisation, but Julians don’t need to worry about it. - If anything, preallocating large arrays is one of the most Julian things to do and is something that even MATLAB users have in their optimisation toolkit.
- “If you must initialize a large chunk of memory, consider using
-
Try to early loop termination and early function returns.
- It was funny to see this sort of simple stuff so late in the list after all of the hardcore memory and low-level tips.
- Eliminating common cases first to avoid computation is just general good practice, especially in the raytracing context that the original list was written where most rays will miss most objects.
-
Simplify your equations on paper!
- In Julia, we can do one better: Simplify your equations with ModelingToolkit.jl!
-
The difference between math on integers, fixed points, 32-bit floats, and 64-bit doubles is not as big as you might think.
- I’ve seen many applications where using 32-bit floats has resulted in huge speedups without loss of precision. In fact, it’s the standard in some domains (deep learning).
- What about SIMD? Even if the original operations aren’t that much faster, the fact that we can do more of them is surely a bonus. Given that Julia (and everything compiled by Clang) heavily relies on SIMD code the benefit should be noticeable or at least tested for.
-
Consider ways of rephrasing your math to eliminate expensive operations.
- Again, this is general advice, but the difference between cheap and expensive operations on modern hardware isn’t that big so this could result in code being less readable for a minimal benefit.
And that’s everything. I wish I had more to say about many of these, but I hope that with the community’s advice I can write something insightful about most of these comments, and hopefully give people looking to scratch their optimisation itch while using a high-level language more of a reason to use Julia!
As I’m sure you can tell, my knowledge of the inner workings of Julia, C++, and the very computers that I use every day are limited so I’m definitely not the best, nor most qualified person to write this article. That said, the more content there is about my favourite language the better!