I have the following code that I’m playing around with (taken from here):
using BenchmarkTools  # provides @btime

function vectorized()
    a = [1.0, 1.0]
    b = [2.0, 2.0]
    x = [0, 0]
    for i in 1:1000
        x = a + b
    end
    return x
end

@btime vectorized()
As expected, this does 1003 allocations.
However, if I change x = a + b to x += a + b, the number of allocations roughly doubles to 2003.
Does anyone know why?
As a bit of a side note: From my understanding, Julia uses LLVM to compile functions at run time. Given that, I’m surprised LLVM is not doing an optimization like recognizing that a + b is always the same and then allocating a + b only once. More generally, I’m surprised LLVM is not optimizing the above code into the devectorized version.
Side Note 2: Is there a way I can debug stuff like this on my own? I.e., is there something that will spit out the underlying IR that is being generated?
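a + b allocates a new array on every iteration. x += a + b is shorthand for x = x + (a + b), so on top of that it allocates a second array per iteration for the outer sum, which is why the count doubles.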
If you’d like to avoid the allocations, @. x = a + b or @. x += a + b are simple approaches that work.
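To make the difference concrete, here is a minimal sketch comparing the two forms. The function names allocating_sum and inplace_sum! are made up for illustration, and it uses Base's @allocated (bytes, not allocation counts) so it runs without extra packages.

```julia
# Hypothetical helpers for illustration; only Base is used.

# `x += a + b` written out: one temporary for a + b, another for the outer sum.
function allocating_sum(x, a, b, n)
    for _ in 1:n
        x = x + (a + b)   # rebinds the local x to a freshly allocated array
    end
    return x
end

# `@.` turns this into `x .+= a .+ b`, a fused broadcast that writes into x.
function inplace_sum!(x, a, b, n)
    for _ in 1:n
        @. x += a + b
    end
    return x
end

a = [1.0, 1.0]
b = [2.0, 2.0]

# Call each function once first so compilation is not measured.
allocating_sum(zeros(2), a, b, 1)
inplace_sum!(zeros(2), a, b, 1)

x1 = zeros(2)
x2 = zeros(2)
bytes_alloc   = @allocated allocating_sum(x1, a, b, 1000)
bytes_inplace = @allocated inplace_sum!(x2, a, b, 1000)

println("allocating: ", bytes_alloc, " bytes")
println("in-place:   ", bytes_inplace, " bytes")
```

The in-place version should report far fewer bytes. The exact numbers depend on the Julia version and on the measurement being made at global scope, so treat them as indicative only.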
Many of Julia’s memory allocations are a black box to LLVM, so it cannot optimize them away.
To see IR, I recommend Cthulhu. @descend lets you quickly switch between representations, such as typed Julia IR, LLVM IR, or asm. It also lets you descend into functions called from there to explore them as well.
Without a dependency, look at @code_warntype, @code_typed, @code_llvm, and @code_native. These don’t let you descend, so they’re overall less convenient.
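A quick sketch of how the built-in macros are used. add_vecs is just a stand-in function for the example. The macros live in InteractiveUtils, which the REPL loads automatically but a script has to load itself.

```julia
using InteractiveUtils  # already loaded in the REPL; needed in scripts

# Hypothetical function to inspect.
add_vecs(a, b) = a + b

a = [1.0, 1.0]
b = [2.0, 2.0]

# Inferred types for each variable and call; non-concrete types are highlighted.
@code_warntype add_vecs(a, b)

# Julia's own IR after type inference and optimization.
# The macro returns a value, so display it explicitly outside the REPL.
display(@code_typed add_vecs(a, b))

# LLVM IR for this method specialization.
@code_llvm add_vecs(a, b)

# Native assembly for the host CPU.
@code_native add_vecs(a, b)
```

Each macro takes a call expression and inspects the method that call would dispatch to, so the argument types matter: add_vecs(a, b) with Float64 vectors shows different code than it would with integers.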
StaticArrays is a popular library providing compile-time sized arrays. The SArray and MArray types are both much less opaque to the compiler, letting them be optimized much more aggressively.
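As a rough sketch, the loop from the question could look like this with SVector. It assumes the StaticArrays package is installed, and static_version is a made-up name.

```julia
using StaticArrays  # third-party package; install with `] add StaticArrays`

function static_version()
    a = SVector(1.0, 1.0)
    b = SVector(2.0, 2.0)
    x = SVector(0.0, 0.0)
    for i in 1:1000
        # SVectors are immutable, fixed-size values, so this creates a new
        # value each iteration without needing a heap-allocated array.
        x += a + b
    end
    return x
end

static_version()                       # run once so compilation is not measured
bytes = @allocated static_version()
println("bytes allocated: ", bytes)
```

I would expect this to report zero bytes or close to it, since the length is part of the type and the compiler can keep everything in registers or on the stack, but check on your own setup.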
@Elrod Thanks for the quick response, and I appreciate the link to Cthulhu.
One area where I sense a gap in my understanding is the purpose of LLVM, then. Presumably the point of LLVM is to be able to do optimizations like constant folding in order to speed up generated code?
The Julia side of the compiler also performs some optimizations like constant folding.
LLVM performs many more, though: constant propagation, instruction selection, vectorization, various peephole optimizations, and the JIT step that creates the machine code actually being run…
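One way to see these optimizations at work is a scalar version of the original loop, where no arrays (and so no opaque allocations) are involved. This is only a sketch; scalar_loop is a made-up name.

```julia
using InteractiveUtils  # for @code_llvm outside the REPL

# Same shape as the loop in the question, but with plain Float64 scalars.
function scalar_loop()
    a = 1.0
    b = 2.0
    x = 0.0
    for i in 1:1000
        x = a + b
    end
    return x
end

# Inspect the optimized LLVM IR for the function.
@code_llvm scalar_loop()
```

With scalars, I would expect the printed IR to contain no loop at all, just a return of the constant, because everything is visible to the optimizer. With Vector arguments the allocation calls stay in the loop.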