In practice I may want to write a function like this (only r.x matters here; the other parts are less relevant):
function f!(r, I)
    for i = I
        r.x[i] = rand(Int)
    end
end
Assume now that r.x is never reassigned while the loop runs. Then a foolproof way to write this would be:
function g!(r, I)
    x = r.x # introduce an additional local name (I write `x` here)
    for i = I
        x[i] = rand(Int)
    end
end
But g! is longer. I wonder whether Julia's compiler can perform the transformation to g! under the hood, so that I can just keep writing f! in my own code. Can it?
Test
Here is a simple test. But in practice my r.x may mean a field access from a NamedTuple r as well
function test(N)
    r = Ref(rand(Int, N))
    I = Base.OneTo(N)
    @time g!(r, I)
    @time f!(r, I)
    @time g!(r, I)
    @time f!(r, I)
end
test(9999999)
The pedantic answer is “It depends on what r and I are.” Julia functions have unconstrained polymorphism, so without knowing what r and I are, this function could do literally anything.
That said, a somewhat safe assumption here would be that r is some form of possibly mutable struct containing a field x that holds an array, that it uses the generic methods for getproperty and setindex!, and that I is some well-behaved iterable of integer indices. In this case, the answer is "maybe": this is the sort of thing Julia's compiler is often good at hoisting out of loops, but there are often some catches.
However, you may run into trouble when r is a mutable type, because then it’s really up to the compiler to decide if it is legal to hoist the pointer loading out of the loop. Your benchmark would appear to suggest that no, in this case the compiler decides not to perform the hoisting. You can check the code_llvm output of f! and g! to see for yourself what exactly it decides to do.
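For anyone who wants to try that, here is a minimal sketch of the comparison. It repeats f! and g! from the question so it runs on its own; the array size and the debuginfo=:none option (which only strips source-location comments from the printout) are my choices, not something from the thread.

```julia
using InteractiveUtils  # provides @code_llvm / code_llvm outside the REPL

# Same loops as in the question.
function f!(r, I)
    for i = I
        r.x[i] = rand(Int)
    end
end

function g!(r, I)
    x = r.x  # manual hoist of the field load
    for i = I
        x[i] = rand(Int)
    end
end

r = Ref(zeros(Int, 8))  # Base.RefValue stores its payload in a field named x
I = Base.OneTo(8)

# Print the optimized LLVM IR for both methods and compare the loop bodies:
# look for a load of the field that sits inside the loop in f! but before it in g!.
@code_llvm debuginfo=:none f!(r, I)
@code_llvm debuginfo=:none g!(r, I)
```

What the IR looks like depends on the Julia version and on the concrete types of r and I, so treat the output as evidence for your own setup only.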
Yes, if the field x is mutable, then the compiler cannot do the hoisting.
But I found that I can apparently still get a performance gain from hoisting manually even when x is immutable, e.g. r = (x = [1,2],). So I think in practice I have to bother with manual hoisting.
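A rough way to repeat that NamedTuple measurement is sketched below. The helper best_time is hypothetical (not from the thread); it makes one warm-up call so that compilation is not timed, then keeps the best of a few runs. No particular outcome is implied, since the timings depend on the machine and the Julia version.

```julia
# Same loops as in the question, repeated so the snippet is self-contained.
function f!(r, I)
    for i = I
        r.x[i] = rand(Int)
    end
end

function g!(r, I)
    x = r.x
    for i = I
        x[i] = rand(Int)
    end
end

# Hypothetical helper: best-of-`reps` wall time in seconds for `fn(r, I)`.
function best_time(fn, r, I; reps = 5)
    fn(r, I)  # warm-up call, so compilation is excluded from the timings
    return minimum([(@elapsed fn(r, I)) for _ in 1:reps])
end

N = 10^6
nt = (x = zeros(Int, N),)  # immutable NamedTuple holding a mutable array
I = Base.OneTo(N)

t_f = best_time(f!, nt, I)
t_g = best_time(g!, nt, I)
println("f!: ", t_f, " s    g!: ", t_g, " s")
```

Note that rand(Int) probably dominates the loop body here, so any difference caused by the field load may be small compared with run-to-run noise.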
Nobody is stopping another thread from reassigning the x property. However, the compiler does not need to care, and this does not stop the compiler from hoisting the x:
If there is no atomic / acquire fence in the loop, and some other thread happens to reassign the x property… then this is a bad undef/poison race condition.
The compiler is at liberty to instead spawn “nasal bats”, i.e. do almost anything at all: the write goes to the updated x (correct program execution); the write goes to the old x due to hoisting (also correct program execution); the write goes into some internal data structure, ransomware is downloaded and encrypts your hard drive (also correct program execution).
The compiler can hoist if it can prove that nothing inside the loop either updates x or is an atomic acquire.
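To make the "nothing inside the loop updates x" condition concrete, here is a small single-threaded sketch in which the loop body itself reassigns the field. The type Holder and both function names are made up for illustration. The two versions give different results, which is why the compiler may only hoist when it can rule such an update out.

```julia
# Hypothetical mutable container with a reassignable field x.
mutable struct Holder
    x::Vector{Int}
end

# Reloads r.x on every iteration, like f! in the question.
function write_reload!(r, other)
    for i in eachindex(other)       # assumes r.x and other have the same length
        r.x[i] = i
        i == 2 && (r.x = other)     # the loop body updates the field
    end
end

# Loads r.x once before the loop, like g! in the question.
function write_hoisted!(r, other)
    x = r.x
    for i in eachindex(other)
        x[i] = i
        i == 2 && (r.x = other)     # has no effect on the local x
    end
end

a1, b1 = zeros(Int, 4), zeros(Int, 4)
write_reload!(Holder(a1), b1)
println(a1, " ", b1)   # [1, 2, 0, 0] [0, 0, 3, 4]

a2, b2 = zeros(Int, 4), zeros(Int, 4)
write_hoisted!(Holder(a2), b2)
println(a2, " ", b2)   # [1, 2, 3, 4] [0, 0, 0, 0]
```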
This of course only happens if the loop body is pretty small – the compiler is not that smart. Luckily, this hoisting only matters if the loop body is pretty small: If the load of x can amortize over a long and expensive loop body, then hoisting of the load is less than a rounding error in terms of performance.
The real thing to look out for is a loop body that is typically small and fast but contains very rarely executed code for edge cases that does complicated stuff. For example, a 1 in a billion chance of having to write a debug log message (which interacts with io → needs a lock → is an acquire → no hoisting).
Theoretically LLVM has passes for that kind of thing (putting the reload of x into the rare branch). But I have found that not to be very reliable.
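A sketch of what that pattern can look like, with the load hoisted by hand as in g!. The function name and the rare condition are invented for illustration, and the hoist is only valid under the assumption that nobody reassigns r.x while the loop runs.

```julia
# Hypothetical variant of g! with a very rarely taken logging branch.
function fill_with_rare_log!(r, I)
    x = r.x  # manual hoist: assumes r.x is not reassigned during the loop
    for i in I
        v = rand(Int)
        if v == typemin(Int)  # stands in for the "1 in a billion" edge case
            @warn "hit the rare edge case" i  # logging goes through io
        end
        x[i] = v
    end
    return r
end

r = Ref(zeros(Int, 1_000))
fill_with_rare_log!(r, Base.OneTo(1_000))
```

With the local x, the hot path no longer depends on whether the compiler can see through the logging call.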