Why are these nearly identical function not compiled into the same llvm ir?
using StaticArrays
using BenchmarkTools
function funcS()
ret = SVector(0,0)
ret = ret + SVector(123, 321)
return ret
end
function funcM()
ret = MVector(0,0)
ret += MVector(123, 321)
return ret
end
@benchmark funcS()
@benchmark funcM()
I don’t know C so I can’t answer that. MVectors are not stack allocated because they are mutable, and mutable data structures only get stack allocated when Julia’s optimizer can see the entire lifetime of the data, so it can’t be allowed to escape the function body (modulo inlining).
The usual trick people use is to convert a SVector to MVector, mutate it, then convert back to SVector. If the compiler is in a good mood, this should be stack allocated.
Here’s an example:
using StaticArrays
using BenchmarkTools
function guncS(v)
SVector{2}(v[1], v[2] + 1)
end
function guncM(v)
mv = MVector(v)
mv[2] += 1 # Mutation on the stack!
SVector(mv)
end
let v = Ref(SVector(1, 2))
@btime guncS($v[])
@btime guncM($v[])
end;
#+RESULTS:
: 1.329 ns (0 allocations: 0 bytes)
: 1.329 ns (0 allocations: 0 bytes)
Note that I used Ref here to stop the function call from being hoisted out of the benchmark loop. Your funcS never actually ran because the compiler realized it was a compile time constnat. (Your CPU can’t do anything faster than a nanosecond)
To expand on this: semantically julia is designed so that everything acts as if it were heap allocated. Stack allocation is an optimization, and the compiler is designed to be able to not do certain optimizations if it decides so.
There’s a lot of thought and discussion right now going into questions like “how can we get better guarantees that the compiler will do certain optimizations we want”. Currently, the answer really just is that you can’t guarantee it, and all of these things are allowed to change between versions. This can be a blessing and a curse.
In practice, I think this really isn’t much of a problem for anyone but people with real-time requirements, and even they can deal with it if they’re careful enough, but Julia is not really an ideal language for such systems (yet).
Long term, Keno has some revolutionary work planned on that front that will apply to regular arrays.
I’m not sure what the time horizon for that is though, e.g. he’s busy revolutionizing AD at the moment.
One of the problems in Julia compared to something like Fortran that makes this harder is that our optimization units are smaller.
If you’re compiling a huge Fortran program at once, the compiler can see that sure an array escapes the function that created it, passing it to another function, which it escapes from, which takes a slice that escapes, which…but then that one doesn’t escape, so it’s safe to stack allocate everything!
To add to the StrideArrays documentation, the unit tests have another example:
julia> using StrideArrays, Test
julia> @noinline function dostuff(A; B = I)
StrideArrays.@avx for i in eachindex(A)
A[i] += 1
end
C = A * B
s = zero(eltype(C))
StrideArrays.@avx for i in eachindex(C)
s += C[i]
end
s
end
dostuff (generic function with 1 method)
julia> function gc_preserve_test()
A = @StrideArray rand(8,8);
B = @StrideArray rand(8,8);
@gc_preserve dostuff(A, B = B)
end
gc_preserve_test (generic function with 1 method)
julia> @test iszero(@allocated gc_preserve_test())
Test Passed
julia> gc_preserve_test()
374.2746212074998
We create two random matrices, pass them to a function that isn’t inlined, mutate one of them, multiply them, and then sum the result – all without allocations.
The key is the @gc_preserve macro (which could be made a lot smarter than it is now; currently it only applies to a single function call), which basically just replaces all the arrays in the function call with PtrArrays – pointers + some more information like size and strides.
This stops the arrays from escaping – they don’t get passed to the function – so they can be stack allocated. The macro uses GC.@preserve to make sure that the arrays are nonetheless protected in the meantime, so it’s still GC-safe.
You can think of the macro as like a promise that the arrays don’t escape. If you do something like assign them to a global or push! them into a vector, you’ll get garbage.