According to the StaticArrays.jl documentation, mutable static arrays are implemented as mutable structs and are allocated on the heap even though their sizes are known at compile time. Is Julia's compiler able to detect cases where it is safe to allocate them on the stack instead? (I suppose this must be theoretically possible, but I wonder whether this feature has been, or will be, implemented.)
Yes.
julia> using StaticArrays, BenchmarkTools
julia> if VERSION >= v"1.7.0-beta"
    @inline exp_fast(x) = Base.Math.exp_impl_fast(x, Val(:ℯ))
else
    exp_fast(x) = exp(x)
end
exp_fast (generic function with 1 method)
julia> function alloctest(x)
    y = MVector(x)
    @inbounds @simd ivdep for i ∈ eachindex(y)
        y[i] = exp_fast(y[i])
    end
    s = zero(eltype(y))
    @fastmath for i ∈ eachindex(y)
        s += y[i]
    end
    s
end
alloctest (generic function with 1 method)
julia> x = @SVector rand(32);
julia> @btime alloctest($x)
11.995 ns (0 allocations: 0 bytes)
56.18908775961786
The assembly confirms that y is in fact stack allocated (loads and stores use rsp, the 64-bit-mode stack pointer).
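If you want to repeat that check yourself, the native code can be captured as a string with code_native from InteractiveUtils and searched for stack-pointer references. This is a sketch that assumes an x86-64 CPU (other architectures name the stack pointer differently, e.g. sp on AArch64); bump_and_sum is a hypothetical stand-in for alloctest that avoids the Julia-internal exp_impl_fast.

```julia
using InteractiveUtils   # code_native; loaded automatically only in the REPL
using StaticArrays

# Hypothetical example: copy into an MVector, mutate it, then reduce it.
function bump_and_sum(x::SVector{4,Float64})
    y = MVector(x)
    @inbounds for i in eachindex(y)
        y[i] += 1.0
    end
    return sum(y)
end

# Write the native code into a buffer instead of printing it.
io = IOBuffer()
code_native(io, bump_and_sum, (SVector{4,Float64},); debuginfo=:none)
asm = String(take!(io))

# Keep only the lines that mention the x86-64 stack pointer.
rsp_lines = filter(line -> occursin("rsp", line), split(asm, '\n'))
foreach(println, rsp_lines)
```

Finding no rsp-relative loads and stores does not by itself mean the vector went to the heap: the compiler may have kept it entirely in registers.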
Note that in many cases, the MArray will not be allocated at all, existing only in the CPU's registers, if at all.
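Reading assembly is not the only option. Base's @allocated reports the bytes allocated by an expression; measuring inside a function, after a warm-up call, keeps compilation and global-variable access out of the number. A minimal sketch, where sumsq and count_allocs are hypothetical names:

```julia
using StaticArrays

# Hypothetical example: the MVector is created and consumed locally and never escapes.
function sumsq(x::SVector{N,T}) where {N,T}
    y = MVector(x)
    @inbounds for i in eachindex(y)
        y[i] *= y[i]
    end
    s = zero(T)
    @inbounds for i in eachindex(y)
        s += y[i]
    end
    return s
end

# Call once to compile, then measure the bytes allocated by a second call.
function count_allocs(x)
    sumsq(x)
    return @allocated sumsq(x)
end

x = @SVector rand(32)
println("bytes allocated: ", count_allocs(x))
```

A result of 0 means the MVector never reached the heap; the exact figure for a given function can differ between Julia versions, as the later posts show.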
For some reason, it still allocates when the index is not known at compile time.
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
julia> @time wat('a');
0.000004 seconds (1 allocation: 32 bytes)
Huh.
julia> using StaticArrays
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @time wat('a');
0.000000 seconds
julia> @time wat('a');
0.000000 seconds
Maybe you could try @btime?
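Along those lines, a sketch of the BenchmarkTools version of the measurement: interpolating the argument with $ makes the benchmark treat it as a local value rather than an untyped global, and @ballocated returns the allocated bytes as a number instead of printing them. The code repeats the wat definition so it runs on its own.

```julia
using StaticArrays, BenchmarkTools

function wat(char)
    buf = @MVector [0, 0, 0]
    buf[char - 'a' + 1] = 1
    return 0
end

c = 'a'
@btime wat($c)               # prints the time and the allocation count
bytes = @ballocated wat($c)  # the same kind of measurement, returned as an integer
println("bytes allocated per call: ", bytes)
```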
This appears to be fixed in Julia 1.9.0-alpha1.
Julia 1.8.0
julia> using StaticArrays, BenchmarkTools
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @time wat('a');
0.000005 seconds (1 allocation: 32 bytes)
julia> @time wat('a');
0.000002 seconds (1 allocation: 32 bytes)
julia> @btime wat('a');
12.813 ns (1 allocation: 32 bytes)
julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab7 (2022-08-17 13:38 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
Threads: 1 on 20 virtual cores
Julia 1.8.3
julia> using StaticArrays, BenchmarkTools
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @time wat('a');
0.000007 seconds (1 allocation: 32 bytes)
julia> @time wat('a');
0.000002 seconds (1 allocation: 32 bytes)
julia> @btime wat('a');
8.208 ns (1 allocation: 32 bytes)
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
Threads: 1 on 20 virtual cores
Julia 1.8.3 with the -Cskylake option
julia> using StaticArrays, BenchmarkTools
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @time wat('a');
0.000006 seconds (1 allocation: 32 bytes)
julia> @time wat('a');
0.000002 seconds (1 allocation: 32 bytes)
julia> @btime wat('a');
7.800 ns (1 allocation: 32 bytes)
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
Threads: 1 on 20 virtual cores
Julia 1.9.0-alpha1
julia> using StaticArrays, BenchmarkTools
julia> function wat(char)
    buf = @MVector [0,0,0]
    buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @time wat('a');
0.000002 seconds
julia> @time wat('a');
0.000001 seconds
julia> @btime wat('a');
4.600 ns (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.9.0-alpha1
Commit 0540f9d739 (2022-11-15 14:37 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 20 × 12th Gen Intel(R) Core(TM) i9-12900HK
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
Threads: 1 on 20 virtual cores
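The sessions above vary the Julia version. Within a single version, one can also compare writing through a constant index with writing through a runtime index, which is the distinction the original report drew. A likely explanation, offered as an assumption rather than something verified in this thread: with a runtime index, setindex! keeps a bounds check whose failure path throws a BoundsError holding a reference to buf, so buf escapes on that path; with a constant index, the check can be resolved at compile time. fixed_index, runtime_index, and the measure_* helpers are hypothetical names.

```julia
using StaticArrays

function fixed_index()
    buf = @MVector [0, 0, 0]
    buf[1] = 1                  # index is a compile-time constant
    return 0
end

function runtime_index(char)
    buf = @MVector [0, 0, 0]
    buf[char - 'a' + 1] = 1     # index depends on the argument
    return 0
end

# Measure inside functions, after a warm-up call, so compilation and
# global-scope dispatch are not counted.
function measure_fixed()
    fixed_index()
    return @allocated fixed_index()
end

function measure_runtime(c)
    runtime_index(c)
    return @allocated runtime_index(c)
end

println("fixed index:   ", measure_fixed(), " bytes")
println("runtime index: ", measure_runtime('a'), " bytes")
```

The byte counts themselves depend on the Julia version, so compare them on the version you actually use.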
Who hasn’t been running https://github.com/JuliaLang/julia/pull/47184 for the past couple months?
It’s been merged to master!
Is there a write-up somewhere describing the impact of this PR on Julia's current behavior? I have trouble understanding what it really means, and you all seem thrilled, so I wonder.
The TL;DR is that the Julia compilation pipeline has a few steps:
1. parsing/lowering
2. type inference and Julia IR optimization (e.g. inlining)
3. LLVM optimizations and code generation
Prior to https://github.com/JuliaLang/julia/pull/47184, precompilation only saved the results of steps 1 and 2. This PR also saves the output of step 3 (the generated native code), which improves responsiveness, sometimes dramatically.
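Each of these stages can be inspected with the reflection functions in InteractiveUtils, which makes it easier to see what saving step 3 refers to. The mapping below is approximate (code_typed already includes Julia-level inlining, while code_llvm and code_native show the output of step 3), and affine is a hypothetical example function.

```julia
using InteractiveUtils

affine(x) = 2x + 1   # hypothetical example function

# Step 1: parsing/lowering, i.e. the lowered IR before type inference.
lowered = code_lowered(affine, (Int,))

# Step 2: type inference and Julia IR optimization (e.g. inlining).
# Returns a vector of `CodeInfo => inferred return type` pairs.
typed = code_typed(affine, (Int,))

# Step 3: LLVM optimization and native code generation.
llvm_ir = sprint(io -> code_llvm(io, affine, (Int,); debuginfo=:none))
asm = sprint(io -> code_native(io, affine, (Int,); debuginfo=:none))

println(lowered[1])
println(typed[1])
println(llvm_ir)
println(asm)
```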
Sorry for jumping in here after a couple of months, but I discovered something pertinent. The example method above won't allocate even on earlier versions (at least on the Julia v1.8.5 I'm running) if you add @inbounds to it:
julia> using StaticArrays, BenchmarkTools
julia> function wat(char)
    buf = @MVector [0,0,0]
    @inbounds buf[char - 'a' + 1] = 1
    return 0
end
wat (generic function with 1 method)
julia> @btime wat('a');
3.400 ns (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver3)
Threads: 24 on 24 virtual cores
I discovered this because, based on this conversation, I tried the code I was working on in 1.9.0-beta4 and it still allocated. I tracked the allocation down to a loop, added @inbounds, and it stopped allocating. Then I tried it back in 1.8.5, and it didn't allocate there either.
So, if your MVectors are allocating, check whether adding @inbounds helps.
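If you would rather not drop the bounds check entirely, one option is to check the index yourself with an error that does not mention the array, then write with @inbounds. This sketch assumes, without having verified it here, that the allocation comes from the BoundsError path (a BoundsError stores the array it was raised for, which can force the array onto the heap); wat_checked and measure_checked are hypothetical names.

```julia
using StaticArrays

function wat_checked(char)
    buf = @MVector [0, 0, 0]
    i = char - 'a' + 1
    # Explicit range check; the error carries the character, not `buf`.
    1 <= i <= length(buf) ||
        throw(ArgumentError("character $char is outside 'a':'c'"))
    @inbounds buf[i] = 1        # safe: i was checked above
    return 0
end

# Warm up, then measure a single call.
function measure_checked(c)
    wat_checked(c)
    return @allocated wat_checked(c)
end

println("bytes allocated: ", measure_checked('a'))
```

In loops, iterating over eachindex(buf) gives the same guarantee that every index is valid, which is why @inbounds is safe in the alloctest example at the top of the thread.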