Suppose I want to construct an SVector with known length S and type T, but a loop (with a known number of iterations) would be the most comfortable way of doing it. What is the standard idiom for this in 1.0?
My problem is a bit complex, but for an example, consider
using StaticArrays
function SFib1(::Val{S}) where S
values = Vector{Int}(undef, S)
@inbounds for i in 1:S
values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
SVector{S}(values)
end
Is it possible to allocate the SVector, then manipulate it with reinterpret or similar before returning it?
I am not sure. Should I construct an MVector, then convert to an SVector? It appears to be slower than using a Vector, eg
function SFibM(::Val{S}) where S
values = MVector{S, Int}(undef)
@inbounds for i in 1:S
values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
SVector{S}(values)
end
julia> @btime SFib1(Val(10));
40.800 ns (1 allocation: 160 bytes)
julia> @btime SFibM(Val(10));
65.488 ns (0 allocations: 0 bytes)
function SFibNtuple(::Val{S}) where S
val_i_minus_2 = Ref(1)
val_i_minus_1 = Ref(1)
tup = ntuple(S) do i
if i == 1 || i == 2
1
else
val = val_i_minus_2[] + val_i_minus_1[]
val_i_minus_2[] = val_i_minus_1[]
val_i_minus_1[] = val
end
end
SVector(tup)
end
Setfield.@set is a great tool for manipulating StaticArrays efficiently
using Setfield
using StaticArrays
function SFib1(::Val{S}) where S
values = zeros(SVector{S, Int})
@inbounds for i in 1:S
values = @set values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
values
end
julia> @code_llvm SFib1(Val(4))
; Function SFib1
; Location: REPL[132]:2
define void @julia_SFib1_38655({ [4 x i64] }* noalias nocapture sret) {
L45.3:
; Location: REPL[132]:4
; Function macro expansion; {
; Location: /home/takafumi/.julia/packages/Setfield/X9IQb/src/sugar.jl:77
%1 = bitcast { [4 x i64] }* %0 to <4 x i64>*
store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, <4 x i64>* %1, align 8
ret void
;}
}
Neither a loop nor necessarily a good idea, but here is this:
function raise_(N)
if N == 0
:(args...)
elseif N == 1
:(f(args...))
else
expr = raise_(N-1)
return :(f($expr))
end
end
@generated function raise(f, ::Val{N}, args...) where N
raise_(N)
end
using StaticArrays, BenchmarkTools
fib(_) = (1, 1)
fib(t::NTuple) = (t..., t[end] + t[end-1])
fibs(::Val{N}) where N = SVector(raise(fib, Val(N-1), nothing))
; Function fibs
; Location: /Users/schauermr/.julia/v0.6/Bridge/project/ntuple.jl:20
define void @julia_fibs_38351({ [4 x i64] }* noalias nocapture sret) {
top:
%1 = bitcast { [4 x i64] }* %0 to <4 x i64>*
store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, <4 x i64>* %1, align 8
ret void
}
It seems that SVectors (and SMatrices and SArrays), while being fast, stack-based, fixed-size containers (and indeed my go-to container for short, fixed sized arrays), are complicated to construct if the computation is non trivial. Unless I’m missing something.
julia> using StaticArrays, Accessors
julia> function SFib2(::Val{S}) where S
values = SVector(ntuple(_ -> 0, Val(S)))
@inbounds for i in 1:S
@reset values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
values
end
Accessors.jl is the successor to Setfield.jl (the already accepted answer).
Otherwise, I would use an MVector and convert to SVector at the end.
In v1.13 (but not v1.11, v1.12 I haven’t checked) I have found that if I construct or convert to a MArray initially but convert to a SArray by the end of the function (and don’t allow the MArray to escape) then it will run without heap allocations and quite quickly. This has let me apply functions like LinearAlgebra.lowrankupdate! (which requires mutability) to a SMatrix heap-free and without needing to write a non-mutating version of the function.
So the MVector approach may not be fantastic now but might be soon™.
The Accessors.jl solution doesn’t even have a runtime though, it’s all at compile time here which is nice. Unfortunately Memory is still too opaque to the compiler to be eliminated here.
If that’s the goal, couldn’t it simply be done with @assume_effects?
Base.@assume_effects :total function SFibA(::Val{S}) where S
values = MVector{S, Int}(undef)
@inbounds for i in 1:S
values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
SVector{S}(values)
end
; @ In[13]:32 within `SFibA`
; ┌ @ /usr/local/julia-depot/packages/StaticArrays/LSPcF/src/convert.jl:180 within `StaticArray`
; │┌ @ /usr/local/julia-depot/packages/StaticArraysCore/7xxEJ/src/StaticArraysCore.jl:115 within `SArray`
store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
ret void
; └└
(One can probably reduce :total to something more modest.)
julia> function SFibA(::Val{S}) where S
values = MVector{S, Int}(undef)
@inbounds for i in 1:S
values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
end
SVector{S}(values)
end
SFibA (generic function with 1 method)
julia> @code_llvm debuginfo=:none SFibA(Val(4))
; Function Signature: SFibA(Base.Val{4})
define void @julia_SFibA_5700(ptr noalias nocapture noundef nonnull sret([1 x [4 x i64]]) align 8 dereferenceable(32) %sret_return) #0 {
L61.3:
store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
ret void
}
Looks like it works up to size 12:
julia> @code_llvm debuginfo=:none SFibA(Val(12))
; Function Signature: SFibA(Base.Val{12})
define void @julia_SFibA_5928(ptr noalias nocapture noundef nonnull sret([1 x [12 x i64]]) align 8 dereferenceable(96) %sret_return) #0 {
L61.11:
store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
%"new::SArray.sroa.0.sroa.5.0.sret_return.sroa_idx" = getelementptr inbounds i8, ptr %sret_return, i64 32
store <4 x i64> <i64 5, i64 8, i64 13, i64 21>, ptr %"new::SArray.sroa.0.sroa.5.0.sret_return.sroa_idx", align 8
%"new::SArray.sroa.0.sroa.9.0.sret_return.sroa_idx" = getelementptr inbounds i8, ptr %sret_return, i64 64
store <4 x i64> <i64 34, i64 55, i64 89, i64 144>, ptr %"new::SArray.sroa.0.sroa.9.0.sret_return.sroa_idx", align 8
ret void
}
Oh my bad, I didn’t read the code very carefully I guess, and just assumed you were still converting to an SVector since otherwise it wouldn’t really fit with the original question..
Another compile-time version using only ntuple, no packages:
julia> function fibntuple(::Val{S}) where S
fib_curr::Int = 0
fib_next::Int = 1
return ntuple(Val(S)) do i
@inline
fib_curr, fib_next = fib_next, fib_curr + fib_next
return fib_curr
end
end
fibntuple (generic function with 1 method)
julia> Sfib(::Val{S}) where {S} = SVector(fibntuple(Val(S)))
Sfib (generic function with 1 method)
julia> @btime Sfib(Val(32));
0.584 ns (0 allocations: 0 bytes)
julia> @code_llvm debuginfo=:none Sfib(Val(32));
define void @julia_Sfib_1137([1 x [32 x i64]]* noalias nocapture noundef nonnull sret([1 x [32 x i64]]) align 8 dereferenceable(256) %0) #0 {
top:
%1 = bitcast [1 x [32 x i64]]* %0 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(256) %1, i8* noundef nonnull align 8 dereferenceable(256) bitcast ([1 x [32 x i64]]* @_j_const1 to i8*), i64 256, i1 false)
ret void
}
Note that, in contrast to Constructing SVector with a loop - #15 by Oscar_Smith, this doesn’t even have the SIMD instructions to populate the vector with hardcoded scalars. It’s just directly returning a stored constant. (In other words, there’s two levels of “moving the computation to compile time” here; one is computing all the values at compile time but assembling them into a vector at runtime, which I’ll call SIMD-style; another is doing nothing at all at runtime.) This works up to S = 32, beyond which fibntuple falls back SIMD-style and is no longer inlined into Sfib.
This behavior is quite particular, though. If you try to combine fibntuple and Sfib into a single function, and/or use Refs instead of the typed (but boxed) locals, such as in Constructing SVector with a loop - #5 by tkoolen, you get SIMD-style LLVM again.