Constructing SVector with a loop

Suppose I want to construct an SVector with known length S and type T, but a loop (with a known number of iterations) would be the most comfortable way of doing it. What is the standard idiom for this in 1.0?

My problem is a bit complex, but for an example, consider

using StaticArrays

function SFib1(::Val{S}) where S
    values = Vector{Int}(undef, S)
    @inbounds for i in 1:S
        values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    SVector{S}(values)
end

Is it possible to allocate the SVector, then manipulate it with reinterpret or similar before returning it?

It looks like you’re looking for MVector?

I am not sure. Should I construct an MVector, then convert to an SVector? It appears to be slower than using a Vector, e.g.

function SFibM(::Val{S}) where S
    values = MVector{S, Int}(undef)
    @inbounds for i in 1:S
        values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    SVector{S}(values)
end

julia> @btime SFib1(Val(10));
  40.800 ns (1 allocation: 160 bytes)

julia> @btime SFibM(Val(10));
  65.488 ns (0 allocations: 0 bytes)

It seems the crossover point is around 7; at Val(7), the MVector version is the faster one:

julia> @btime SFibM(Val(7))
  31.316 ns (0 allocations: 0 bytes)

julia> @btime SFib1(Val(7))
  48.441 ns (1 allocation: 144 bytes)

Without the additional conversion to SVector (i.e. returning the MVector directly), I get:

julia> @btime SFibM2(Val(7)) 
  10.438 ns (1 allocation: 64 bytes)
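
(SFibM2 isn't defined above; presumably it is just SFibM without the final conversion, i.e. something like the following hypothetical reconstruction.)

function SFibM2(::Val{S}) where S
    values = MVector{S, Int}(undef)
    @inbounds for i in 1:S
        values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    values   # return the MVector directly, skipping the SVector conversion
end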

Probably not a valid solution, but fun anyway:

function SFibNtuple(::Val{S}) where S
    val_i_minus_2 = Ref(1)
    val_i_minus_1 = Ref(1)
    tup = ntuple(S) do i
        if i == 1 || i == 2
            1
        else
            val = val_i_minus_2[] + val_i_minus_1[]
            val_i_minus_2[] = val_i_minus_1[]
            val_i_minus_1[] = val
        end
    end
    SVector(tup)
end

This infers up to Val(10).

using BenchmarkTools
@btime SFibNtuple(Val(10));

prints 1.702 ns (0 allocations: 0 bytes).

You could use a generated function to unroll the loop.
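
For example, a sketch of what such a generated function could look like (SFibGen is just an illustrative name, not from the thread):

using StaticArrays

@generated function SFibGen(::Val{S}) where S
    # Build the function body at generation time: one local per element,
    # then splat the locals into the SVector constructor, so there is no
    # loop left at runtime.
    assignments = Expr[]
    names = Symbol[]
    for i in 1:S
        name = Symbol(:x, i)
        push!(names, name)
        if i <= 2
            push!(assignments, :($name = 1))
        else
            push!(assignments, :($name = $(names[i-1]) + $(names[i-2])))
        end
    end
    quote
        $(assignments...)
        SVector{$S, Int}($(names...))
    end
end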

Setfield.@set is a great tool for manipulating StaticArrays efficiently

using Setfield
using StaticArrays

function SFib1(::Val{S}) where S
    values = zeros(SVector{S, Int})
    @inbounds for i in 1:S
        values = @set values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    values
end
julia> @code_llvm SFib1(Val(4))

; Function SFib1
; Location: REPL[132]:2
define void @julia_SFib1_38655({ [4 x i64] }* noalias nocapture sret) {
L45.3:
; Location: REPL[132]:4
; Function macro expansion; {
; Location: /home/takafumi/.julia/packages/Setfield/X9IQb/src/sugar.jl:77
  %1 = bitcast { [4 x i64] }* %0 to <4 x i64>*
  store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, <4 x i64>* %1, align 8
  ret void
;}
}

Neither a loop nor necessarily a good idea, but here is this:

function raise_(N) 
    if N == 0 
        :(args...)
    elseif N == 1   
        :(f(args...)) 
    else
        expr = raise_(N-1)
        return :(f($expr))
    end
end

@generated function raise(f, ::Val{N}, args...) where N
    raise_(N)
end

using StaticArrays, BenchmarkTools
fib(_) = (1, 1)
fib(t::NTuple) = (t..., t[end] + t[end-1])
fibs(::Val{N}) where N = SVector(raise(fib, Val(N-1), nothing))

The @code_llvm output (presumably for fibs(Val(4)), judging by the [4 x i64]) is:
; Function fibs
; Location: /Users/schauermr/.julia/v0.6/Bridge/project/ntuple.jl:20
define void @julia_fibs_38351({ [4 x i64] }* noalias nocapture sret) {
top:
  %1 = bitcast { [4 x i64] }* %0 to <4 x i64>*
  store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, <4 x i64>* %1, align 8
  ret void
}

Is there a better solution to this, now?

It seems that SVectors (and SMatrices and SArrays), while being fast, stack-based, fixed-size containers (and indeed my go-to containers for short, fixed-size arrays), are complicated to construct when the computation is nontrivial. Unless I'm missing something.

I would use Accessors.jl for it:

julia> using StaticArrays, Accessors

julia> function SFib2(::Val{S}) where S
           values = SVector(ntuple(_ -> 0, Val(S)))
           @inbounds for i in 1:S
               @reset values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
           end
           values
       end

Accessors.jl is the successor to Setfield.jl (used in the already accepted answer).

Otherwise, I would use an MVector and convert to SVector at the end.

For Julia 1.11+, you could try a Memory:

julia> function memfib(::Val{S}) where S
           values = Memory{Int}(undef, S)
           @inbounds for i in 1:S
               values[i] = ((i == 1) | (i == 2)) ? 1 : values[i-1] + values[i-2]
           end
           values
       end

julia> @btime memfib(Val(7))
  4.010 ns (1 allocation: 80 bytes)
7-element Memory{Int64}:
  1
  1
  2
  3
  5
  8
 13

This stays pretty fast too.

In v1.13 (but not v1.11; I haven't checked v1.12), I have found that if I construct or convert to an MArray initially but convert to an SArray by the end of the function (and don't let the MArray escape), then the function runs without heap allocations and quite quickly. This has let me apply functions like LinearAlgebra.lowrankupdate! (which requires mutability) to an SMatrix heap-free and without needing to write a non-mutating version of the function.
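
A minimal sketch of that pattern, assuming a made-up example function (normalize_columns) rather than the lowrankupdate! case mentioned above:

using StaticArrays

function normalize_columns(A::SMatrix{N, M, T}) where {N, M, T}
    B = MMatrix{N, M, T}(A)          # mutable scratch copy; never escapes this function
    @inbounds for j in 1:M
        s = zero(T)
        for i in 1:N
            s += B[i, j]^2
        end
        s = sqrt(s)
        for i in 1:N
            B[i, j] /= s
        end
    end
    return SMatrix{N, M, T}(B)       # only the immutable result escapes
end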

So the MVector approach may not be fantastic now but might be soon™.

The Accessors.jl solution doesn't even have any runtime cost, though; it's all done at compile time here, which is nice. Unfortunately, Memory is still too opaque to the compiler to be eliminated here.

If that’s the goal, couldn’t it simply be done with @assume_effects?

Base.@assume_effects :total function SFibA(::Val{S}) where S
    values = MVector{S, Int}(undef)
    @inbounds for i in 1:S
        values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    SVector{S}(values)
end
The relevant part of the @code_llvm output (presumably for SFibA(Val(4))) is:

;  @ In[13]:32 within `SFibA`
; ┌ @ /usr/local/julia-depot/packages/StaticArrays/LSPcF/src/convert.jl:180 within `StaticArray`
; │┌ @ /usr/local/julia-depot/packages/StaticArraysCore/7xxEJ/src/StaticArraysCore.jl:115 within `SArray`
    store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
    ret void
; └└

(One can probably reduce :total to something more modest.)
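
For instance, :foldable asserts a subset of what :total does and is aimed at enabling constant folding; whether it is sufficient here hasn't been checked:

Base.@assume_effects :foldable function SFibF(::Val{S}) where S
    values = MVector{S, Int}(undef)
    @inbounds for i in 1:S
        values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
    end
    SVector{S}(values)
end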

On 1.12, no effects are needed:

julia> function SFibA(::Val{S}) where S
           values = MVector{S, Int}(undef)
           @inbounds for i in 1:S
               values[i] = (i == 1 || i == 2) ? 1 : values[i-1] + values[i-2]
           end
           SVector{S}(values)
       end
SFibA (generic function with 1 method)

julia> @code_llvm debuginfo=:none SFibA(Val(4))
; Function Signature: SFibA(Base.Val{4})
define void @julia_SFibA_5700(ptr noalias nocapture noundef nonnull sret([1 x [4 x i64]]) align 8 dereferenceable(32) %sret_return) #0 {
L61.3:
  store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
  ret void
}

Looks like it works up to size 12:

julia> @code_llvm debuginfo=:none SFibA(Val(12))
; Function Signature: SFibA(Base.Val{12})
define void @julia_SFibA_5928(ptr noalias nocapture noundef nonnull sret([1 x [12 x i64]]) align 8 dereferenceable(96) %sret_return) #0 {
L61.11:
  store <4 x i64> <i64 1, i64 1, i64 2, i64 3>, ptr %sret_return, align 8
  %"new::SArray.sroa.0.sroa.5.0.sret_return.sroa_idx" = getelementptr inbounds i8, ptr %sret_return, i64 32
  store <4 x i64> <i64 5, i64 8, i64 13, i64 21>, ptr %"new::SArray.sroa.0.sroa.5.0.sret_return.sroa_idx", align 8
  %"new::SArray.sroa.0.sroa.9.0.sret_return.sroa_idx" = getelementptr inbounds i8, ptr %sret_return, i64 64
  store <4 x i64> <i64 34, i64 55, i64 89, i64 144>, ptr %"new::SArray.sroa.0.sroa.9.0.sret_return.sroa_idx", align 8
  ret void
}

Well, the Memory here will never be eliminated - it is returned after all, just like the Vector or SVector before it :slight_smile:

Oh, my bad, I didn't read the code very carefully I guess, and just assumed you were still converting to an SVector, since otherwise it wouldn't really fit the original question.

Another compile-time version using only ntuple, no packages:

julia> function fibntuple(::Val{S}) where S
           fib_curr::Int = 0
           fib_next::Int = 1
           return ntuple(Val(S)) do i
               @inline
               fib_curr, fib_next = fib_next, fib_curr + fib_next
               return fib_curr
           end
       end
fibntuple (generic function with 1 method)

julia> Sfib(::Val{S}) where {S} = SVector(fibntuple(Val(S)))
Sfib (generic function with 1 method)

julia> @btime Sfib(Val(32));
  0.584 ns (0 allocations: 0 bytes)

julia> @code_llvm debuginfo=:none Sfib(Val(32));
define void @julia_Sfib_1137([1 x [32 x i64]]* noalias nocapture noundef nonnull sret([1 x [32 x i64]]) align 8 dereferenceable(256) %0) #0 {
top:
  %1 = bitcast [1 x [32 x i64]]* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(256) %1, i8* noundef nonnull align 8 dereferenceable(256) bitcast ([1 x [32 x i64]]* @_j_const1 to i8*), i64 256, i1 false)
  ret void
}

Note that, in contrast to Constructing SVector with a loop - #15 by Oscar_Smith, this doesn't even have the SIMD instructions to populate the vector with hardcoded scalars. It's just directly returning a stored constant. (In other words, there are two levels of "moving the computation to compile time" here: one is computing all the values at compile time but assembling them into a vector at runtime, which I'll call SIMD-style; the other is doing nothing at all at runtime.) This works up to S = 32, beyond which fibntuple falls back to SIMD-style and is no longer inlined into Sfib.

This behavior is quite particular, though. If you try to combine fibntuple and Sfib into a single function, and/or use Refs instead of the typed (but boxed) locals, such as in Constructing SVector with a loop - #5 by tkoolen, you get SIMD-style LLVM again.
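
For illustration, the "combined into a single function" variant referred to above would be something like this (hypothetical name); per the observation above, it compiles to SIMD-style LLVM rather than a stored constant:

function SfibCombined(::Val{S}) where S
    fib_curr::Int = 0
    fib_next::Int = 1
    return SVector(ntuple(Val(S)) do i
        @inline
        fib_curr, fib_next = fib_next, fib_curr + fib_next
        return fib_curr
    end)
end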