Fastest way to initialize StaticArray of dynamic size with state

I’m constructing an SVector, but each value depends on the previously calculated value. This is still simple to do with sacollect, but then the captured state gets boxed and the call allocates (see the example below). What is the optimal approach for this situation?

using StaticArrays
testCalc(i, x) = (i+x, x+1)
function test(::Val{K}, x::Int)::SVector{K} where K
    StaticArrays.sacollect(SVector{K,Int},
        ((i, x) = testCalc(K, x); i) for k in K:-1:1
    )
end
@code_warntype test(Val(4), 1) # output includes x@_5::Union{Int64, Core.Box}
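
For reference, a minimal illustration of what is going on (my own sketch, not part of the original question): x is both captured by the generator and reassigned inside it, so Julia stores it in a Core.Box and every access becomes dynamically typed, which is where the allocation comes from. The same pattern shows up even without sacollect:

# Hypothetical minimal reproduction of the boxing pattern above
function boxdemo(x::Int)
    # the generator closes over x and reassigns it, so x gets boxed
    collect((x = x + 1; x) for _ in 1:3)
end
# @code_warntype boxdemo(1)   # also reports x as Union{Int64, Core.Box}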

Something like this works (and is quite explicit):

julia> using Setfield
       function test2(::Val{K},x::Int) where K
           y = zero(SVector{K,Int})
           @set! y[1] = K
           for i in 2:K
               @set! y[i] = y[i-1] + 1
           end
           return y
       end
test2 (generic function with 1 method)

julia> @btime test2(Val(4),1)
  3.524 ns (0 allocations: 0 bytes)
4-element SVector{4, Int64} with indices SOneTo(4):
 4
 5
 6
 7
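
For completeness, the same @set! pattern can be threaded through the original testCalc state function without any closure, so nothing gets boxed (a sketch with a made-up name test2b, not taken verbatim from the thread):

using StaticArrays, Setfield

testCalc(i, x) = (i + x, x + 1)

function test2b(::Val{K}, x::Int) where K
    y = zero(SVector{K,Int})
    for i in 1:K
        # thread the state explicitly instead of capturing it in a generator
        val, x = testCalc(i, x)
        @set! y[i] = val
    end
    return y
end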


In some cases you can use an MVector, perform your operations, and convert to SVector. If the array doesn’t escape, everything is stack allocated and fast:

function test3(::Val{K},x::Int) where K
    y = MVector{K,Int}(undef)
    y[1] = K
    for i in 2:K
        y[i] = y[i-1] + 1
    end
    return SVector(y)
end

Comparing to the code using Setfield:

1.7.2> @btime test2(Val(4),1);
  3.200 ns (0 allocations: 0 bytes)

1.7.2> @btime test3(Val(4),1);
  2.300 ns (0 allocations: 0 bytes)

The difference grows with array size, presumably because each @set! rebuilds the entire SVector while the MVector version writes elements in place:

1.7.2> @btime test2(Val(8),1);
  11.011 ns (0 allocations: 0 bytes)

1.7.2> @btime test3(Val(8),1);
  2.600 ns (0 allocations: 0 bytes)

1.7.2> @btime test2(Val(16),1);
  37.538 ns (0 allocations: 0 bytes)

1.7.2> @btime test3(Val(16),1);
  3.100 ns (0 allocations: 0 bytes)

1.7.2> @btime test2(Val(32),1);
  110.645 ns (0 allocations: 0 bytes)

1.7.2> @btime test3(Val(32),1);
  21.486 ns (0 allocations: 0 bytes)

1.7.2> @btime test2(Val(64),1);
  1.300 μs (0 allocations: 0 bytes)

1.7.2> @btime test3(Val(64),1);
  38.547 ns (0 allocations: 0 bytes)
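
A note on the “if the array doesn’t escape” condition above (a sketch with a hypothetical name test3_escapes, not from the thread): if the MVector itself is returned it escapes the function, and you should typically expect a heap allocation, unlike test3 which converts to an SVector first.

using StaticArrays

function test3_escapes(::Val{K}, x::Int) where K
    y = MVector{K,Int}(undef)
    y[1] = K
    for i in 2:K
        y[i] = y[i-1] + 1
    end
    return y   # the MVector escapes here, so it is typically heap allocated
end
# @btime test3_escapes(Val(4), 1)   # expect an allocation, unlike test3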

It’s great to see that the compiler can optimize away that allocation. This approach seems the best conceptually to me, and it also performs best of the ones presented here. As for cases where it wouldn’t work: it seems you should always be able to extract the loop body so that this becomes a utility/constructor function:

function makeSV(f, ctx, init, ::Val{K}) where K
    y = MVector{K,Int}(undef)
    next = init
    for i in 1:K
        # f(ctx, state, i) returns (value, new_state)
        val, next = f(ctx, next, i)
        y[i] = val
    end
    return SVector(y)
end
# example state function: emits the current state and increments it by 1
function inner(ctx, prev, i)
    (prev, prev + 1)
end

makeSV(inner, nothing, 8, Val(8))
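
One possible refinement (an assumption on my part, not something from the thread): the element type is hard-coded to Int above, but the same structure works with a caller-supplied type:

# Hypothetical variant of makeSV with a caller-supplied element type T
function makeSV(f, ctx, init, ::Val{K}, ::Type{T}) where {K,T}
    y = MVector{K,T}(undef)
    next = init
    for i in 1:K
        val, next = f(ctx, next, i)
        y[i] = val
    end
    return SVector(y)
end

makeSV(inner, nothing, 1.5, Val(4), Float64)   # reuses inner from above: [1.5, 2.5, 3.5, 4.5]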

Interesting. That appears to run exactly as fast as the specific case @DNF proposed, once it is modified to match (unmodified it is slower simply because it does more work):

function testing(::Val{K},x::Int) where K
    y = MVector{K,Int}(undef)
    for i in 1:K
        y[i] = x
        x += 1
    end
    return SVector(y)
end

Thanks for your help! Now I have a very flexible and fast function to create an SVector.
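
Putting the pieces together (a sketch with a hypothetical name test4, not verbatim from the thread): the recurrence from the original question can be fed through makeSV like this:

# Hypothetical wrapper combining the question’s testCalc with makeSV from above
testCalc(i, x) = (i + x, x + 1)
test4(::Val{K}, x::Int) where K = makeSV((ctx, prev, i) -> testCalc(i, prev), nothing, x, Val(K))

test4(Val(4), 1)   # SVector{4, Int64}: [2, 4, 6, 8]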