Fastest way to convert an MVector/NTuple to a different sized SVector/NTuple?

I have tried a number of things to convert an MVector to an SVector and they all result in various numbers of allocations and/or poor performance. My understanding is that an MVector is just a tuple wrapped in a struct. So… theoretically, couldn’t it run as fast as the same operation for a tuple?

Here’s an example:

function shorten_it(v, len)
	return first(v, len) # v[1:len]
end
function test_tup()
	len = rand(4:5)
	v = (1:6...,)
	@btime shorten_it($v, $len)
end
test_tup()
# 31.156 ns (1 allocation: 96 bytes)

function shorten_it_mv(v, len)
	return SVector{len}(first(v.data, len))
end
function test_mv()
	len = rand(4:5)
	v = MVector(1:6...)
	@btime shorten_it_mv($v, $len)
end
test_mv()
# 1.280 μs (11 allocations: 656 bytes)

Why is the MVector case slower?

I tried StaticArrays.sacollect which was much slower. I understand that whatever might use this type of function could probably be rewritten to use sacollect directly. That can get messy sometimes, and so I’d like to set that aside for this question.

I’m assuming it’s impossible to do this without at least 1 allocation because you can’t return variable length stack data from a method (stack pointer won’t know how much to move).

A case I have run into repeatedly: I know a relatively short max length of an array (maybe 8 or 16), but it is of variable length. For some cases, I suppose views might be the right answer (and that could get to 0 allocations). But for others, I’m guessing copying into an SVector/Tuple would be better overall.

You’re describing an SVector. I don’t know how MVectors are implemented, but they are mutable, and not guaranteed to be stack-allocated, normally probably not.

Using dynamically sized StaticArrays sort of violate the point of them. But perhaps you can use function barriers to isolate the type instability.

1 Like

It appears that both SArray and MArray are backed by a tuple. MArray uses unsafe_store! for mutation.

I understand it’s not ideal, but in some cases it just happens, so I’m trying to find the right way to deal with them. As for function barriers, yes, that would be what to do when used. Right now, I’m just looking at understanding the best way to instantiate them.

I understand this is inherently type unstable and likely to heap allocate, but I would think there should be a way to do it that only allocates once (for the result), just like in the tuple case.

The fastest (not type stable) way I found to create an SVector where the length depends on something evaluated in that exact function was, to pass it to another function with the length being a type parameter, e.g. by using the Val type. For StaticArrays there is the SOneTo type which has the length as type parameter and there is a constructor already defined when used as an index to a SVector. This way, one gets around using type parameters directly.

function shorten_it_mv(v, len)
	return SVector{len}(first(v.data, len))
end


function test_mv()
	len = rand(4:5)
	v = MVector(1:6...)
    shorten_it_mv(v, len)
end



function shorten_it_mv(v,::Val{L}) where L
    return SVector{L}(@view v[1:L])
end

function test_mv2()
	len = rand(4:5)
    v = MVector(1:6...)
	shorten_it_mv(v,Val{len}())
end



function shorten_it_mv(v,ind::SOneTo)
    return SVector(v[ind])
end

function test_mv3()
	len = rand(4:5)
    v = MVector(1:6...)
    nv=shorten_it_mv(v,SOneTo(len))
end

@btime test_mv();
# 880.188 ns (12 allocations: 720 bytes)

@btime test_mv2();
# 169.328 ns (2 allocations: 112 bytes)

@btime test_mv3();
# 176.855 ns (2 allocations: 112 bytes)

Note that I put the timing outside the function, as the instantiation of the parametric types takes the time here, not the creation of the SVector itself.

@btime SOneTo($5)
  #137.082 ns (0 allocations: 0 bytes)
@btime Val{$5}()
  #136.170 ns (0 allocations: 0 bytes)

But I guess it is still better to avoid type-unstable code if possible.

2 Likes

I’m pretty sure I found the fastest solution, but first: I did find something that seems faster than what I originally had. This runs in about 127 ns:

    SVector{N2,T}(view(mv, 1:N2))
    # StaticArrays.sacollect(SVector{N2,T}, mv[i] for i in 1:N2) # Same timing as above
    # SVector(mv.data[1:N2]) # Same timing as above
    # SVector{N2,T}(mv[1:N2]) # Little slower and 2 allocations instead of 1
    # SVector{N2,T}(mv[1:N2]) # Little more slower and 2 allocations instead of 1
end

function test7()
    mv1 = MVector(1,2,3,4,5)
    len = rand(3:5)
    x = @btime tosvector($mv1, Val($len))
    return x
end

(Aha, @mapclyps just posted the above while I was typing this)

But the champion is as boring as you can imagine. An if/else chain is lightning fast. It runs in about 9.3 ns.

function ifsv(v, len)
    if len == 1
        SVector(v[1])
    elseif len == 2
        SVector(v[1], v[2])
    elseif len == 3
        SVector(v[1], v[2], v[3])
    elseif len == 4
        SVector(v[1], v[2], v[3], v[4])
    elseif len == 5
        SVector(v[1], v[2], v[3], v[4], v[5])
    elseif len == 6
        SVector(v[1], v[2], v[3], v[4], v[5], v[6])
    elseif len == 7
        SVector(v[1], v[2], v[3], v[4], v[5], v[6], v[7])
    elseif len == 8
        SVector(v[1], v[2], v[3], v[4], v[5], v[6], v[7], v[8])
    elseif len == 9
        SVector(v[1], v[2], v[3], v[4], v[5], v[6], v[7], v[8], v[9])
    elseif len == 10
        SVector(v[1], v[2], v[3], v[4], v[5], v[6], v[7], v[8], v[9], v[10])
    end
end

function test9()
    mv1 = MVector(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    len = rand(3:5)
    @btime ifsv($mv1, $len)
end

Note: Using first(v, #) in the above instead of expanding the args as I did above is slower.

The if chain will work for me in this case because I do have a small max size. I’m sure a macro could be made to reduce the code a little.

If the length of your SVector is not known statically (i.e. at compile time) in a performance-critical function, then you really need to re-think how/whether you should be using this type. You are defeating the whole purpose of StaticArrays.

4 Likes

This is only worth doing if you define the SVector in an outer function where performance doesn’t matter, that will pass through a function barrier to some routine that runs many times in a type stable way.

It’s best to never hoist runtime values to the type domain in critical paths. Imagine you were using Python with C++. This is all Julia but you still need to separate concerns in a similar way.

1 Like

Certainly there is some friction between this question and the purpose of StaticArrays, and hoisting runtime values to the type domain is typically going to be a performance concern.

Just like copying data is not always bad, there may be some cases where this might make sense. For example, maybe there are millions of data that have some short variable length that need to be loaded and processed. They could be read into static arrays and pushed onto a separate vector for each length. Then each vector could be processed separately optimally with type stability. One could argue the processing is the critical path, but sometimes speeding up loading can help. Maybe there is a better overall strategy for this case, just throwing something out there.

Regardless, exploring this question and your responses has helped me deepen my understanding of Julia and expand my toolbox.

Thank you!

In this particular example (but not necessarily as the number of possibilities grows large), you can do reasonably well by ensuring all the possibilities are statically understood by the type system so that union-splitting can occur:

using StaticArrays, BenchmarkTools

function test_tup(v)
	inds = rand((SOneTo(4),SOneTo(5),)) # ::Union{SOneTo{4}, SOneTo{5}}
	return v[inds]
end
v = (1:6...,)
@btime test_tup($v)
# 19.860 ns (0 allocations: 0 bytes)
1 Like