Efficiently selecting parts of a StaticArray

Is there a convenient and efficient way of slicing StaticArrays, or let’s say, dropping parts of an SArray?

julia> vs = SVector{4}(rand(4))
4-element SVector{4,Float64}:
 0.580171
 0.719607
 0.651051
 0.430415

julia> vs[1:3]
3-element Array{Float64,1}:
 0.580171
 0.719607
 0.651051

Slicing like this works, but returns an ordinary Array and isn’t particularly fast:

julia> @btime $vs[1:3]
  52.049 ns (3 allocations: 192 bytes)
3-element Array{Float64,1}:
 0.580171
 0.719607
 0.651051

julia> @btime SVector($vs[1], $vs[2], $vs[3])
  2.030 ns (0 allocations: 0 bytes)
3-element SVector{3,Float64}:
 0.580171
 0.719607
 0.651051

I get that the compiler cannot figure out the type of vs[1:3], but if I could tell it to “drop the last element of vs” somehow, then it would know to the return type SVector{3}.

I also need to do this for SVector{3}, SMatrix{4,4} and SMatrix{3,3}. I can of course write several methods like this:

droplast(vs::SVector{4}) = SVector{3}(vs[1], vs[2], vs[3])

but it seems like there must be a more general approach, perhaps one that is already in use?

1 Like

Use SVector all the way.

julia> vs = SVector{4}(rand(4))
4-element SVector{4,Float64}:
 0.658262 
 0.502845 
 0.0820214
 0.3283   

julia> idx = SVector(1,2,3)
3-element SVector{3,Int64}:
 1
 2
 3

julia> vs[idx]
3-element SVector{3,Float64}:
 0.658262 
 0.502845 
 0.0820214

julia> @btime $vs[$idx];
  2.335 ns (0 allocations: 0 bytes)
8 Likes

That’s excellent! I’m still missing a step, though. Do you know how I could create an index vector 1:N-1 to be able to drop the last index for a size N vector. I tried this:

function droplast(v::SVector{N, <:Number}) where {N}
    ind = @SVector [i for i in 1:N-1]
    return v[ind]
end

but this looks for N in global scope, so doesn’t work. Interestingly, @SVector fill(1, N-1) and some others behave nicely inside functions, so perhaps adding something like @SVector 1:N-1 could be a feature request to StaticArrays.jl?

For now, perhaps a @generated function would do the trick?

Something like:

@generated function droplast{N,T}(v::SVector{N, T})
           ind = SVector{N-1,Int64}([i for i in 1:N-1]...)    
           :(return v[$ind])
end

works here (on Julia latest 0.7.0-DEV.515)

2 Likes

Great! This works. In fact, it’s possible to simplify a bit:

@generated function droplast{N, T}(v::SVector{N, T})
   ind = SVector{N-1}(1:N-1)
   return :(v[$ind])
end

Or without generated (but a `@pure instead):

julia> Base.@pure pure_sub(a::Int, b::Int) = a - b
pure_sub (generic function with 1 method)

julia> function droplast(v::SVector{N, <:Number}) where {N}
           ind = SVector{pure_sub(N, 1)}(1:N-1)
           return v[ind]
       end
droplast (generic function with 1 method)

julia> @btime droplast($a)
  6.216 ns (0 allocations: 0 bytes)
4-element SVector{4,Float64}:
 0.945968 
 0.0451724
 0.917798 
 0.368511 
2 Likes

Without @generated and without Base.@pure:

julia> droplast(x::Tuple) = argdroplast(x...)
droplast (generic function with 1 method)
julia> argdroplast(x) = ()
argdroplast (generic function with 1 method)
julia> argdroplast(x,y...) = (x,argdroplast(y...)...)
argdroplast (generic function with 2 methods)
julia> droplast(v::SVector) = SVector(droplast(v.data))
droplast (generic function with 2 methods)

Now benchmarking:

julia> @btime droplast2($sv)    # with @generated
  1.705 ns (0 allocations: 0 bytes)
julia> @btime droplast3($sv)    # with Base.@pure
  6.373 ns (0 allocations: 0 bytes)
julia> @btime droplast($sv)     # defined above
  2.022 ns (0 allocations: 0 bytes)

But the numbers can confuse, @code_native shows the same code is produced for the @generated version and the one above.

Added bonus is the droplast for Tuples which could be used elsewhere.

2 Likes

Looping from 2 : 15 (with force inline on the recursive functions):

  Generated   Pure        Recursion
  1.791 ns    4.154 ns    1.790 ns 
  1.792 ns    6.123 ns    2.086 ns 
  2.086 ns    6.068 ns    1.792 ns 
  2.087 ns    6.217 ns    1.793 ns 
  2.087 ns    6.926 ns    2.088 ns 
  2.088 ns    7.097 ns    2.088 ns 
  2.089 ns    7.106 ns    2.090 ns 
  2.386 ns    7.985 ns    2.385 ns 
  2.092 ns    7.750 ns    2.092 ns 
  3.270 ns    8.873 ns    3.270 ns 
  2.387 ns    8.286 ns    2.387 ns 
  3.559 ns    9.753 ns    3.559 ns 
  2.679 ns    8.575 ns    2.690 ns 
  2.688 ns    10.047 ns   8.657 μs <----

Looks like the generated one and tuple is the same speed except for n > 15 when the tuple one gives up. Pure is slower but doesn’t freak out at n = 15.

1 Like

Lots of really interesting answers here. The ideal solution would be if StaticArrays and the @SVector macro could support StepRanges, e.g.

ind = @SVector 2:2:N-1

etc., where N is a type parameter.

Any idea if this is even possible in theory?

If you use N as always meaning the length of the vector and only use literals in the range, it should be possible.

OK. I may raise that as a feature request in StaticArrays.jl. Seems to me like it could be useful.

Actually, it seems like a special case of the comprehension, which does not quite work as wanted yet: https://github.com/JuliaArrays/StaticArrays.jl#svector

Sorry to revive an old topic, but it isn’t clear from the docs if what’s suggested here is still the best approach. Have there been any updates in the last years?

Could use end instead of N ?

I don’t have a solution for everything, but pop(v) drops the last element, and there’s an SOneTo which is a static version of the Base.OneTo range, which can be useful.

2 Likes

EDIT: benchmarks in a lower post indicate that SizedVector rarely offers a performance advantage over SVector and is sometimes worse.

The nicest way to take slices is with SizedArray. Like

julia> using StaticArrays

julia> x = (1:5).^2 # note: this doesn't even need to be a StaticArrays object
5-element Vector{Int64}:
  1
  4
  9
 16
 25

julia> x[SizedVector{3}(1:2:5)] # indexing with SizedArray returns a SArray
3-element SVector{3, Int64} with indices SOneTo(3):
  1
  9
 25

SOneTo can also work, but can only grab the front of an array with stride=1.

1 Like

Though SVector{3}(1:2:5) seems to work equally well. I’ve never actually understood what SizedVector is for. What can it do that SVector can’t?

SizedVector can be large, SVector should not be used for more than 100 elements, compilation becomes very slow otherwise…

I’m pretty sure that operations on SizedVector are still usually unrolled so carry the same high compiler burden at large sizes. The @code_native I looked at below corroborated this. I don’t think they would have much purpose otherwise.

Mostly, I would expect SizedVector to be preferable when a range object is appreciably more compact than a SVector. So I would reach for SVector{3}(1:3) but SizedVector{10}(1:10). It may also be easier for the compiler to reason about inbounds access and other factors. But, by and large, they aren’t so different for most methods.

julia> using BenchmarkTools

julia> x = rand(100);

julia> @btime $x[$(SVector{20}(1:20))];
  14.228 ns (0 allocations: 0 bytes)

julia> @btime $x[$(SizedVector{20}(1:20))];
  17.452 ns (0 allocations: 0 bytes)


julia> suminds(x, inds) = sum(i->x[i], inds);

julia> @btime suminds($x, $(1:20)) # UnitRange
  17.234 ns (0 allocations: 0 bytes)
12.884730020198726

julia> @btime suminds($x, $(SVector{20}(1:20))) # SVector
  16.132 ns (0 allocations: 0 bytes)
12.884730020198727

julia> @btime suminds($x, $(SizedVector{20}(1:20))) # SizedArray
  9.910 ns (0 allocations: 0 bytes)
12.884730020198727


julia> @btime sum($(SVector{100}(1:100))) # I'm shocked how fast this is
  6.000 ns (0 allocations: 0 bytes)
5050

julia> @btime sum($(SizedVector{100}(1:100))) # uses algebraic transformation
  2.400 ns (0 allocations: 0 bytes)
5050

julia> @btime sum($(1:100)) # uses algebraic transformation 
  2.900 ns (0 allocations: 0 bytes)
5050

julia> @btime sum($(SVector{100}(1:100.0))) # uses SIMD
  52.840 ns (0 allocations: 0 bytes)
5050.0

julia> @btime sum($(SizedVector{100}(1:100.0))) # does not use SIMD or a transformation
  171.488 ns (0 allocations: 0 bytes)
5050.0

julia> @btime sum($(1:100.0)) # uses algebraic transformation
  16.533 ns (0 allocations: 0 bytes)
5050.0

Here, we see that a medium-sized SVector was a bit faster for slicing an array than a SizedVector. But the SizedVector was faster for sequential iteration. But SVector could be vectorized more easily.

Although different benchmarks might tell a different story in certain applications, it looks like SArray is probably preferable in most situations even at larger sizes. In some distant future of fantasy, maybe more effort goes into SizedArray (it could be smarter about some things, probably) and it has more situations where it’s useful. For example, @code_native x[SVector{5}(1:5)] and @code_native x[SizedVector{5}(1:5)] both have 5 boundschecks (and the SizedVector error throwing appears more complicated, for some reason I haven’t looked into), even though the SizedVector{N,T,UnitRange{T}} indexing could be done with 2 (for any size).

If nothing else, if it’s wrapping some compact object (like a range) then it takes up notably less memory. Not a big deal when you only have a couple arrays, but maybe when you have a bunch. I’d probably use a Vector{SizedVector} over a Vector{SVector} where the former could be done with compact representations.

Two things mostly. Storing non-isbits objects and adding size information to arbitrary AbstractArrays like views (or ranges). You can view a part of MArray and operate on it freely, it will be fast in most cases.

SizedArrays shouldn’t be larger than 100 elements either because there is almost no unrolling avoidance in StaticArrays.jl. It’s not particularly difficult to prevent some common unrolling cases but in practice it’s even easier to work around the problem by not using static arrays for larger arrays.