Performant creation of vector of SVectors given a known formula



I want to create a vector of SVectors. I have to create all SVectors on the spot, given a formula that I know, and given also the dimensionality I know. The dimensionality can also be type-parameter, there is no problem with that (I already have a type that has it as a type parameter).

The problem is, the formula is accessing the elements of an array one by one, so I cannot reduce it to basic julia functions where SVectors know how to do their thing.

In short, this is what I am doing now:

function reconstruct(s::AbstractVector, D::Int,  τ::Int)
    L = length(s) - (D-1)*τ;
    T = eltype(s)
    data = Vector{SVector{D, T}}(L)
    for i in 1:L
        gen = (s[i+k*τ] for k in 0:(D-1))
        data[i] = SVector{D,T}(gen...)
    V = typeof(s)
    T = eltype(s)
    return data

(once again: both τ and D can be made type-parameters if need be)

I am quite sure that using a generator is not the most performant thing to do. I got a hunch that the most performant version would involve meta-programming and @generated? But unfortunately I do not have any metaprogramming skills whatsoever currently so I am asking for assistance!


If you get used to metaprogramming its actually not that hard :slight_smile:

This would be the generated function (where you have to make D a type parameter)

function reconstruct_impl(::Type{Val{D}}) where D
    gens = [:(s[i + $k*τ]) for k=0:D-1]

        L = length(s) - ($(D-1))*τ;
        T = eltype(s)
        data = Vector{SVector{$D, T}}(L)
        for i in 1:L
            data[i] = SVector{$D,T}($(gens...))
        V = typeof(s)
        T = eltype(s)

@generated function reconstruct(s::AbstractVector, ::Val{D},  τ::Int) where D

You can check the generated code with reconstruct_impl(::Val{4}) and to call reconstruct you would have to do reconstruct(s, Val{4}(), τ).


I’d just add that using the Size trait for propagating sizes of static arrays is a good way to go.

We’d also like it easier to map over a “static” unit range, use generators & comprehensions, and so-on. ntuple(f, Val{D}) is often useful, also.

julia> function reconstruct2(s::AbstractVector{T}, V::Type{SVector{D, T}},  ?::Int) where {T, D}
           L = length(s) - (D-1)*? 
           v = zeros(V, L)
           for i in 1:L
               let i = i
                   v[i] = V(ntuple(k -> s[i+(k-1)*?], Val(D)))
           return v

julia> a = rand(10^6);

julia> @btime reconstruct2(a, SVector{3, Float64}, 3);
  9.096 ms (2 allocations: 22.89 MiB)

seems pretty fast. Should be no need for a generated function here.


Thanks all for the kind replies!

@kristoffer.carlsson the first suggestion by Sascha benchmarks at 10.983 ms and yours at 15.318 ms in my machine (just for reference purposes).

Since you were all so kind to reply, maybe you will also reply to the following 3 questions as well:

  1. @kristoffer.carlsson why is this very mistical let i = i necessary?
  2. @saschatimme Why does this work without having to make τ a type parameter as well?
  3. Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.

Metaprogramming scares me but I really have to learn it to take full advantage of Julia!


Because I don’t use the information during the code generation. You could also make τ a type parameter and then precompute the products , etc. But keep in mind that there will be a new function compiled for each type combination, so if the τ changes often this may be not a good idea.

Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.

Can you not just define the constructor and then pass the type information to the generated function?


Yes, thank you. Doing something as simple as:

Reconstruction(s::AbstractVector{T}, D, τ) where {T} =
Reconstruction{T, D, τ}(reconstruct2(s, Val{D}(), τ))

worked fine!


A very simple solution that does not require you to play around too much with the type system and allows you to reason much better about data layout/performance:

function reconstruct_pointy(s::AbstractVector, D::Int,  tau::Int)
    L = length(s) - (D-1)*tau;
    T = eltype(s)
    data = Vector{SVector{D, T}}(L)
    asmat = unsafe_wrap(Array, reinterpret(Ptr{T},pointer(data)), (D,L), false )
    for i in 1:L
        @simd for k in 0:(D-1)
            @inbounds  asmat[k+1,i] = s[i+k*tau]
    return data

Edit: This works because Vector{SVector} and Matrix are just syntactic sugar around the same bits.


If we have to write code like this, that would imo be a big failure.


As far as I know, there are various proposals underway to make mutation of immutables that are accessed by-reference easier and give it well-defined multi-threaded semantics.

This is not here yet, but ultimately the correct code should look like:

function reconstruct_future(s::AbstractVector, D::Int,  tau::Int)
    L = length(s) - (D-1)*tau;
    T = eltype(s)
    data = Vector{SVector{D, T}}(L)
    for i in 1:L
        @simd for k in 1:D
            @inbounds  data[i][k] = s[i+(k-1)*tau]
    return data

But yes, this is currently a limitation (especially: Suppose you want to update data[i][k] without writing to data[i][k+1] which is possibly used by a different thread; currently this is undefined, and the compiler mostly just writes the fields you want to write, except when it doesn’t).


data[i] is immutable so I’m not sure what you mean with “updating” it. If you have multiple the location of data[i] that sounds like a race condition.


Matrix and Vector{SVector} are synonymous. You can independently update data[i][k]===asmat[k,i] for different k, same i, in different threads without any races; syntax that makes this hard without pointer-games is a shortcoming of julia, not the machine.

Today’s syntax requires us to “overwrite” the entirety of data[i] with the copied values. In 99% of the cases, the compiler removes the spurious reads and writes, such that this update only affects the single memory address of data[i][k]. If both threads try to update data[i][k1] and data[i][k2] at the same time, and the compiler was clever, there is no race condition. If the compiler was unclever, we get a race condition (e.g. thread one reads data[i][k1] and then stores it back, overwriting the update by thread two).

Yuyichao explained the current state in there and linked to the PR I really hope for Keno’s fix to be accepted at some point, in some variant.

Reinterpreting a matrix is a one-liner; dealing with fieldoffsets in Vector{some_struct} is much, much worse syntax.


I’m not so sure if the following completely satisfies your goal, but it performs 115X faster than your original implementation.

using BenchmarkTools
using StaticArrays

function reconstruct_insane(s::AbstractVector, D::Int,  τ::Int)
    L = length(s) - (D-1)*τ;
    T = eltype(s)
    data = Array{T}(D,L)
    for i = 0:D-1
      data[i+1,:] = s[1+i*τ:L+i*τ]' 
    out = reinterpret(SVector{D,T}, data, (L,1))

const a = rand(10^6)
@btime reconstruct(a, 3, 4);
@btime reconstruct_insane(a, 3, 4);

  1.913 s (27997761 allocations: 1.26 GiB)
  16.554 ms (16 allocations: 45.78 MiB)


reinterpret gives a ReinterpretedArray on 0.7.


This gets 2x faster in my PC

function reconstruct(a, D, τ)
    T = eltype(a)
    n = length(a)-τ*(D-1)    
    out = Vector{T}(n*D)
    for i in 1:n, j in 1:D
        out[D*(i-1)+j] = a[i+(j-1)*τ]
    return reinterpret(SVector{D, T}, out, (n,))


I second your question here. @kristoffer.carlsson, what’s the purpose of the let i = i?


Workaround for


Regarding the reinterpretation variants:

It is better to allocate as Vector{SVector}, reinterpret for modification and return the original. That way all the “dirty-ness” is contained in your function (maybe even put a @noinline in front if your reinterpretation uses extra-dirty pointer games-- for example, my version could probably segfault if inlined and the result is never used).