Performant creation of vector of SVectors given a known formula

Datseris · December 9, 2017, 9:33am

I want to create a vector of SVectors. I have to create all SVectors on the spot, given a formula that I know, and given also the dimensionality I know. The dimensionality can also be type-parameter, there is no problem with that (I already have a type that has it as a type parameter).

The problem is, the formula is accessing the elements of an array one by one, so I cannot reduce it to basic julia functions where SVectors know how to do their thing.

In short, this is what I am doing now:

function reconstruct(s::AbstractVector, D::Int,  τ::Int)
    L = length(s) - (D-1)*τ;
    T = eltype(s)
    data = Vector{SVector{D, T}}(L)
    for i in 1:L
        gen = (s[i+k*τ] for k in 0:(D-1))
        data[i] = SVector{D,T}(gen...)
    end
    V = typeof(s)
    T = eltype(s)
    return data
end

(once again: both τ and D can be made type-parameters if need be)

I am quite sure that using a generator is not the most performant thing to do. I got a hunch that the most performant version would involve meta-programming and @generated? But unfortunately I do not have any metaprogramming skills whatsoever currently so I am asking for assistance!

saschatimme · December 9, 2017, 10:49am

If you get used to metaprogramming its actually not that hard

This would be the generated function (where you have to make D a type parameter)

function reconstruct_impl(::Type{Val{D}}) where D
    gens = [:(s[i + $k*τ]) for k=0:D-1]

    quote
        L = length(s) - ($(D-1))*τ;
        T = eltype(s)
        data = Vector{SVector{$D, T}}(L)
        for i in 1:L
            data[i] = SVector{$D,T}($(gens...))
        end
        V = typeof(s)
        T = eltype(s)
        data
    end
end

@generated function reconstruct(s::AbstractVector, ::Val{D},  τ::Int) where D
    reconstruct_impl(Val{D})
end

You can check the generated code with reconstruct_impl(::Val{4}) and to call reconstruct you would have to do reconstruct(s, Val{4}(), τ).

andyferris · December 9, 2017, 11:03am

I’d just add that using the Size trait for propagating sizes of static arrays is a good way to go.

We’d also like it easier to map over a “static” unit range, use generators & comprehensions, and so-on. ntuple(f, Val{D}) is often useful, also.

kristoffer.carlsson · December 9, 2017, 11:20am

julia> function reconstruct2(s::AbstractVector{T}, V::Type{SVector{D, T}},  ?::Int) where {T, D}
           L = length(s) - (D-1)*? 
           v = zeros(V, L)
           for i in 1:L
               let i = i
                   v[i] = V(ntuple(k -> s[i+(k-1)*?], Val(D)))
               end
           end
           return v
       end

julia> a = rand(10^6);

julia> @btime reconstruct2(a, SVector{3, Float64}, 3);
  9.096 ms (2 allocations: 22.89 MiB)

seems pretty fast. Should be no need for a generated function here.

Datseris · December 9, 2017, 11:27am

Thanks all for the kind replies!

@kristoffer.carlsson the first suggestion by Sascha benchmarks at 10.983 ms and yours at 15.318 ms in my machine (just for reference purposes).

Since you were all so kind to reply, maybe you will also reply to the following 3 questions as well:

@kristoffer.carlsson why is this very mistical let i = i necessary?
@saschatimme Why does this work without having to make τ a type parameter as well?
Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.

Metaprogramming scares me but I really have to learn it to take full advantage of Julia!

saschatimme · December 9, 2017, 11:34am

Because I don’t use the information during the code generation. You could also make τ a type parameter and then precompute the products 1τ, 2τ etc. But keep in mind that there will be a new function compiled for each type combination, so if the τ changes often this may be not a good idea.

Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.

Can you not just define the constructor and then pass the type information to the generated function?

Datseris · December 9, 2017, 11:55am

Yes, thank you. Doing something as simple as:


Reconstruction(s::AbstractVector{T}, D, τ) where {T} =
Reconstruction{T, D, τ}(reconstruct2(s, Val{D}(), τ))

worked fine!

foobar_lv2 · December 9, 2017, 7:32pm

A very simple solution that does not require you to play around too much with the type system and allows you to reason much better about data layout/performance:

function reconstruct_pointy(s::AbstractVector, D::Int,  tau::Int)
    L = length(s) - (D-1)*tau;
    T = eltype(s)
    assert(isbits(T))
    data = Vector{SVector{D, T}}(L)
    asmat = unsafe_wrap(Array, reinterpret(Ptr{T},pointer(data)), (D,L), false )
    for i in 1:L
        @simd for k in 0:(D-1)
            @inbounds  asmat[k+1,i] = s[i+k*tau]
        end
    end
    return data
end

Edit: This works because Vector{SVector} and Matrix are just syntactic sugar around the same bits.

kristoffer.carlsson · December 9, 2017, 7:36pm

If we have to write code like this, that would imo be a big failure.

foobar_lv2 · December 9, 2017, 7:53pm

As far as I know, there are various proposals underway to make mutation of immutables that are accessed by-reference easier and give it well-defined multi-threaded semantics.

This is not here yet, but ultimately the correct code should look like:

function reconstruct_future(s::AbstractVector, D::Int,  tau::Int)
    L = length(s) - (D-1)*tau;
    T = eltype(s)
    assert(isbits(T))
    data = Vector{SVector{D, T}}(L)
    for i in 1:L
        @simd for k in 1:D
            @inbounds  data[i][k] = s[i+(k-1)*tau]
        end
    end
    return data
end

But yes, this is currently a limitation (especially: Suppose you want to update data[i][k] without writing to data[i][k+1] which is possibly used by a different thread; currently this is undefined, and the compiler mostly just writes the fields you want to write, except when it doesn’t).

kristoffer.carlsson · December 9, 2017, 8:03pm

data[i] is immutable so I’m not sure what you mean with “updating” it. If you have multiple the location of data[i] that sounds like a race condition.

foobar_lv2 · December 9, 2017, 8:46pm

Matrix and Vector{SVector} are synonymous. You can independently update data[i][k]===asmat[k,i] for different k, same i, in different threads without any races; syntax that makes this hard without pointer-games is a shortcoming of julia, not the machine.

Today’s syntax requires us to “overwrite” the entirety of data[i] with the copied values. In 99% of the cases, the compiler removes the spurious reads and writes, such that this update only affects the single memory address of data[i][k]. If both threads try to update data[i][k1] and data[i][k2] at the same time, and the compiler was clever, there is no race condition. If the compiler was unclever, we get a race condition (e.g. thread one reads data[i][k1] and then stores it back, overwriting the update by thread two).

Yuyichao explained the current state in there https://discourse.julialang.org/t/on-modifying-immutables-by-reference-in-multithreaded-code/6580 and linked to the PR https://github.com/JuliaLang/julia/pull/21912. I really hope for Keno’s fix to be accepted at some point, in some variant.

Reinterpreting a matrix is a one-liner; dealing with fieldoffsets in Vector{some_struct} is much, much worse syntax.

Seif_Shebl · December 9, 2017, 10:43pm

I’m not so sure if the following completely satisfies your goal, but it performs 115X faster than your original implementation.

using BenchmarkTools
using StaticArrays

function reconstruct_insane(s::AbstractVector, D::Int,  τ::Int)
    L = length(s) - (D-1)*τ;
    T = eltype(s)
    data = Array{T}(D,L)
    for i = 0:D-1
      data[i+1,:] = s[1+i*τ:L+i*τ]' 
    end
    out = reinterpret(SVector{D,T}, data, (L,1))
end

const a = rand(10^6)
@btime reconstruct(a, 3, 4);
@btime reconstruct_insane(a, 3, 4);

  1.913 s (27997761 allocations: 1.26 GiB)
  16.554 ms (16 allocations: 45.78 MiB)

kristoffer.carlsson · December 9, 2017, 10:50pm

reinterpret gives a ReinterpretedArray on 0.7.

rojo · December 10, 2017, 10:50am

This gets 2x faster in my PC

function reconstruct(a, D, τ)
    T = eltype(a)
    n = length(a)-τ*(D-1)    
    out = Vector{T}(n*D)
    for i in 1:n, j in 1:D
        out[D*(i-1)+j] = a[i+(j-1)*τ]
    end
    
    return reinterpret(SVector{D, T}, out, (n,))
end

rojo · December 10, 2017, 10:52am

I second your question here. @kristoffer.carlsson, what’s the purpose of the let i = i?

kristoffer.carlsson · December 10, 2017, 11:07am

Workaround for performance of captured variables in closures · Issue #15276 · JuliaLang/julia · GitHub

foobar_lv2 · December 11, 2017, 1:22pm

Regarding the reinterpretation variants:

It is better to allocate as Vector{SVector}, reinterpret for modification and return the original. That way all the “dirty-ness” is contained in your function (maybe even put a @noinline in front if your reinterpretation uses extra-dirty pointer games-- for example, my version could probably segfault if inlined and the result is never used).

Topic		Replies	Views
Generate function to build a static array from vector of vector Performance metaprogramming	5	679	April 26, 2021
Performance of custom `Vec` type versus `SVector{3, Float64}` Performance staticarrays	4	435	February 1, 2022
How to create an SVector of SVectors part II General Usage question , staticarrays	7	435	May 14, 2021
Create a vector of SVectors New to Julia staticarrays	5	480	May 18, 2021
Static vector of functions as a field of a composite type New to Julia performance , function , staticarrays , symbolics , autodiff	4	571	May 20, 2023

Performant creation of vector of SVectors given a known formula

Related topics