I want to create a vector of SVectors. I have to create all SVectors on the spot, given a formula that I know, and given also the dimensionality I know. The dimensionality can also be type-parameter, there is no problem with that (I already have a type that has it as a type parameter).

The problem is, the formula is accessing the elements of an array one by one, so I cannot reduce it to basic julia functions where SVectors know how to do their thing.

In short, this is what I am doing now:

function reconstruct(s::AbstractVector, D::Int, Ď„::Int)
L = length(s) - (D-1)*Ď„;
T = eltype(s)
data = Vector{SVector{D, T}}(L)
for i in 1:L
gen = (s[i+k*Ď„] for k in 0:(D-1))
data[i] = SVector{D,T}(gen...)
end
V = typeof(s)
T = eltype(s)
return data
end

(once again: both Ď„ and D can be made type-parameters if need be)

I am quite sure that using a generator is not the most performant thing to do. I got a hunch that the most performant version would involve meta-programming and @generated? But unfortunately I do not have any metaprogramming skills whatsoever currently so I am asking for assistance!

If you get used to metaprogramming its actually not that hard

This would be the generated function (where you have to make D a type parameter)

function reconstruct_impl(::Type{Val{D}}) where D
gens = [:(s[i + $k*Ď„]) for k=0:D-1]
quote
L = length(s) - ($(D-1))*Ď„;
T = eltype(s)
data = Vector{SVector{$D, T}}(L)
for i in 1:L
data[i] = SVector{$D,T}($(gens...))
end
V = typeof(s)
T = eltype(s)
data
end
end
@generated function reconstruct(s::AbstractVector, ::Val{D}, Ď„::Int) where D
reconstruct_impl(Val{D})
end

You can check the generated code with reconstruct_impl(::Val{4}) and to call reconstruct you would have to do reconstruct(s, Val{4}(), Ď„).

julia> function reconstruct2(s::AbstractVector{T}, V::Type{SVector{D, T}}, ?::Int) where {T, D}
L = length(s) - (D-1)*?
v = zeros(V, L)
for i in 1:L
let i = i
v[i] = V(ntuple(k -> s[i+(k-1)*?], Val(D)))
end
end
return v
end
julia> a = rand(10^6);
julia> @btime reconstruct2(a, SVector{3, Float64}, 3);
9.096 ms (2 allocations: 22.89 MiB)

seems pretty fast. Should be no need for a generated function here.

@saschatimme Why does this work without having to make Ď„ a type parameter as well?

Is it possible to make this function a constructor for a type Reconstruction{T, D, Ď„} instead? Iâ€™ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, Ď„} as an argument to the generated function itself.

Metaprogramming scares me but I really have to learn it to take full advantage of Julia!

Because I donâ€™t use the information during the code generation. You could also make Ď„ a type parameter and then precompute the products 1Ď„, 2Ď„ etc. But keep in mind that there will be a new function compiled for each type combination, so if the Ď„ changes often this may be not a good idea.

Is it possible to make this function a constructor for a type Reconstruction{T, D, Ď„} instead? Iâ€™ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, Ď„} as an argument to the generated function itself.

Can you not just define the constructor and then pass the type information to the generated function?

A very simple solution that does not require you to play around too much with the type system and allows you to reason much better about data layout/performance:

function reconstruct_pointy(s::AbstractVector, D::Int, tau::Int)
L = length(s) - (D-1)*tau;
T = eltype(s)
assert(isbits(T))
data = Vector{SVector{D, T}}(L)
asmat = unsafe_wrap(Array, reinterpret(Ptr{T},pointer(data)), (D,L), false )
for i in 1:L
@simd for k in 0:(D-1)
@inbounds asmat[k+1,i] = s[i+k*tau]
end
end
return data
end

Edit: This works because Vector{SVector} and Matrix are just syntactic sugar around the same bits.

As far as I know, there are various proposals underway to make mutation of immutables that are accessed by-reference easier and give it well-defined multi-threaded semantics.

This is not here yet, but ultimately the correct code should look like:

function reconstruct_future(s::AbstractVector, D::Int, tau::Int)
L = length(s) - (D-1)*tau;
T = eltype(s)
assert(isbits(T))
data = Vector{SVector{D, T}}(L)
for i in 1:L
@simd for k in 1:D
@inbounds data[i][k] = s[i+(k-1)*tau]
end
end
return data
end

But yes, this is currently a limitation (especially: Suppose you want to update data[i][k] without writing to data[i][k+1] which is possibly used by a different thread; currently this is undefined, and the compiler mostly just writes the fields you want to write, except when it doesnâ€™t).

data[i] is immutable so Iâ€™m not sure what you mean with â€śupdatingâ€ť it. If you have multiple the location of data[i] that sounds like a race condition.

Matrix and Vector{SVector} are synonymous. You can independently update data[i][k]===asmat[k,i] for different k, same i, in different threads without any races; syntax that makes this hard without pointer-games is a shortcoming of julia, not the machine.

Todayâ€™s syntax requires us to â€śoverwriteâ€ť the entirety of data[i] with the copied values. In 99% of the cases, the compiler removes the spurious reads and writes, such that this update only affects the single memory address of data[i][k]. If both threads try to update data[i][k1] and data[i][k2] at the same time, and the compiler was clever, there is no race condition. If the compiler was unclever, we get a race condition (e.g. thread one reads data[i][k1] and then stores it back, overwriting the update by thread two).

Iâ€™m not so sure if the following completely satisfies your goal, but it performs 115X faster than your original implementation.

using BenchmarkTools
using StaticArrays
function reconstruct_insane(s::AbstractVector, D::Int, Ď„::Int)
L = length(s) - (D-1)*Ď„;
T = eltype(s)
data = Array{T}(D,L)
for i = 0:D-1
data[i+1,:] = s[1+i*Ď„:L+i*Ď„]'
end
out = reinterpret(SVector{D,T}, data, (L,1))
end
const a = rand(10^6)
@btime reconstruct(a, 3, 4);
@btime reconstruct_insane(a, 3, 4);
1.913 s (27997761 allocations: 1.26 GiB)
16.554 ms (16 allocations: 45.78 MiB)

function reconstruct(a, D, Ď„)
T = eltype(a)
n = length(a)-Ď„*(D-1)
out = Vector{T}(n*D)
for i in 1:n, j in 1:D
out[D*(i-1)+j] = a[i+(j-1)*Ď„]
end
return reinterpret(SVector{D, T}, out, (n,))
end

It is better to allocate as Vector{SVector}, reinterpret for modification and return the original. That way all the â€śdirty-nessâ€ť is contained in your function (maybe even put a @noinline in front if your reinterpretation uses extra-dirty pointer games-- for example, my version could probably segfault if inlined and the result is never used).