I want to create a vector of SVectors. I have to create all SVectors on the spot, given a formula that I know, and given also the dimensionality I know. The dimensionality can also be type-parameter, there is no problem with that (I already have a type that has it as a type parameter).
The problem is, the formula is accessing the elements of an array one by one, so I cannot reduce it to basic julia functions where SVectors know how to do their thing.
In short, this is what I am doing now:
function reconstruct(s::AbstractVector, D::Int, Ď„::Int)
L = length(s) - (D-1)*Ď„;
T = eltype(s)
data = Vector{SVector{D, T}}(L)
for i in 1:L
gen = (s[i+k*Ď„] for k in 0:(D-1))
data[i] = SVector{D,T}(gen...)
end
V = typeof(s)
T = eltype(s)
return data
end
(once again: both Ď„ and D can be made type-parameters if need be)
I am quite sure that using a generator is not the most performant thing to do. I got a hunch that the most performant version would involve meta-programming and @generated? But unfortunately I do not have any metaprogramming skills whatsoever currently so I am asking for assistance!
If you get used to metaprogramming its actually not that hard
This would be the generated function (where you have to make D a type parameter)
function reconstruct_impl(::Type{Val{D}}) where D
gens = [:(s[i + $k*Ď„]) for k=0:D-1]
quote
L = length(s) - ($(D-1))*Ď„;
T = eltype(s)
data = Vector{SVector{$D, T}}(L)
for i in 1:L
data[i] = SVector{$D,T}($(gens...))
end
V = typeof(s)
T = eltype(s)
data
end
end
@generated function reconstruct(s::AbstractVector, ::Val{D}, Ď„::Int) where D
reconstruct_impl(Val{D})
end
You can check the generated code with reconstruct_impl(::Val{4}) and to call reconstruct you would have to do reconstruct(s, Val{4}(), Ď„).
julia> function reconstruct2(s::AbstractVector{T}, V::Type{SVector{D, T}}, ?::Int) where {T, D}
L = length(s) - (D-1)*?
v = zeros(V, L)
for i in 1:L
let i = i
v[i] = V(ntuple(k -> s[i+(k-1)*?], Val(D)))
end
end
return v
end
julia> a = rand(10^6);
julia> @btime reconstruct2(a, SVector{3, Float64}, 3);
9.096 ms (2 allocations: 22.89 MiB)
seems pretty fast. Should be no need for a generated function here.
@saschatimme Why does this work without having to make Ď„ a type parameter as well?
Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.
Metaprogramming scares me but I really have to learn it to take full advantage of Julia!
Because I don’t use the information during the code generation. You could also make τ a type parameter and then precompute the products 1τ, 2τ etc. But keep in mind that there will be a new function compiled for each type combination, so if the τ changes often this may be not a good idea.
Is it possible to make this function a constructor for a type Reconstruction{T, D, τ} instead? I’ve been trying to make it happen but it seems the only way is to pass a type ::Reconstruction{T, D, τ} as an argument to the generated function itself.
Can you not just define the constructor and then pass the type information to the generated function?
A very simple solution that does not require you to play around too much with the type system and allows you to reason much better about data layout/performance:
function reconstruct_pointy(s::AbstractVector, D::Int, tau::Int)
L = length(s) - (D-1)*tau;
T = eltype(s)
assert(isbits(T))
data = Vector{SVector{D, T}}(L)
asmat = unsafe_wrap(Array, reinterpret(Ptr{T},pointer(data)), (D,L), false )
for i in 1:L
@simd for k in 0:(D-1)
@inbounds asmat[k+1,i] = s[i+k*tau]
end
end
return data
end
Edit: This works because Vector{SVector} and Matrix are just syntactic sugar around the same bits.
As far as I know, there are various proposals underway to make mutation of immutables that are accessed by-reference easier and give it well-defined multi-threaded semantics.
This is not here yet, but ultimately the correct code should look like:
function reconstruct_future(s::AbstractVector, D::Int, tau::Int)
L = length(s) - (D-1)*tau;
T = eltype(s)
assert(isbits(T))
data = Vector{SVector{D, T}}(L)
for i in 1:L
@simd for k in 1:D
@inbounds data[i][k] = s[i+(k-1)*tau]
end
end
return data
end
But yes, this is currently a limitation (especially: Suppose you want to update data[i][k] without writing to data[i][k+1] which is possibly used by a different thread; currently this is undefined, and the compiler mostly just writes the fields you want to write, except when it doesn’t).
data[i] is immutable so I’m not sure what you mean with “updating” it. If you have multiple the location of data[i] that sounds like a race condition.
Matrix and Vector{SVector} are synonymous. You can independently update data[i][k]===asmat[k,i] for different k, same i, in different threads without any races; syntax that makes this hard without pointer-games is a shortcoming of julia, not the machine.
Today’s syntax requires us to “overwrite” the entirety of data[i] with the copied values. In 99% of the cases, the compiler removes the spurious reads and writes, such that this update only affects the single memory address of data[i][k]. If both threads try to update data[i][k1] and data[i][k2] at the same time, and the compiler was clever, there is no race condition. If the compiler was unclever, we get a race condition (e.g. thread one reads data[i][k1] and then stores it back, overwriting the update by thread two).
I’m not so sure if the following completely satisfies your goal, but it performs 115X faster than your original implementation.
using BenchmarkTools
using StaticArrays
function reconstruct_insane(s::AbstractVector, D::Int, Ď„::Int)
L = length(s) - (D-1)*Ď„;
T = eltype(s)
data = Array{T}(D,L)
for i = 0:D-1
data[i+1,:] = s[1+i*Ď„:L+i*Ď„]'
end
out = reinterpret(SVector{D,T}, data, (L,1))
end
const a = rand(10^6)
@btime reconstruct(a, 3, 4);
@btime reconstruct_insane(a, 3, 4);
1.913 s (27997761 allocations: 1.26 GiB)
16.554 ms (16 allocations: 45.78 MiB)
function reconstruct(a, D, Ď„)
T = eltype(a)
n = length(a)-Ď„*(D-1)
out = Vector{T}(n*D)
for i in 1:n, j in 1:D
out[D*(i-1)+j] = a[i+(j-1)*Ď„]
end
return reinterpret(SVector{D, T}, out, (n,))
end
It is better to allocate as Vector{SVector}, reinterpret for modification and return the original. That way all the “dirty-ness” is contained in your function (maybe even put a @noinline in front if your reinterpretation uses extra-dirty pointer games-- for example, my version could probably segfault if inlined and the result is never used).