MWE:
using StaticArrays, BenchmarkTools
struct DelayEmbedding{D}
    delays::SVector{D, Int}
end
@inline DelayEmbedding(D::Int) = DelayEmbedding(Val{D}())
@inline function DelayEmbedding(::Val{D}) where {D}
    idxs = [k for k in 1:D]
    return DelayEmbedding{D}(SVector{D, Int}(idxs...))
end
@generated function (r::DelayEmbedding{D})(s::AbstractArray{T}, i) where {D, T}
    gens = [:(s[i + r.delays[$k]]) for k=1:D]
    quote
        @inbounds return SVector{$D+1,T}(s[i], $(gens...))
    end
end
This works and does not allocate:
x = rand(10000);
e = DelayEmbedding(1)
@btime $e($x, 4);
# 2.052 ns (0 allocations: 0 bytes)
But when I create a minimal wrapper function that instantiates DelayEmbedding, I get allocations:
@inline function reconstruct(s::AbstractVector{T}, D::Int) where {T}
    de::DelayEmbedding{D} = DelayEmbedding(Val{D}())
    c = 0.0
    for i in 1:100
        data = de(s, i)
        c += data[1]
    end
    return c
end
I now benchmark this reconstruct function:
D = 1
@btime reconstruct($x, $D);
# 8.348 μs (304 allocations: 6.39 KiB)
@btime reconstruct($x, 1);
# 736.273 ns (2 allocations: 112 bytes)
It is "weird" that the performance difference is so large when D is passed as a literal rather than as a variable.
To my understanding, code like the above should be possible in Julia 1.0 thanks to constant propagation. On the other hand, I have clearly not understood how it works, since my code does not perform as expected.
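To make concrete what I mean by a literal versus a variable, here is a smaller sketch without StaticArrays. The names make_tuple, from_literal and from_variable are made up for illustration and are not part of my actual code. As far as I understand it, inference can only produce a concrete return type when the value wrapped in Val is visible as a constant inside the function:

```julia
# Hypothetical helper: the return type depends on the *value* N,
# just like DelayEmbedding{D} depends on the value D.
make_tuple(::Val{N}) where {N} = ntuple(identity, Val(N))

# The 3 is a literal in the function body, so the compiler sees Val{3}.
from_literal() = make_tuple(Val(3))

# n is only known at run time, so Val(n) has no concrete inferred type.
from_variable(n::Int) = make_tuple(Val(n))

# Ask inference what it concludes for each method.
literal_type  = Base.return_types(from_literal, Tuple{})[1]
variable_type = Base.return_types(from_variable, Tuple{Int})[1]

println("literal:  ", literal_type, "  concrete? ", isconcretetype(literal_type))
println("variable: ", variable_type, "  concrete? ", isconcretetype(variable_type))
```

If this picture is right, my reconstruct function falls into the second case whenever D arrives as a runtime value, which would be consistent with the allocations I see.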
Is there any way to make my code as fast as in the second case, even when passing a variable instead of a literal?