`@views` has no effect on code that uses simple scalar indices (no slicing). I don't think `@fastmath` helps here either, unless it allows the `1/n` factor to be hoisted from your second loop? `@simd` might help for this kind of loop, as might accumulating into a local variable like `s = zero(eltype(input))` instead of `output[n]`, since I don't think the compiler can keep `output[n]` in a register (especially with `@simd`) even though you are using it over and over.
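Since your original code isn't shown, here is a sketch of what I mean (the function name and the exact window convention, `output[j]` = mean of `input[j-n+1:j]` for `j = n:length(input)`, are my assumptions):

```julia
function windowed_mean!(output, input, n)
    s = zero(eltype(input))     # local accumulator; can live in a register
    @simd for i = 1:n           # sum of the first window
        s += input[i]
    end
    output[n] = s / n
    for j = n+1:length(input)   # slide: add the new sample, drop the old
        s += input[j] - input[j-n]
        output[j] = s / n
    end
    return output
end
```

Note that only the first loop gets `@simd`; the sliding loop carries a dependence through `s` from one iteration to the next, so it can't be vectorized that way anyhow.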
The type declarations of your arguments don't help performance, and are more stringent than correctness requires. I would do something like `output::AbstractVector, input::AbstractVector, n::Integer` so that it supports any type with the requisite operations (that's also why I suggested `zero(eltype(input))` rather than `0.0` above). Function argument types are a filter saying for which types the method works, not a performance hint: when you call the function, the compiler specializes the compiled code for whatever argument types you actually pass.
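As a small illustration of both points (`winsum` is a made-up name): the one generic method below gets compiled to specialized code for each concrete argument type, and `zero(eltype(input))` keeps the accumulator in the input's own precision:

```julia
# One generic method; the compiler specializes it per concrete call type.
function winsum(input::AbstractVector, n::Integer)
    s = zero(eltype(input))   # 0.0 here would promote Float32 sums to Float64
    for i = 1:n
        s += input[i]
    end
    return s
end

winsum(Float32[1, 2, 3], 3) isa Float32   # true: no accidental promotion
winsum(1:10, 10)                          # ranges work too: 55
```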
If you are using `@inbounds`, then for safety you should do a bounds check at the beginning of the function, before the loops, e.g.:

```julia
@boundscheck checkbounds(input, 1:n)
@boundscheck checkbounds(output, n:length(input))
```
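The point is that one up-front check makes every later `@inbounds` access provably safe, instead of silently reading out of bounds. A minimal sketch of the pattern (`safesum` is an illustrative name):

```julia
function safesum(x, n)
    @boundscheck checkbounds(x, 1:n)   # one check, before any unsafe access
    s = zero(eltype(x))
    for i = 1:n
        @inbounds s += x[i]            # no per-iteration bounds check
    end
    return s
end

safesum([1, 2, 3], 3)   # 6
safesum([1, 2, 3], 5)   # throws BoundsError instead of reading out of bounds
```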
You should beware that the floating-point roundoff errors for this algorithm will accumulate as the length of your input grows, however. In particular, if you look closely, it turns out that what you are doing is exactly equivalent to a special case of the sliding DFT for the k=0 "DC" Fourier component (which is your windowed sum of the inputs, not including your `1/n` scale factor). As your window slides along your data, the roundoff errors grow; this has been analyzed rigorously in the sliding-DFT literature. In particular, if I'm reading that analysis correctly, if `L = length(input)`, then your root-mean-square relative error is expected to grow as O(√L) (theorem 4.2 in the paper), which could be problematic if `L` is large. Caveat emptor.
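You can see the effect empirically by comparing the slid accumulator against a freshly computed sum of the final window (`drift` is an illustrative name; the `Float32`/`10^6` sizes are arbitrary choices to make the drift visible):

```julia
function drift(x::AbstractVector, n::Integer)
    s = sum(@view x[1:n])              # sliding sum, updated incrementally
    for j = n+1:length(x)
        s += x[j] - x[j-n]             # roundoff accumulates on every update
    end
    exact = sum(@view x[end-n+1:end])  # fresh sum of the last window
    return abs(s - exact) / abs(exact)
end

drift(rand(Float32, 10^6), 100)   # relative error of the slid sum; grows ~√L
```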