Your code looks pretty good, although you should check out BenchmarkTools.jl for more accurate timing than just @time. I also think that @fastmath is probably not doing anything in your case, since the only calculations it's affecting are the integer indices i-n+1.
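For example (here `sum` is just a stand-in for your own function and data):

```julia
using BenchmarkTools

x = rand(10^6)
@btime sum($x)        # runs many samples and reports the minimum time
@benchmark sum($x)    # full statistics: time distribution, allocations, etc.
```

Note the `$` interpolation of global variables, so that you time the work itself rather than the overhead of accessing untyped globals.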
However, there is an important algorithmic trick that you are currently missing. To compute a rolling average, it's not necessary to separately compute the mean of each range. Instead, you can maintain a running total and just add and subtract one element at each iteration. In other words, you keep a value x, starting at 0. At each iteration you add input[i] to x and subtract input[i - n] (once i > n; for the first window there is nothing to drop yet), and then you have output[i] = x / n.
That should save you about a factor of n in your computation time.
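In code, the idea looks something like this (an untested sketch: I'm assuming output[i] is meant to hold the mean of input[i-n+1:i] for i ≥ n, and I've left output[1:n-1] alone):

```julia
function rolling_mean!(output, input, n)
    s = zero(eltype(input))
    for i in 1:n                     # sum of the first window
        s += input[i]
    end
    output[n] = s / n
    for i in n+1:length(input)
        s += input[i] - input[i-n]   # add the new point, drop the oldest one
        output[i] = s / n
    end
    return output
end
```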
@views has no effect on code that uses simple scalar indices (no slicing). I don't think @fastmath helps here, either, unless it allows the 1/n factor to be hoisted from your second loop? @simd might help for this kind of loop, as might storing output[n] in a temporary variable like s = zero(eltype(input)), since I don't think the compiler can put output[n] in a register (especially with @simd) even though you are using it over and over.
The type declarations of your arguments don't help performance, and are overly stringent for correctness. I would do something like output::AbstractVector, input::AbstractVector, n::Integer so that it supports any type with the requisite operations (that's also why I suggested zero(eltype(input)) rather than 0.0 above). Function argument types are a filter saying for what types the method works, not a performance hint: when you call the function, the compiler specializes the compiled code for whatever argument types you actually pass.
If you are using @inbounds, then for safety you should do a bounds check at the beginning of the function, before the loops, e.g.:
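(Untested sketch, folding in the other suggestions above as well; I'm assuming output has the same length as input, 1-based indexing, and that only output[n:end] gets filled.)

```julia
function rolling_mean!(output::AbstractVector, input::AbstractVector, n::Integer)
    # validate everything up front, so the @inbounds loops below are actually safe
    Base.require_one_based_indexing(output, input)
    1 ≤ n ≤ length(input) ||
        throw(ArgumentError("window length n=$n must be in 1:$(length(input))"))
    length(output) == length(input) ||
        throw(DimensionMismatch("output and input must have equal lengths"))

    T = eltype(input)
    s = zero(T)                      # running total stays in a register
    @inbounds @simd for i in 1:n     # sum of the first window
        s += input[i]
    end
    inv_n = one(T) / n               # hoist the 1/n factor out of the loop
    @inbounds output[n] = s * inv_n
    @inbounds for i in n+1:length(input)
        s += input[i] - input[i-n]   # slide the window by one element
        output[i] = s * inv_n
    end
    return output
end
```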
You should beware that the floating-point roundoff errors for this algorithm will accumulate as the length of your input grows, however. In particular, if you look closely it turns out that what you are doing is exactly equivalent to a special case of the sliding DFT for the k=0 "DC" Fourier component (which is your windowed sum of the inputs, not including your 1/n scale factor). As your window slides along your data, the roundoff errors grow, and this was analyzed rigorously in:
In particular, if I'm reading this paper correctly, if L = length(input), then your root-mean-square relative error is expected to grow as O(√L) (theorem 4.2 in the paper), which could be problematic if L is large. Caveat emptor.
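If you want to see this effect, here is a quick experiment of my own (not from the paper): do the running-sum rolling mean in Float32 and compare it against recomputing each window from scratch with Float64 accumulation; the worst-case discrepancy creeps upward as you make L larger.

```julia
let L = 10^6, n = 100
    x = rand(Float32, L)

    # running-sum rolling mean in Float32 (same update as in the sketches above)
    s = zero(Float32)
    running = Vector{Float32}(undef, L - n + 1)
    for i in 1:n
        s += x[i]
    end
    running[1] = s / n
    for i in n+1:L
        s += x[i] - x[i-n]
        running[i-n+1] = s / n
    end

    # reference: recompute every window independently, accumulating in Float64
    exact = [Float32(sum(Float64, view(x, i:i+n-1)) / n) for i in 1:L-n+1]

    println(maximum(abs.(running .- exact)))   # worst-case drift over the whole signal
end
```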