`@views` has no effect on code that uses simple scalar indices (no slicing). I don't think `@fastmath` helps here, either, unless it allows the `1/n` factor to be hoisted out of your second loop? `@simd` might help for this kind of loop, as might storing `output[n]` in a temporary variable like `s = zero(eltype(input))`, since I don't think the compiler can put `output[n]` in a register (especially with `@simd`) even though you are using it over and over.
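
For concreteness, here is roughly what I mean. This is only a guess at the shape of your first loop, but the accumulator pattern is the point: sum into a local `s` (which can live in a register) instead of into `output[n]`:

```
# Hypothetical shape of the initial-window loop, accumulating into a
# local variable instead of repeatedly loading/storing output[n]:
s = zero(eltype(input))
@simd for i = 1:n
    s += input[i]
end
output[n] = s
```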

The type declarations of your arguments don't help performance, and are overly stringent for correctness. I would do something like `output::AbstractVector, input::AbstractVector, n::Integer` so that it supports any type with the requisite operations (that's also why I suggested `zero(eltype(input))` rather than `0.0` above). Function argument types are a filter saying for what types the method works, not a performance hint: when you call the function, the compiler specializes the compiled code for whatever argument types you actually pass.

If you are using `@inbounds`, then for safety you should do a bounds check at the beginning of the function, before the loops, e.g.:

```
@boundscheck checkbounds(input, 1:n)
@boundscheck checkbounds(output, n:length(input))
```
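
Putting these pieces together, a sketch of what the whole function might look like. I'm guessing at your algorithm here (a sliding-window sum with the `1/n` scaling applied in a second pass, and `moving_average!` is a name I made up), so treat this as illustrative rather than as your code:

```
function moving_average!(output::AbstractVector, input::AbstractVector, n::Integer)
    @boundscheck checkbounds(input, 1:n)
    @boundscheck checkbounds(output, n:length(input))
    s = zero(eltype(input))
    @inbounds @simd for i = 1:n   # initial window, accumulated in a local
        s += input[i]
    end
    @inbounds output[n] = s
    @inbounds for i = n+1:length(input)
        s += input[i] - input[i-n]   # sliding-window update
        output[i] = s
    end
    invn = 1 / n   # hoist the 1/n factor out of the second loop manually
    @inbounds @simd for i = n:length(input)
        output[i] *= invn
    end
    return output
end
```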

You should beware that the floating-point roundoff errors for this algorithm will accumulate as the length of your input grows, however. In particular, if you look closely, it turns out that what you are doing is *exactly* equivalent to a special case of the sliding DFT for the k=0 "DC" Fourier component (which is your windowed sum of the inputs, not including your `1/n` scale factor). As your window slides along your data, the roundoff errors grow, and this was analyzed rigorously in

In particular, if I'm reading this paper correctly, if `L = length(input)`, then your root-mean-square relative error is expected to grow as O(√`L`) (theorem 4.2 in the paper), which could be problematic if `L` is large. *Caveat emptor.*
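
If you want to see this effect for yourself, here is a quick (entirely unscientific) experiment; `drift_demo` is a made-up function of mine, using `Float32` so the drift is visible at modest lengths:

```
# Compare the recursively updated sliding sum against a freshly computed
# sum of the final window; the difference is pure accumulated roundoff.
function drift_demo(L, n)
    x = rand(Float32, L)
    s = sum(@view x[1:n])
    for i = n+1:L
        s += x[i] - x[i-n]          # recursive update: errors random-walk
    end
    exact = sum(@view x[L-n+1:L])   # recompute the last window directly
    return abs(s - exact) / abs(exact)
end

drift_demo(10^7, 100)   # error grows roughly like √L, per the theorem above
```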