To compute a simple moving average (MA) when the window is much shorter than the data vector, one may, and usually does, run through the data sequentially, moving the window forward one element at a time.

With a window of 10 and a data source of 49 sequential values (where the first nine are used only to contribute to the first MA value), it is possible to split the data into portions, run a separate MA over each portion, and recombine these tasks’ results to obtain the same result as the sequential pass.
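To make the splitting idea concrete, here is a minimal sketch. The names `sma` and `sma_chunked` and the choice of `Threads.@threads` are illustrative assumptions, not something from the posts. The key point is that each portion has to read `w - 1` extra values past its last output so that windows straddling a boundary are complete.

```julia
# Direct simple moving average: one output per full window of length w.
function sma(x::AbstractVector{<:Real}, w::Integer)
    out = Vector{Float64}(undef, length(x) - w + 1)
    for i in eachindex(out)
        out[i] = sum(@view x[i:i+w-1]) / w
    end
    return out
end

# Hypothetical chunked variant: split the *output* range into nchunks pieces.
# Each piece reads its own slice of x, extended by w - 1 trailing values.
function sma_chunked(x::AbstractVector{<:Real}, w::Integer, nchunks::Integer)
    nout = length(x) - w + 1
    out = Vector{Float64}(undef, nout)
    bounds = round.(Int, range(0, nout; length = nchunks + 1))
    Threads.@threads for c in 1:nchunks
        lo, hi = bounds[c] + 1, bounds[c+1]
        if lo <= hi
            # Chunks write to disjoint parts of out, so no locking is needed.
            out[lo:hi] .= sma(@view(x[lo:hi+w-1]), w)
        end
    end
    return out
end

x = collect(1.0:49.0)                 # the 49-value example from the question
sma_chunked(x, 10, 4) == sma(x, 10)   # the recombined result matches the sequential one
```

Whether this is faster than the plain loop depends on vector length and thread count; for short vectors the task overhead may well dominate.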

What is a good way to do this in Julia?

Frankly, with a vector of length 49, I doubt you would gain anything from parallelization; I expect the overhead would dominate the computational cost.

The 49 was just for easy math; the actual vectors have hundreds of thousands of values.

You can experiment with `pmap` (see the manual), but note that MA is usually cheap for the CPU and thus the bottleneck may be memory access.

I was hoping for a SIMDy way. Distributed processing would be too heavy for this (and not all vectors are that long; some have only 750 values).

If you have a type which does not suffer from floating-point error (e.g. `Int`), you can do a “rolling sum”, adding and subtracting as you work through the vector. This may be amenable to SIMD. But for floats, this may not be accurate (again, depending on dimension and distribution of the actual data).
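A minimal sketch of the rolling-sum idea for integer data follows. The function name `rolling_mean` is made up for illustration, and it assumes the running total does not overflow the integer type.

```julia
# Rolling-sum moving average for integer input. Integer adds and subtracts
# are exact, so the running total does not drift (assuming no overflow).
function rolling_mean(x::AbstractVector{<:Integer}, w::Integer)
    n = length(x)
    1 <= w <= n || throw(ArgumentError("need 1 <= w <= length(x)"))
    out = Vector{Float64}(undef, n - w + 1)
    s = sum(@view x[1:w])            # total of the first full window
    out[1] = s / w
    for i in 2:length(out)
        s += x[i+w-1] - x[i-1]       # add the entering value, drop the leaving one
        out[i] = s / w
    end
    return out
end

rolling_mean(collect(1:49), 10)[1:3]   # [5.5, 6.5, 7.5]
```

Each step costs one add and one subtract regardless of `w`. Note that `s` carries over from one iteration to the next, so as written the loop is a sequential chain; any SIMD benefit would probably need a restructured version.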

Also, you may get better help if you provide an MWE that roughly matches your dimensions (both for vector and window length). So far you have mentioned 10/49, ?/750, and ?/10^5, so I'm confused about this. If you have really long windows, you could sum subsequences and then use those with a clever algorithm, but it would only be worth it for long windows.
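The post does not say which algorithm it has in mind for long windows. One possible reading, sketched here under that assumption, builds window sums by repeated doubling: the sum over a window of length `2b` starting at `i` is the length-`b` sum at `i` plus the length-`b` sum at `i + b`. The name `doubling_window_sums` is hypothetical, and this version only handles window lengths that are powers of two.

```julia
# Window sums for window length 2^k via repeated doubling:
#   S_2b[i] = S_b[i] + S_b[i + b]
# Only additions are used, so there is no add-then-subtract drift, and the
# inner loop has no dependency between iterations.
function doubling_window_sums(x::AbstractVector{Float64}, k::Integer)
    2^k <= length(x) || throw(ArgumentError("window 2^k is longer than x"))
    s = copy(x)                      # sums over windows of length 1
    b = 1
    for _ in 1:k
        m = length(s) - b
        t = Vector{Float64}(undef, m)
        @inbounds @simd for i in 1:m
            t[i] = s[i] + s[i+b]
        end
        s = t
        b *= 2
    end
    return s                         # length(x) - 2^k + 1 window sums
end

x = collect(1.0:20.0)
ma8 = doubling_window_sums(x, 3) ./ 8   # moving average with window 8
```

This takes `k` passes over the data instead of `2^k` additions per output, which is why it would only pay off for long windows.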