[ANN] Smoothers: Collection of basic smoothers and smoother related applications

In order to learn Julia I began building the package Forecast, after familiarizing myself with the language it was just natural to move from adding new models to the package to refactor it so that the package could benefit from some of the great features that Julia has to offer.

In particular, I am going to divide Forecast into several smaller packages since expanding functionality in Julia packages comes natural to the language. The first of these packages is Smoothers.

The package Smoothers provides a collection of smoothing heuristics, models and smoothing related applications. The current available smoothers and applications are:

  • Henderson Moving Average Filter ( hma )
  • Linear Time-invariant Difference Equation Filter ( filter ) - Matlab/Octave
  • Locally Estimated Scatterplot Smoothing ( loess )
  • Seasonal and Trend decomposition based on Loess ( stl )
  • Simple Moving Average ( sma )

image

Note: Forecast: stl, loess return exact solutions, the implementation for Smoothers: stl, loess are fast (much faster) implementations but still offer the possibility for exact solutions.

18 Likes

Cool! I had a quick look at the source code and saw in filter:

    for n in 1:Nx
        for k in 0:M
            @inbounds y[n] += n-k > 0 ? d[k+1]*x[n-k] : T(0.0)
        end
        for k in 1:N
            @inbounds y[n] -= n-k > 0 ? c[k+1]*y[n-k] : T(0.0)
        end
    end

(https://github.com/viraltux/Smoothers.jl/blob/56f1c442e204b5111be5ee4b95fad06bb9c67b21/src/filter.jl#L65-L72)

I think there’s a few things that can maybe be improved here (just quick thoughts, I may be wrong)

  1. inbounds can be applied on the outer loop
  2. T(0.0) did you mean zero(T) ?
  3. the branch (?) is unnecessary if you do the first loop over n+1:M (unless I’m missing something?) similar question for the second loop
  4. I think d and x potentially have different types so there would be a type promotion in the d[k+1]*x[n-k] operation a lot of times which is not ideal

The package seems useful, kudos :slight_smile:

4 Likes

Thank you @tlienart for having a look at the code, I myself I am learning about how to optimize Julia code and I appreciate your help.

Actually using @inbounds didn’t make much of a difference here, I guess Julia was smart enough to already optimize those loops, one question though, do you know if using @inbounds in the outer loop would apply to any sub-loop and any array in those sub-loops?

I am promoting the types’ parameters before I used them, I meant T(0.0) where T is the type of x after a,b,x,si,_ = Base.promote(a,b,x,si,[Base.promote_op(/,B,A)(1.0)])

Yeah, actually I used this formulation because it was easy to test and it was already really fast, I am aware though that if I treat the n-k >0 values outside the loop I do not need (?).

As a matter of fact, I am pushing some optimizations with higher priority on the original code as we speak (for hma in this case) but thanks for the reminder!

3 Likes

Following up on the kind advice on optimization from @tlienart , I wonder if anybody knows of a good tutorial for parallelization on Julia. When I tried @spawn in loess (which admits parallelization for each value) it would become slower instead faster.

Maybe @tkf’s material here

1 Like

If you’re just spawning one task per data point, this will produce a LOT of scheduling overhead. For this particularly problem it seems likely that @threads is the simplest abstraction, the fitting should be similar in duration for each data point, and this macro auto-splits the data into similar sized blocks with N tasks where N = number of OS threads.