Savitzky Golay filter

Hey, I’m new to Julia, transitioning from the Python community. I attempted to implement a Savitzky-Golay filter in Julia, aiming to precompute certain parts of the code akin to Python’s class with __init__() method. While the code works, upon checking its execution time, I found it wasn’t any faster than the version that performs matrix inversion every time it’s executed. It seems my efforts were futile. Could you assist me in understanding what I might have done wrong? Here’s my code:

struct SavitzkyGolayFilter
    basis::Matrix{<:Real}
    proj_mat::Matrix{<:Real}
    transform::Function

    function SavitzkyGolayFilter(M::Int,N::Int)
        x = -M:M
        basis_matrix = zeros(2M+1, N+1)
        for j = 1:N+1
            basis_matrix[:, j] = x .^ (j-1)
        end
        proj_mat = (basis_matrix'basis_matrix) \ basis_matrix'
        for j = 1:N+1
            proj_mat[j,:].*= factorial(j-1)
        end 

        function transform(signal::Vector{<:Real}; deriv=0)
            filter_coeffs = reverse(proj_mat[deriv+1,:])            
            return conv(filter_coeffs, signal)[M+1:end-M]
        end 
        
        return new(basis_matrix,  proj_mat, transform)
    end

end

Take a look at the Performance tips page in the Manual: Performance Tips · The Julia Language

The Performance tips have a section titled Avoid containers with abstract type parameters.

The Performance tips have a section titled Avoid fields with abstract type.

You could fix this, for example, by parameterizing SavitzkyGolayFilter, something like this:

struct SavitzkyGolayFilter{U,V,Func}
  basis::Matrix{U}
  proj_mat::Matrix{V}
  transform::Func
  ...
end
2 Likes

@nsajko Thank you for your comments. I’m relatively new to Julia, so I’m still navigating its intricacies. I recall reading that one of Julia’s strengths is its high performance even without explicitly specifying input types. While I’m not entirely clear on the concept of structures, I found that Julia offered a similar behavior to what I’m accustomed to in Python, which prompted me to use it. However, I’m uncertain about how to specify input types when constructing it, especially since I’m passing two integers while the elements of the structure consist of two matrices and a function.

Regarding the initial concern about performance, I’ve simplified the problem to a few steps and compared my results with available S-G implementations, as demonstrated below:

using SavitzkyGolay
using DSP: conv
using BenchmarkTools

signal = rand(1024)

M = 9
N = 3
deriv = 1


smoothed = savitzky_golay(signal, M, N; deriv=deriv).y
println("Savitzky-Golay filter form SavitzkyGolay.jl")
@btime smoothed = savitzky_golay(signal, M, N; deriv=deriv).y

sgfilter = SGolay(M, N, deriv)
sgfilter(signal).y
println("Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand")
@btime sgfilter(signal).y

m = M÷2
x = -m:m
basis_matrix = zeros(M, N+1)
for j = 1:N+1
    basis_matrix[:, j] = x .^ (j-1)
end
proj_mat = (basis_matrix'basis_matrix) \ basis_matrix'
for j = 1:N+1
    proj_mat[j,:].*= factorial(j-1)
end 

filter_coeffs = reverse(proj_mat[deriv+1,:])

smoothed = conv(signal, filter_coeffs )[m+1:end-m]
println("Savitzky-Golay filter with a projection matrix builded beforehand, using conv from DSP.jl")
@btime smoothed = conv(signal, filter_coeffs )[m+1:end-m]

function conv_1d(signal::Vector{T}, kernel::Vector{T}) where T
    output = zeros(T, length(signal))
    
    
    m = (length(kernel) + 1) ÷ 2
    for i in eachindex(signal)
        for j in eachindex(kernel)
            if (i - j + m ≥ 1) 
                if (i - j + m ≤ length(signal))
                    output[i] += signal[i - j + m] * kernel[j]
                end
            end
        end
    end
    return output
end

smoothed = conv_1d(signal, filter_coeffs )
println("Savitzky-Golay filter with a projection matrix builded beforehand, using conv from a naive implementation")
@btime smoothed = conv_1d(signal, filter_coeffs );


struct SavitzkyGolayFilter{M,N,deriv} end
@generated function (::SavitzkyGolayFilter{M,N,deriv})(data::AbstractVector{T}) where {M, N, deriv, T}
            #Create Jacobian matrix
            J = zeros(2M+1, N+1)
            for i=1:2M+1, j=1:N+1
                J[i, j] = (i-M-1)^(j-1)/factorial(j-1)
            end
            e₁ = zeros(N+1)
            e₁[deriv+1] = 1.0

            #Compute filter coefficients
            C = J' \ e₁

            #Evaluate filter on data matrix

            To = typeof(C[1] * one(T)) #Calculate type of output
            expr = quote
                n = size(data, 1)
                smoothed = zeros($To, n)
                @inbounds for i in eachindex(smoothed)
                    smoothed[i] += $(C[M+1])*data[i]
                end
                smoothed
            end

            for j=1:M
                insert!(expr.args[6].args[3].args[2].args, 1,
                    :(if i - $j ≥ 1
                        smoothed[i] += $(C[M+1-j])*data[i-$j]
                      end)
                )
                push!(expr.args[6].args[3].args[2].args,
                    :(if i + $j ≤ n
                        smoothed[i] += $(C[M+1+j])*data[i+$j]
                      end)
                )
            end

            return expr
end

svg_filter = SavitzkyGolayFilter{m, N, deriv}()
smoothed = svg_filter(signal )
println("Savitzky-Golay filter from https://gist.github.com/jiahao/b8b5ac328c18b7ae8a17")
@btime smoothed = svg_filter(signal );

The performance I achieve with a straightforward convolution implementation is on par with the results from SavitzkyGolay.jl. Interestingly, the conv function from DSP.jl is roughly twice as slow, likely because it employs an FFT algorithm, which may be less efficient for shorter inputs. I was surprised to find that the S-G filter implementation by @jiahao (An implementation of the Savitzky-Golay filter using generated functions. Accompanies https://medium.com/@acidflask/smoothing-data-with-julia-s-generated-functions-c80e240e05f3 · GitHub) is four times faster than my naive implementation, despite both performing the same computations.

Savitzky-Golay filter form SavitzkyGolay.jl
  18.645 μs (66 allocations: 107.91 KiB)
Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand
  18.197 μs (63 allocations: 99.75 KiB)
Savitzky-Golay filter with a projection matrix builded beforehand, using conv from DSP.jl
  31.858 μs (19 allocations: 18.97 KiB)
Savitzky-Golay filter with a projection matrix builded beforehand, using conv from a naive implementation
  15.923 μs (1 allocation: 8.12 KiB)
Savitzky-Golay filter from https://gist.github.com/jiahao/b8b5ac328c18b7ae8a17
  4.320 μs (1 allocation: 8.12 KiB)
1 Like

Take a look at the Types, Methods and Constructors pages in the Manual when you get the chance.

One option is to simply keep the default inner constructors. When an inner constructor is not defined explicitly by the programmer (such as your SavitzkyGolayFilter(::Int,::Int)), the compiler automatically generates some sensible constructor methods corresponding to the fields of the struct. What you do here is a matter of style.

See:

julia> struct SavitzkyGolayFilter{U,V,Func}
         basis::Matrix{U}
         proj_mat::Matrix{V}
         transform::Func
       end

julia> methods(SavitzkyGolayFilter)
# 1 method for type constructor:
 [1] SavitzkyGolayFilter(basis::Matrix{U}, proj_mat::Matrix{V}, transform::Func) where {U, V, Func}
     @ REPL[1]:2

The result of methods(SavitzkyGolayFilter) says that there’s a single method (type signature, basically) available for calling on SavitzkyGolayFilter. The method will create an object whose type parameters will match the types of the arguments. Example of how it works:

julia> SavitzkyGolayFilter([1 2; 3 4], Float64[1 2; 3 4], sqrt)
SavitzkyGolayFilter{Int64, Float64, typeof(sqrt)}([1 2; 3 4], [1.0 2.0; 3.0 4.0], sqrt)

FTR that’s not the only generated constructor, though: it’s also possible to explicitly specify the type parameters, in which case the element type of input arguments will be converted if necessary to match the specified type parameters:

julia> methods(SavitzkyGolayFilter{<:Any, <:Any, <:Any})
# 1 method for type constructor:
 [1] SavitzkyGolayFilter(basis::Matrix{U}, proj_mat::Matrix{V}, transform::Func) where {U, V, Func}
     @ REPL[1]:2

julia> SavitzkyGolayFilter{Float64,Float64,typeof(sqrt)}([1 2; 3 4], [1 2; 3 4], sqrt)
SavitzkyGolayFilter{Float64, Float64, typeof(sqrt)}([1.0 2.0; 3.0 4.0], [1.0 2.0; 3.0 4.0], sqrt)

You can add other constructor methods at any time after defining the type, BTW.

1 Like

@nsajko Thank you again for your help. This is my current implementation of Savitzky-Golay filter:


struct SavitzkyGolayFilterK{T1 <: Int, T2 <: AbstractFloat} 
    window_size :: T1                   # window size must be an odd number
    M :: T1                             # window half-size must be an odd number
    degree :: T1                        # degree of the fitting polynomial
    deriv :: T1                         # order of the derivative to compute
    dx::T2                              # grid spacing
    C :: Vector{T2}                     # filter coefficients
    proj_matrix:: Matrix{T2}            # projection matrix
    function SavitzkyGolayFilterK(window_size::T1, degree::T1, deriv::T1, dx::T2 ) where {T1, T2}
        isodd(window_size) || throw(ArgumentError("window_size must be an odd number."))
        window_size >= degree + 1 || throw(ArgumentError("window_size must be >= degree + 1.")) 
        
        M = (window_size - 1) ÷ 2
        #Create Jacobian matrix
        J = zeros(T2,(2M+1, degree+1))
        for i=1:2M+1, j=1:degree+1
            J[i, j] = (i-M-1)^(j-1)
        end
        proj_mat = (J' * J) \ J'
        C = proj_mat[deriv+1,:] .* T2(factorial(deriv+1))   
        C = C ./ dx^deriv
        return new{T1,T2}(window_size, M, degree, deriv, dx, C, proj_mat)
    end
end

SavitzkyGolayFilterK(window_size, degree) = SavitzkyGolayFilterK(window_size, degree, 0, 1.0)
SavitzkyGolayFilterK(window_size, degree, deriv) = SavitzkyGolayFilterK(window_size, degree, deriv, 1.0)


function (sg_filter::SavitzkyGolayFilterK)(data::Vector{T}; mode = :wrap) where T              
    
    M = sg_filter.M
    if T != typeof(sg_filter.C[1])
        C = T.(sg_filter.C)
    else
        C = @view sg_filter.C[:]
    end
   
    n = size(data, 1)  
    smoothed = C[M+1] .* data

    if mode == :wrap
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[i-j+n])
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[i+j-n])   
            end
        end
    elseif mode == :reflect
            @inbounds for i in eachindex(smoothed)
                for j=1:M
                    smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1-(i-j)])
                    smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n-(i+j)])   
                end
            end
    elseif mode == :constant
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1])
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n])   
            end
        end

    elseif mode == :pad
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : 0)
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : 0)  
            end
        end
    end
    return smoothed
end


I call it by defining the structure first, with e.g.

signal = rand(1024)
window_length = 17
M = 8
N = 3
deriv = 1
SG_filterK = SavitzkyGolayFilterK(window_length, N, deriv)

I benchmarked my code against other implementations

Savitzky-Golay filter form SavitzkyGolay.jl
  21.558 μs (66 allocations: 112.09 KiB)
Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand
  21.101 μs (63 allocations: 103.94 KiB)
my Savitzky-Golay filter
  16.775 μs (1 allocation: 8.12 KiB)
Savitzky-Golay filter of jiahao
  9.506 μs (1 allocation: 8.12 KiB)

I’m a bit disappointed that my implementation is 2 times slower than jiahao’s, however I managed to modify his code to behave better at the edges


struct SavitzkyGolayFilter{M,N,deriv, mode} end
@generated function (::SavitzkyGolayFilter{M,N,deriv,mode})(data::Vector{T}) where {M, N, deriv, mode, T}
    #Create Jacobian matrix
    J = zeros(T,(2M+1, N+1))
    for i=1:2M+1, j=1:N+1
        J[i, j] = (i-M-1)^(j-1)
    end
    proj_mat = (J' * J) \ J'
    C = proj_mat[deriv+1,:] .* T(factorial(deriv+1))         
    #Evaluate filter on data matrix
    
    expr = quote
        n = size(data, 1)
        # smoothed = zeros($To, n)
        smoothed = zeros(T, n)
        @inbounds for i in eachindex(smoothed)
            smoothed[i] += $(C[M+1])*data[i]
        end
        smoothed
    end

    for j=1:M

        if mode == :constant
            tmp_expr = :(if i - $j ≥ 1
                    smoothed[i] += $(C[M+1-j])*data[i-$j]
                else               
                    smoothed[i] += $(C[M+1-j])*data[1]
                end)
        elseif mode == :wrap
            tmp_expr = :(if i - $j ≥ 1
                    smoothed[i] += $(C[M+1-j])*data[i-$j]
                else               
                    smoothed[i] += $(C[M+1-j])*data[i-$j+n]
                end)
        elseif mode == :reflect
            tmp_expr = :(if i - $j ≥ 1
                    smoothed[i] += $(C[M+1-j])*data[i-$j]
                else               
                    smoothed[i] += $(C[M+1-j])*data[1-(i-$j)]
                end)
            elseif mode == :pad
            tmp_expr = :(if i - $j ≥ 1
                smoothed[i] += $(C[M+1-j])*data[i-$j]
                        end)        
        end
        insert!(expr.args[6].args[3].args[2].args, 1, tmp_expr)
       
                            
        if mode == :constant
            tmp_expr = :(if i + $j ≤ n
                    smoothed[i] += $(C[M+1+j])*data[i+$j]
                else               
                    smoothed[i] += $(C[M+1+j])*data[n]
                end)

        elseif mode == :reflect
                tmp_expr = :(if i + $j ≤ n
                        smoothed[i] += $(C[M+1+j])*data[i+$j]
                    else               
                        smoothed[i] += $(C[M+1+j])*data[2*n-(i+$j)]
                    end)
        elseif mode == :wrap
            tmp_expr = :(if i + $j ≤ n
                    smoothed[i] += $(C[M+1+j])*data[i+$j]
                else               
                    smoothed[i] += $(C[M+1+j])*data[i+$j-n]
                end)
        else
            tmp_expr = :(if i + $j ≤ n
                smoothed[i] += $(C[M+1+j])*data[i+$j]
                        end)        
        end
        push!(expr.args[6].args[3].args[2].args,tmp_expr)
        
    end

    return expr
end

There are no subtypes of Int or Float64. You may as well code those directly in the struct fields, or pick whichever of these is appropriate for your case:

julia> supertypes(Float64)
(Float64, AbstractFloat, Real, Number, Any)

julia> supertypes(Int)
(Int64, Signed, Integer, Real, Number, Any)
1 Like

Thank you, updated code:


struct SavitzkyGolayFilterK{T1 <: Signed, T2 <: AbstractFloat} 
    window_size :: T1                   # window size must be an odd number
    M :: T1                             # window half-size must be an odd number
    degree :: T1                        # degree of the fitting polynomial
    deriv :: T1                         # order of the derivative to compute
    dx::T2                              # grid spacing
    C :: Vector{T2}                     # filter coefficients
    proj_matrix:: Matrix{T2}            # projection matrix
    function SavitzkyGolayFilterK(window_size::T1, degree::T1, deriv::T1, dx::T2 ) where {T1, T2}
        isodd(window_size) || throw(ArgumentError("window_size must be an odd number."))
        window_size >= degree + 1 || throw(ArgumentError("window_size must be >= degree + 1.")) 
        
        M = (window_size - 1) ÷ 2
        #Create Jacobian matrix
        J = zeros(T2,(2M+1, degree+1))
        for i=1:2M+1, j=1:degree+1
            J[i, j] = (i-M-1)^(j-1)
        end
        proj_mat = (J' * J) \ J'
        C = proj_mat[deriv+1,:] .* T2(factorial(deriv+1))   
        C = C ./ dx^deriv
        return new{T1,T2}(window_size, M, degree, deriv, dx, C, proj_mat)
    end
end

SavitzkyGolayFilterK(window_size, degree) = SavitzkyGolayFilterK(window_size, degree, 0, 1.0)
SavitzkyGolayFilterK(window_size, degree, deriv) = SavitzkyGolayFilterK(window_size, degree, deriv, 1.0)

function (sg_filter::SavitzkyGolayFilterK)(data::Vector{T}; mode = :wrap) where T              
    
    M = sg_filter.M
    if T != typeof(sg_filter.C[1])
        C = T.(sg_filter.C)
    else
        C = @view sg_filter.C[:]
    end
   
    n = size(data, 1)  
    smoothed = C[M+1] .* data

    if mode == :wrap
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[i-j+n])
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[i+j-n])   
            end
        end
    elseif mode == :reflect
            @inbounds for i in eachindex(smoothed)
                for j=1:M
                    smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1-(i-j)])
                    smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[2*n-(i+j)])   
                end
            end
    elseif mode == :constant
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1])
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n])   
            end
        end

    elseif mode == :pad
        @inbounds for i in eachindex(smoothed)
            for j=1:M
                smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : 0)
                smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : 0)  
            end
        end
    end
    return smoothed
end

Well, except the bottom type, the empty type union, which subtypes all types. E.g., Union{} <: Int. A nit pick, perhaps, but we better be clear with the newbies. The empty union has no instances, though.

1 Like

Almost always it’s better to do C ./= dx^deriv, to prevent the unnecessary allocation of C ./ dx^deriv. However, …

It should be better for performance to do proj_mat[deriv+1,:] .* T2(factorial(deriv+1)) ./ dx^deriv. See the Performance tips section More dots: Fuse vectorized operations.

Noted.

@nsajko, any insights on why jiahao’s code runs twice as fast as mine? What’s driving its speed? Interestingly, the SavitzkyGolayFilter structure doesn’t actually store any data; it’s just a tool that enables the syntax function(sg_filter:: SavitzkyGolayFilter)(data::Vector), allowing for convenient usage like sg_filter(data). However, when I attempted to tweak the script to pass certain key values from my structure, it encountered errors.

1 Like

It’s an interesting question but I don’t have time right now.

Function-like objects.

Construct a self-contained example (as minimal as you can manage), and open a new thread, asking the question there and providing the example.

I did some performance test depending on the input size (x-axis). The execution time [ns] is on the y-axis.
test_savitzky_golay

jiahao’s code performs the best for all considered sizes. My code is 2-1.2 times slower, depending on the input size.

1 Like

FWIW, I wrote a port of Jiahao’s code a while ago and added a LoopVectorization.jl on top of it (just a 1 line change), so if you’re itching for the implementation that’s benchmarked about 20x from SciPy’s savgol, I’d use that.

Haven’t updated in a while but it shouldn’t be too bad to copy/past some code.
Update: goddamned nerdnsnipe. You should be able to run on it Julia 1.10 now!

My benchmarks have it as

[ Info: Julia Float32 for 10_000_000
  3.164 ms (0 allocations: 0 bytes)
[ Info: Julia Float64 for 10_000_000
  6.691 ms (0 allocations: 0 bytes)

vs

f32 savgol (5,2) for 10_000_000 elements
69.3 ms per loop 
f64 savgol (5,2) for 10_000_000 elements
80.3 ms per loop

on the single core version with AVX512. I’ve turned on multithreading by default, so you’ll see higher variance in other circumstances.

1 Like