# Savitzky Golay filter

Hey, I’m new to Julia, transitioning from the Python community. I attempted to implement a Savitzky-Golay filter in Julia, aiming to precompute certain parts of the code akin to Python’s class with `__init__()` method. While the code works, upon checking its execution time, I found it wasn’t any faster than the version that performs matrix inversion every time it’s executed. It seems my efforts were futile. Could you assist me in understanding what I might have done wrong? Here’s my code:

``````struct SavitzkyGolayFilter
basis::Matrix{<:Real}
proj_mat::Matrix{<:Real}
transform::Function

function SavitzkyGolayFilter(M::Int,N::Int)
x = -M:M
basis_matrix = zeros(2M+1, N+1)
for j = 1:N+1
basis_matrix[:, j] = x .^ (j-1)
end
proj_mat = (basis_matrix'basis_matrix) \ basis_matrix'
for j = 1:N+1
proj_mat[j,:].*= factorial(j-1)
end

function transform(signal::Vector{<:Real}; deriv=0)
filter_coeffs = reverse(proj_mat[deriv+1,:])
return conv(filter_coeffs, signal)[M+1:end-M]
end

return new(basis_matrix,  proj_mat, transform)
end

end
``````

Take a look at the Performance tips page in the Manual: Performance Tips · The Julia Language

The Performance tips have a section titled Avoid containers with abstract type parameters.

The Performance tips have a section titled Avoid fields with abstract type.

You could fix this, for example, by parameterizing `SavitzkyGolayFilter`, something like this:

``````struct SavitzkyGolayFilter{U,V,Func}
basis::Matrix{U}
proj_mat::Matrix{V}
transform::Func
...
end
``````
2 Likes

@nsajko Thank you for your comments. I’m relatively new to Julia, so I’m still navigating its intricacies. I recall reading that one of Julia’s strengths is its high performance even without explicitly specifying input types. While I’m not entirely clear on the concept of structures, I found that Julia offered a similar behavior to what I’m accustomed to in Python, which prompted me to use it. However, I’m uncertain about how to specify input types when constructing it, especially since I’m passing two integers while the elements of the structure consist of two matrices and a function.

Regarding the initial concern about performance, I’ve simplified the problem to a few steps and compared my results with available S-G implementations, as demonstrated below:

``````using SavitzkyGolay
using DSP: conv
using BenchmarkTools

signal = rand(1024)

M = 9
N = 3
deriv = 1

smoothed = savitzky_golay(signal, M, N; deriv=deriv).y
println("Savitzky-Golay filter form SavitzkyGolay.jl")
@btime smoothed = savitzky_golay(signal, M, N; deriv=deriv).y

sgfilter = SGolay(M, N, deriv)
sgfilter(signal).y
println("Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand")
@btime sgfilter(signal).y

m = M÷2
x = -m:m
basis_matrix = zeros(M, N+1)
for j = 1:N+1
basis_matrix[:, j] = x .^ (j-1)
end
proj_mat = (basis_matrix'basis_matrix) \ basis_matrix'
for j = 1:N+1
proj_mat[j,:].*= factorial(j-1)
end

filter_coeffs = reverse(proj_mat[deriv+1,:])

smoothed = conv(signal, filter_coeffs )[m+1:end-m]
println("Savitzky-Golay filter with a projection matrix builded beforehand, using conv from DSP.jl")
@btime smoothed = conv(signal, filter_coeffs )[m+1:end-m]

function conv_1d(signal::Vector{T}, kernel::Vector{T}) where T
output = zeros(T, length(signal))

m = (length(kernel) + 1) ÷ 2
for i in eachindex(signal)
for j in eachindex(kernel)
if (i - j + m ≥ 1)
if (i - j + m ≤ length(signal))
output[i] += signal[i - j + m] * kernel[j]
end
end
end
end
return output
end

smoothed = conv_1d(signal, filter_coeffs )
println("Savitzky-Golay filter with a projection matrix builded beforehand, using conv from a naive implementation")
@btime smoothed = conv_1d(signal, filter_coeffs );

struct SavitzkyGolayFilter{M,N,deriv} end
@generated function (::SavitzkyGolayFilter{M,N,deriv})(data::AbstractVector{T}) where {M, N, deriv, T}
#Create Jacobian matrix
J = zeros(2M+1, N+1)
for i=1:2M+1, j=1:N+1
J[i, j] = (i-M-1)^(j-1)/factorial(j-1)
end
e₁ = zeros(N+1)
e₁[deriv+1] = 1.0

#Compute filter coefficients
C = J' \ e₁

#Evaluate filter on data matrix

To = typeof(C[1] * one(T)) #Calculate type of output
expr = quote
n = size(data, 1)
smoothed = zeros(\$To, n)
@inbounds for i in eachindex(smoothed)
smoothed[i] += \$(C[M+1])*data[i]
end
smoothed
end

for j=1:M
insert!(expr.args[6].args[3].args[2].args, 1,
:(if i - \$j ≥ 1
smoothed[i] += \$(C[M+1-j])*data[i-\$j]
end)
)
push!(expr.args[6].args[3].args[2].args,
:(if i + \$j ≤ n
smoothed[i] += \$(C[M+1+j])*data[i+\$j]
end)
)
end

return expr
end

svg_filter = SavitzkyGolayFilter{m, N, deriv}()
smoothed = svg_filter(signal )
println("Savitzky-Golay filter from https://gist.github.com/jiahao/b8b5ac328c18b7ae8a17")
@btime smoothed = svg_filter(signal );
``````

The performance I achieve with a straightforward convolution implementation is on par with the results from SavitzkyGolay.jl. Interestingly, the conv function from DSP.jl is roughly twice as slow, likely because it employs an FFT algorithm, which may be less efficient for shorter inputs. I was surprised to find that the S-G filter implementation by @jiahao (An implementation of the Savitzky-Golay filter using generated functions. Accompanies https://medium.com/@acidflask/smoothing-data-with-julia-s-generated-functions-c80e240e05f3 · GitHub) is four times faster than my naive implementation, despite both performing the same computations.

``````Savitzky-Golay filter form SavitzkyGolay.jl
18.645 μs (66 allocations: 107.91 KiB)
Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand
18.197 μs (63 allocations: 99.75 KiB)
Savitzky-Golay filter with a projection matrix builded beforehand, using conv from DSP.jl
31.858 μs (19 allocations: 18.97 KiB)
Savitzky-Golay filter with a projection matrix builded beforehand, using conv from a naive implementation
15.923 μs (1 allocation: 8.12 KiB)
Savitzky-Golay filter from https://gist.github.com/jiahao/b8b5ac328c18b7ae8a17
4.320 μs (1 allocation: 8.12 KiB)
``````
1 Like

Take a look at the Types, Methods and Constructors pages in the Manual when you get the chance.

One option is to simply keep the default inner constructors. When an inner constructor is not defined explicitly by the programmer (such as your `SavitzkyGolayFilter(::Int,::Int)`), the compiler automatically generates some sensible constructor methods corresponding to the fields of the `struct`. What you do here is a matter of style.

See:

``````julia> struct SavitzkyGolayFilter{U,V,Func}
basis::Matrix{U}
proj_mat::Matrix{V}
transform::Func
end

julia> methods(SavitzkyGolayFilter)
# 1 method for type constructor:
[1] SavitzkyGolayFilter(basis::Matrix{U}, proj_mat::Matrix{V}, transform::Func) where {U, V, Func}
@ REPL[1]:2
``````

The result of `methods(SavitzkyGolayFilter)` says that there’s a single method (type signature, basically) available for calling on `SavitzkyGolayFilter`. The method will create an object whose type parameters will match the types of the arguments. Example of how it works:

``````julia> SavitzkyGolayFilter([1 2; 3 4], Float64[1 2; 3 4], sqrt)
SavitzkyGolayFilter{Int64, Float64, typeof(sqrt)}([1 2; 3 4], [1.0 2.0; 3.0 4.0], sqrt)
``````

FTR that’s not the only generated constructor, though: it’s also possible to explicitly specify the type parameters, in which case the element type of input arguments will be converted if necessary to match the specified type parameters:

``````julia> methods(SavitzkyGolayFilter{<:Any, <:Any, <:Any})
# 1 method for type constructor:
[1] SavitzkyGolayFilter(basis::Matrix{U}, proj_mat::Matrix{V}, transform::Func) where {U, V, Func}
@ REPL[1]:2

julia> SavitzkyGolayFilter{Float64,Float64,typeof(sqrt)}([1 2; 3 4], [1 2; 3 4], sqrt)
SavitzkyGolayFilter{Float64, Float64, typeof(sqrt)}([1.0 2.0; 3.0 4.0], [1.0 2.0; 3.0 4.0], sqrt)
``````

You can add other constructor methods at any time after defining the type, BTW.

1 Like

@nsajko Thank you again for your help. This is my current implementation of Savitzky-Golay filter:

``````
struct SavitzkyGolayFilterK{T1 <: Int, T2 <: AbstractFloat}
window_size :: T1                   # window size must be an odd number
M :: T1                             # window half-size must be an odd number
degree :: T1                        # degree of the fitting polynomial
deriv :: T1                         # order of the derivative to compute
dx::T2                              # grid spacing
C :: Vector{T2}                     # filter coefficients
proj_matrix:: Matrix{T2}            # projection matrix
function SavitzkyGolayFilterK(window_size::T1, degree::T1, deriv::T1, dx::T2 ) where {T1, T2}
isodd(window_size) || throw(ArgumentError("window_size must be an odd number."))
window_size >= degree + 1 || throw(ArgumentError("window_size must be >= degree + 1."))

M = (window_size - 1) ÷ 2
#Create Jacobian matrix
J = zeros(T2,(2M+1, degree+1))
for i=1:2M+1, j=1:degree+1
J[i, j] = (i-M-1)^(j-1)
end
proj_mat = (J' * J) \ J'
C = proj_mat[deriv+1,:] .* T2(factorial(deriv+1))
C = C ./ dx^deriv
return new{T1,T2}(window_size, M, degree, deriv, dx, C, proj_mat)
end
end

SavitzkyGolayFilterK(window_size, degree) = SavitzkyGolayFilterK(window_size, degree, 0, 1.0)
SavitzkyGolayFilterK(window_size, degree, deriv) = SavitzkyGolayFilterK(window_size, degree, deriv, 1.0)

function (sg_filter::SavitzkyGolayFilterK)(data::Vector{T}; mode = :wrap) where T

M = sg_filter.M
if T != typeof(sg_filter.C[1])
C = T.(sg_filter.C)
else
C = @view sg_filter.C[:]
end

n = size(data, 1)
smoothed = C[M+1] .* data

if mode == :wrap
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[i-j+n])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[i+j-n])
end
end
elseif mode == :reflect
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1-(i-j)])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n-(i+j)])
end
end
elseif mode == :constant
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n])
end
end

@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : 0)
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : 0)
end
end
end
return smoothed
end

``````

I call it by defining the structure first, with e.g.

``````signal = rand(1024)
window_length = 17
M = 8
N = 3
deriv = 1
SG_filterK = SavitzkyGolayFilterK(window_length, N, deriv)

``````

I benchmarked my code against other implementations

``````Savitzky-Golay filter form SavitzkyGolay.jl
21.558 μs (66 allocations: 112.09 KiB)
Savitzky-Golay filter form SavitzkyGolay.jl with a constructor builded beforehand
21.101 μs (63 allocations: 103.94 KiB)
my Savitzky-Golay filter
16.775 μs (1 allocation: 8.12 KiB)
Savitzky-Golay filter of jiahao
9.506 μs (1 allocation: 8.12 KiB)
``````

I’m a bit disappointed that my implementation is 2 times slower than jiahao’s, however I managed to modify his code to behave better at the edges

``````
struct SavitzkyGolayFilter{M,N,deriv, mode} end
@generated function (::SavitzkyGolayFilter{M,N,deriv,mode})(data::Vector{T}) where {M, N, deriv, mode, T}
#Create Jacobian matrix
J = zeros(T,(2M+1, N+1))
for i=1:2M+1, j=1:N+1
J[i, j] = (i-M-1)^(j-1)
end
proj_mat = (J' * J) \ J'
C = proj_mat[deriv+1,:] .* T(factorial(deriv+1))
#Evaluate filter on data matrix

expr = quote
n = size(data, 1)
# smoothed = zeros(\$To, n)
smoothed = zeros(T, n)
@inbounds for i in eachindex(smoothed)
smoothed[i] += \$(C[M+1])*data[i]
end
smoothed
end

for j=1:M

if mode == :constant
tmp_expr = :(if i - \$j ≥ 1
smoothed[i] += \$(C[M+1-j])*data[i-\$j]
else
smoothed[i] += \$(C[M+1-j])*data[1]
end)
elseif mode == :wrap
tmp_expr = :(if i - \$j ≥ 1
smoothed[i] += \$(C[M+1-j])*data[i-\$j]
else
smoothed[i] += \$(C[M+1-j])*data[i-\$j+n]
end)
elseif mode == :reflect
tmp_expr = :(if i - \$j ≥ 1
smoothed[i] += \$(C[M+1-j])*data[i-\$j]
else
smoothed[i] += \$(C[M+1-j])*data[1-(i-\$j)]
end)
tmp_expr = :(if i - \$j ≥ 1
smoothed[i] += \$(C[M+1-j])*data[i-\$j]
end)
end
insert!(expr.args[6].args[3].args[2].args, 1, tmp_expr)

if mode == :constant
tmp_expr = :(if i + \$j ≤ n
smoothed[i] += \$(C[M+1+j])*data[i+\$j]
else
smoothed[i] += \$(C[M+1+j])*data[n]
end)

elseif mode == :reflect
tmp_expr = :(if i + \$j ≤ n
smoothed[i] += \$(C[M+1+j])*data[i+\$j]
else
smoothed[i] += \$(C[M+1+j])*data[2*n-(i+\$j)]
end)
elseif mode == :wrap
tmp_expr = :(if i + \$j ≤ n
smoothed[i] += \$(C[M+1+j])*data[i+\$j]
else
smoothed[i] += \$(C[M+1+j])*data[i+\$j-n]
end)
else
tmp_expr = :(if i + \$j ≤ n
smoothed[i] += \$(C[M+1+j])*data[i+\$j]
end)
end
push!(expr.args[6].args[3].args[2].args,tmp_expr)

end

return expr
end
``````

There are no subtypes of `Int` or `Float64`. You may as well code those directly in the struct fields, or pick whichever of these is appropriate for your case:

``````julia> supertypes(Float64)
(Float64, AbstractFloat, Real, Number, Any)

julia> supertypes(Int)
(Int64, Signed, Integer, Real, Number, Any)
``````
1 Like

Thank you, updated code:

``````
struct SavitzkyGolayFilterK{T1 <: Signed, T2 <: AbstractFloat}
window_size :: T1                   # window size must be an odd number
M :: T1                             # window half-size must be an odd number
degree :: T1                        # degree of the fitting polynomial
deriv :: T1                         # order of the derivative to compute
dx::T2                              # grid spacing
C :: Vector{T2}                     # filter coefficients
proj_matrix:: Matrix{T2}            # projection matrix
function SavitzkyGolayFilterK(window_size::T1, degree::T1, deriv::T1, dx::T2 ) where {T1, T2}
isodd(window_size) || throw(ArgumentError("window_size must be an odd number."))
window_size >= degree + 1 || throw(ArgumentError("window_size must be >= degree + 1."))

M = (window_size - 1) ÷ 2
#Create Jacobian matrix
J = zeros(T2,(2M+1, degree+1))
for i=1:2M+1, j=1:degree+1
J[i, j] = (i-M-1)^(j-1)
end
proj_mat = (J' * J) \ J'
C = proj_mat[deriv+1,:] .* T2(factorial(deriv+1))
C = C ./ dx^deriv
return new{T1,T2}(window_size, M, degree, deriv, dx, C, proj_mat)
end
end

SavitzkyGolayFilterK(window_size, degree) = SavitzkyGolayFilterK(window_size, degree, 0, 1.0)
SavitzkyGolayFilterK(window_size, degree, deriv) = SavitzkyGolayFilterK(window_size, degree, deriv, 1.0)

function (sg_filter::SavitzkyGolayFilterK)(data::Vector{T}; mode = :wrap) where T

M = sg_filter.M
if T != typeof(sg_filter.C[1])
C = T.(sg_filter.C)
else
C = @view sg_filter.C[:]
end

n = size(data, 1)
smoothed = C[M+1] .* data

if mode == :wrap
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[i-j+n])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[i+j-n])
end
end
elseif mode == :reflect
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1-(i-j)])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[2*n-(i+j)])
end
end
elseif mode == :constant
@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : data[1])
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : data[n])
end
end

@inbounds for i in eachindex(smoothed)
for j=1:M
smoothed[i] += (C[M+1-j]) * (i - j ≥ 1 ? data[i-j] : 0)
smoothed[i] += (C[M+1+j]) * (i + j ≤ n ? data[i+j] : 0)
end
end
end
return smoothed
end
``````

Well, except the bottom type, the empty type union, which subtypes all types. E.g., `Union{} <: Int`. A nit pick, perhaps, but we better be clear with the newbies. The empty union has no instances, though.

1 Like

Almost always it’s better to do `C ./= dx^deriv`, to prevent the unnecessary allocation of `C ./ dx^deriv`. However, …

It should be better for performance to do `proj_mat[deriv+1,:] .* T2(factorial(deriv+1)) ./ dx^deriv`. See the Performance tips section More dots: Fuse vectorized operations.

Noted.

@nsajko, any insights on why jiahao’s code runs twice as fast as mine? What’s driving its speed? Interestingly, the SavitzkyGolayFilter structure doesn’t actually store any data; it’s just a tool that enables the syntax `function(sg_filter:: SavitzkyGolayFilter)(data::Vector)`, allowing for convenient usage like `sg_filter(data)`. However, when I attempted to tweak the script to pass certain key values from my structure, it encountered errors.

1 Like

It’s an interesting question but I don’t have time right now.

Construct a self-contained example (as minimal as you can manage), and open a new thread, asking the question there and providing the example.

I did some performance test depending on the input size (x-axis). The execution time [ns] is on the y-axis.

jiahao’s code performs the best for all considered sizes. My code is 2-1.2 times slower, depending on the input size.

1 Like

FWIW, I wrote a port of Jiahao’s code a while ago and added a `LoopVectorization.jl` on top of it (just a 1 line change), so if you’re itching for the implementation that’s benchmarked about 20x from SciPy’s `savgol`, I’d use that.

Haven’t updated in a while but it shouldn’t be too bad to copy/past some code.
Update: goddamned nerdnsnipe. You should be able to run on it Julia 1.10 now!

My benchmarks have it as

``````[ Info: Julia Float32 for 10_000_000
3.164 ms (0 allocations: 0 bytes)
[ Info: Julia Float64 for 10_000_000
6.691 ms (0 allocations: 0 bytes)
``````

vs

``````f32 savgol (5,2) for 10_000_000 elements
69.3 ms per loop
f64 savgol (5,2) for 10_000_000 elements
80.3 ms per loop
``````

on the single core version with AVX512. I’ve turned on multithreading by default, so you’ll see higher variance in other circumstances.

1 Like