Forcing data points containing one sinusoidal signal to be symmetric around zero, and multithreading

The starting point is real-world measurements containing one unique sinusoidal signal.
For signal-processing purposes it can be beneficial to reduce the offset around zero before evaluating the signal, either with fft() or with a parameter fit.
This was the reason to write a function that eliminates/reduces the offset point by point, by means of a moving window.
The function works fine, but the time needed for processing gets quite high as the sampling rate goes up.
Now I wonder if a similar function already exists that does the job in a more efficient manner, e.g. by spreading tasks across several cores.

In case others are looking for something similar or have suggestions, I put my example code below:

using PlotlyJS, Printf
using RobustModels
using Debugger

b_plt           = false
b_trunc_signal  = false
num_periods     = 5
sampling_rate   = 100_000
frequ1          = 0.1
ampl1           = 2.1
phase1          = 10.0   # in degrees
offset_sp       = 0.1
grad_offset     = 0.0001 # gradient of the offset
# ---
ppp_frequ1    = trunc(Int, sampling_rate / frequ1)   # points per period
window_length = trunc(Int, num_periods * ppp_frequ1)

# --- time and data vector:
vec_t    = collect(range(0, step = 1.0 / sampling_rate, length = window_length))
td_data_ = offset_sp .+ (grad_offset * ampl1) .* vec_t .+
           ampl1 .* cos.((2 * pi * frequ1 .* vec_t) .+ deg2rad(phase1))

# --- Functions ---------------------------------------------------------------------------------------------
function _MyLibCutOffset(_signal::Vector{<:Number}, _sizeMV::Int, _b_truncSignal::Bool=true, _b_verbose::Bool=true)
    b_DBG = true
    _signal = Float64.(_signal)
    _n_pts_signal = length(_signal)
    if _n_pts_signal <= _sizeMV
        error("Signal too short!")
    end
    pts_MV_corrected = _n_pts_signal - _sizeMV
    half_MV = floor(Int, _sizeMV / 2)
    i_mean = 0.0
    if _b_truncSignal
        # truncated output: only the points with a full window around them are returned
        signal_symmetric = Vector{Float64}(undef, pts_MV_corrected)
        idx_shift = 0
    else
        # full-length output: correct the leading half window with the mean of the first window
        signal_symmetric = Vector{Float64}(undef, _n_pts_signal)
        i_mean = RobustModels.mean(_signal[1:_sizeMV])
        signal_symmetric[1:half_MV] = _signal[1:half_MV] .- i_mean
        idx_shift = half_MV
    end
    # ---
    if _b_verbose
        @info(string("MyLibCutOffset || _b_truncSignal: ", _b_truncSignal, ",\t_n_pts_signal: ", _n_pts_signal,
                     ",\t_sizeMV: ", _sizeMV, ",\tpts_MV_corrected: ", pts_MV_corrected, ",\tidx_shift: ", idx_shift))
    end
    # --- moving window: subtract the local mean, point by point
    for i_pts = 1:pts_MV_corrected
        _range = range(i_pts, length = _sizeMV)
        try
            i_mean = RobustModels.mean(_signal[_range])
        catch e
            println(string("i_pts: ", i_pts, ",\ti_pts + idx_shift: ", i_pts + idx_shift,
                           ", _range[end]: ", _range[end], ", length(_range): ", length(_range)))
            rethrow(e)
        end
        signal_symmetric[i_pts + idx_shift] = _signal[i_pts + half_MV] - i_mean
    end
    # --- DBG ---
    if b_DBG
        println(string("pts_MV_corrected - ceil(Int, _sizeMV/2): ", pts_MV_corrected - ceil(Int, _sizeMV / 2)))
    end
    if !_b_truncSignal
        # correct the trailing points with the last window mean, shifted so that the
        # signal stays continuous at the seam
        diff_last_corr = signal_symmetric[pts_MV_corrected + idx_shift] - _signal[pts_MV_corrected + half_MV]
        signal_symmetric[pts_MV_corrected + idx_shift + 1 : end] =
            _signal[pts_MV_corrected + idx_shift + 1 : end] .- (i_mean + diff_last_corr)
        if b_DBG
            println(string("diff_last_corr: ", diff_last_corr))
        end
    end
    # ---
    return signal_symmetric
end

# --- Main --------------------------------------------------------------------------------------------------
signal_symmetric_ = _MyLibCutOffset(td_data_, ppp_frequ1, b_trunc_signal)
n_pts_signal = length(signal_symmetric_)
vec_t_symmetric = vec_t[floor(Int, (window_length - n_pts_signal) / 2) + 1 :
                        window_length - ceil(Int, (window_length - n_pts_signal) / 2)]

# --- info --------------------------------------------------------------------------------------------------
@info(string("window_length: ", window_length, ", n_pts_signal: ", n_pts_signal, ", vec_t_symmetric: ", length(vec_t_symmetric)))
@info(string("Mean input: ", RobustModels.mean(td_data_), ", Mean output: ", RobustModels.mean(signal_symmetric_)))
# --- Plot --------------------------------------------------------------------------------------------------
if b_plt
    if window_length > 1e6
        error("Too many points!")
    end
    hdl1 = PlotlyJS.scatter(; x = vec_t,           y = td_data_,          name = "raw")
    hdl2 = PlotlyJS.scatter(; x = vec_t_symmetric, y = signal_symmetric_, name = "corr")
    hdl_plt = PlotlyJS.Plot([hdl1, hdl2])
end

The first thing to check is whether you are allocating a lot of temporary arrays. Does @time, @btime, or @benchmark report a lot of allocations? (I would suggest the latter two, as they avoid common pitfalls like reporting compilation time.) This would be the first thing to optimize, as it is a good way to detect silly small temporary objects that significantly slow you down (e.g. it might be OK to have a small number of big initial allocations, but if the number of allocations scales with the size of your problem, removing them would probably drastically speed up your code).
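To make that concrete, here is a small sketch of the kind of check I mean (the two toy functions are made up for illustration; the slicing-inside-a-loop pattern matches what your moving-window code does):

```julia
using BenchmarkTools, Statistics

# Allocation-heavy: signal[i:i+w-1] copies the window on every iteration.
slice_means(signal, w) = [mean(signal[i:i+w-1]) for i in 1:length(signal)-w+1]

# Same result, but @view avoids the per-window copy.
view_means(signal, w) = [mean(@view signal[i:i+w-1]) for i in 1:length(signal)-w+1]

signal = rand(5_000)
@btime slice_means($signal, 50)  # allocations scale with the number of windows
@btime view_means($signal, 50)   # essentially only the output vector is allocated
```

If the allocation count reported for the first variant grows with the signal length while the second stays flat, that is exactly the pattern worth removing first.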

Any reason you do a moving window instead of one single run of: input_array .-= mean(input_array)?

You have a lot of debugging branches in your code. Removing them from your MVE would make it easier for us to discuss the specifics of your code (at least for me that is the case, as it is easier to say what is important and what is inconsequential in your code).

Thanks for your comment! In the real world it can happen that the offset is not constant over time. This is especially an issue if you would like to use parameter-fit techniques; one example is investigations on batteries.
I do not have much experience with benchmarking and performance tuning. What I have already checked is whether the two variants of mean, RobustModels.mean() and Statistics.mean(), make a difference.
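To illustrate with made-up numbers: with a drifting offset, subtracting a single global mean leaves a residual ramp that only a local (windowed) correction would track:

```julia
using Statistics

t = range(0, 10, length = 1000)
s = 0.1 .* t .+ sin.(2pi .* t)    # sine plus a slowly drifting offset

global_demeaned = s .- mean(s)    # one global mean: the ramp survives
residual = global_demeaned .- sin.(2pi .* t)
println(maximum(abs.(residual)))  # about 0.5, far from zero
```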

The try-catch-end structure is for sure a time-critical element; I am still in the debugging phase.
Before spending more effort on performance aspects, I wonder whether something similar might already exist;
it should, e.g., be possible to run more than one for-loop in parallel.
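For the parallel-loop idea: since every output point of the moving-window correction is independent, the inner loop can in principle be threaded directly. A rough sketch of what I have in mind (the function name is made up; it assumes Julia was started with several threads, e.g. julia -t 4):

```julia
using Statistics
using Base.Threads: @threads

# Moving-window offset removal, parallelized over output points.
# Each iteration writes to its own index, so there are no data races.
function cut_offset_threaded(signal::Vector{Float64}, window::Int)
    n_out = length(signal) - window
    out = Vector{Float64}(undef, n_out)
    half = window ÷ 2
    @threads for i in 1:n_out
        out[i] = signal[i + half] - mean(@view signal[i:i + window - 1])
    end
    return out
end
```

Note that this is still O(n · window) work; keeping a running (cumulative) sum per window would reduce the per-point cost before threading even enters the picture.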

It seems you just need some high-pass filter. There are probably packages that do this better than what I am writing below, but here are two simple options:

An exponential filter would probably be good enough:

function my_exp_high_pass_filter!(signal, decay)
    low_pass = zero(eltype(signal))
    @inbounds for i in eachindex(signal)
        low_pass = decay * low_pass + (1 - decay) * signal[i]
        signal[i] -= low_pass
    end
    return signal
end
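A quick self-contained sanity check of the idea (the filter is repeated here so the snippet runs on its own; the decay value is just an example):

```julia
# Exponential high-pass: track the slow component and subtract it.
function my_exp_high_pass_filter!(signal, decay)
    low_pass = zero(eltype(signal))
    @inbounds for i in eachindex(signal)
        low_pass = decay * low_pass + (1 - decay) * signal[i]
        signal[i] -= low_pass
    end
    return signal
end

s = 0.3 .+ sin.(2pi .* (0:9999) ./ 100)  # sine with a constant offset
my_exp_high_pass_filter!(s, 0.99)
println(sum(s) / length(s))              # mean pulled close to zero
```

The closer decay is to 1, the slower the tracked baseline moves, so a constant or slowly drifting offset is removed while the sinusoid mostly passes through.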

Or if you want to do mean by segments (and this one can be multithreaded easily). Note that this is not really a moving window solution, as there is no overlap between windows.

using Statistics
using Base.Threads: @threads

function mean_by_segment(signal, segment_length)
    total_length = length(signal)
    @threads for i in 1:segment_length:total_length
        segment = @view signal[i:min(i + segment_length - 1, total_length)]
        segment .-= mean(segment)
    end
    return signal
end


On four threads.

julia> r = rand(1_000_000);

julia> @benchmark my_exp_high_pass_filter!($r, 0.99)
BenchmarkTools.Trial: 1654 samples with 1 evaluation.
 Range (min … max):  2.767 ms …   4.495 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.956 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.018 ms ± 227.938 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ███▆▆▆▆▇▅▄▄▄▄▃▃▂ ▁▁                                     ▁
  █▄▇███████████████████████▇█▇▇▇▆▄▇▆▆▆▅▄▅▄▆▄▇▆▄▆▆▄▅▁▇█▄▇█▆▅▅ █
  2.77 ms      Histogram: log(frequency) by time      3.94 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  257.369 μs …   4.383 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     371.026 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   423.487 μs ± 181.874 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▄▃▅██████▅▅▄▃▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  257 μs           Histogram: frequency by time         1.16 ms <

 Memory estimate: 1.98 KiB, allocs estimate: 21.

Thanks for your snippets :slight_smile: I am not sure about the low-pass-filter option; I will have to check this idea.