Lowpass with Hann window

I’m not familiar with the “signal processing” business; I’m from a different discipline, so the jargon in Filters - filter design and filtering · DSP.jl is unfamiliar to me.

I need to apply simple lowpass filters with a boxcar or Hann window. In pseudocode:

    lowpassed[i] = sum_{i'} W[i - i'] f[i'] for each i.

where f[i] is the input signal and W[i] is the symmetric window function such that sum_{i} W[i] == 1. I often use the Hann window and sometimes the boxcar window.
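In plain Julia, the pseudocode above translates to something like this sketch (the function name lowpass_direct and the zero padding outside the data range are my own choices, not from any library):

```julia
# Direct implementation of lowpassed[i] = sum_{i'} W[i - i'] f[i'],
# with the window stored as a vector w of odd length 2m+1 centered at w[m+1],
# and f assumed to be zero outside its defined range.
function lowpass_direct(f, w)
    m = (length(w) - 1) ÷ 2                 # half-width of the window
    out = zeros(float(eltype(f)), length(f))
    for i in eachindex(f)
        for j in -m:m
            if 1 <= i - j <= length(f)      # zero padding outside the data
                out[i] += w[j + m + 1] * f[i - j]
            end
        end
    end
    return out
end
```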

It would of course be easy to write such filter functions but since there is a nice library, why not utilize it? But then, I’ve found it hard to understand the manual because it’s not written in the terminology familiar to me.

One easy way to convolve with a boxcar filter (i.e., calculate a k-sample moving sum), is to simply subtract two samples from the output of the cumsum function, k samples apart. Something like

julia> function boxcar(f, k); c = [0; cumsum(f)] ; c[1+k:end] - c[1:end-k] ; end
julia> boxcar([0,0,0,0,1,0,0,0,0], 3)
7-element Vector{Int64}:
 0
 0
 1
 1
 1
 0
 0

The standard answer (not limited to a box filter) is probably:

julia> conv([0,0,0,0,1,0,0,0,0], [1,1,1])
11-element Vector{Int64}:
 0
 0
 0
 0
 1
 1
 1
 0
 0
 0
 0

I agree that the documentation is currently too brief, and targeted at users who are already familiar with the DSP toolboxes of other programming languages. The docstring of a function such as conv should certainly come with a complete and self-contained definition of what it does, and not assume that the reader already knows what a convolution is. And the manual needs lots of practical examples.

The filt(b, a, x) function is another whose docstring should really include a self-contained definition of what the function does. A 3-sample boxcar FIR filter:

julia> filt([1,1,1], [1], [0,0,0,0,1,0,0,0,0])
9-element Vector{Int64}:
 0
 0
 0
 0
 1
 1
 1
 0
 0

In contrast, the hanning function does actually come with a reasonable docstring, including a formula and a plot.

conv([0,0,0,0,1,0,0,0,0], [1,1,1])

That would be the standard answer only if there were no library providing the various simple filters.

That is, I want to get the window function (weight function) from some library, instead of writing my own function. My other example, the Hann window, uses a vector which is different from the 3-point boxcar [1,1,1].

So, if somebody has written a package that provides Hann, Hamming, Gaussian, boxcar, etc. windows, I’d use it with conv. But these windows are already included in the DSP package and I don’t want to re-invent a poorer wheel.

I suspect that the problem is not its brevity, but that it’s written for a narrow, specific audience.

Because the DSP package already includes the Hann window:
https://docs.juliadsp.org/stable/windows/#DSP.Windows.hanning

it should be easy to utilize that to do something like this

conv(mydatavector, hannwindow(npoints=13))

Even better, I suspect there is a standard method to achieve what I want, within the DSP framework or under another package.

This approach is better than writing your own window function because you can easily switch to different, well-tested window functions.

Thanks. So, the problem is, I don’t know how you discover those functions and their doc strings. [Edit: Oh, do you mean that the entry in the manual I quoted above? Okay, I’ll try to figure out how to use it to generate the weight vector to be used with conv or filt.]

So, what does a mean? From your documentation, I understand that b is the weight vector.

Okay, thanks to @mgkuhn, I’ve found a solution:

using DSP
function lowpass(x)
  sz = size(x, 1)
  n = 49
  m = (n-1) ÷ 2
  shn = hanning(n) # including the zeros at the edges
  shn .= shn ./ sum(shn)
  res = conv(x, shn) # adds m points at each end
  return res[(m+1):(sz - m)]
end

I know I need to be careful about the interpretation of the first and last m points of the result.

One unsatisfactory aspect to this solution is to have to deal with the edges manually. I wish I could write

res = filt(hanning(49), x)

with one or another standardized handling of the edges.

I don’t understand the result from filt(shn, [1], f), which isn’t symmetric between the first and last edges.

filt(b, a, x) outputs a sequence constructed by weighting the most recent input samples with the coefficients from b and then subtracting the most recent output samples weighted with the coefficients from a, as I explain towards the end of this lecture video (slides 24–25).

This way, the filt function allows you to implement not only finite-impulse response (FIR) filters (which just do convolution with a finite vector b of coefficients), but also infinite-impulse response (IIR) filters, which cover filter behaviours such as exponential decay with (theoretically) infinitely-long memory.
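As a concrete illustration (my own toy example, not from the DSP.jl docs): with b = [1] and a = [1, -0.5], each output sample is the current input plus half of the previous output, so a unit impulse produces an exponentially decaying response:

```julia
using DSP

# y[n] = x[n] + 0.5*y[n-1]: a one-pole IIR filter with (theoretically)
# infinitely long memory
filt([1.0], [1.0, -0.5], [1.0, 0, 0, 0, 0])
# expected: approximately [1.0, 0.5, 0.25, 0.125, 0.0625]
```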

One reason why the output of filt is not as symmetric as that of conv is that filt was designed such that it can be preloaded with internal state (e.g., via si or a stateful filter object), and can thus be called repeatedly, in a way that carries over internal state from one call to the next. If you want to carry over internal state, feed the output of e.g. hanning into FIRFilter, which constructs a stateful filter object that can be called multiple times in a streaming fashion, and remembers the last few input elements from the previous call, for a seamless handover.

conv treats edges by assuming that your vectors are extended infinitely far with zeros on each side, and it only returns the parts of the theoretically infinitely long output vector that aren’t guaranteed to be zero by that extension.

filt, on the other hand, can only apply causal filters, i.e. it can’t look into the future. With

y = filt(b, a, x)

y[i] is never affected by x[i+1], but only by x[i], x[i-1], etc.

The question is what kind of “handling of edges” do you want? If you want anything non-causal, I suspect filt is not the right function for you, and you better stick with conv.

Also note that the filter design functions, such as digitalfilter, are there to build low-pass, high-pass, bandpass, etc. filters with given cut-off frequencies. There, hanning is used only to control the filter order and transition behaviour, i.e. how steep and how rippling the transition between letting a frequency through (pass band) and blocking it (stop band) is in the frequency domain. If you just want to “smooth” some signal in the time domain, without requiring any particular frequency-domain behaviour of your filter (i.e., a particular signal shape in a spectrogram), then again, I think doing it manually with conv might be more suitable for you.
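For completeness, here is a sketch of that frequency-domain route, based on my reading of the DSP.jl filter-design documentation (the cutoff 0.1, expressed as a fraction of the Nyquist frequency, and the 49-point window length are arbitrary example values, and x stands for your data vector):

```julia
using DSP

# Design a windowed-sinc FIR low-pass filter: the Hann window truncates
# the ideal (infinitely long) impulse response to 49 coefficients.
h = digitalfilter(Lowpass(0.1), FIRWindow(hanning(49)))
smoothed = filt(h, x)   # or conv(x, h), which pads with zeros at the edges
```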

(Which leads me to my standard question: what’s the actual problem you want to solve?)

The question is what kind of “handling of edges” do you want?

I need to see what choices I have. In another application I use, moving averages are not defined at points that require non-existent datapoints. For example, for a 3-point boxcar filter, the first and last points are undefined in the result of the filter because the first point requires one datapoint to the left and the last point requires one datapoint to the right.

This is one sensible choice. There are other sensible choices.

But, conv is highly unusual as a filter. A normal filter preserves the number of datapoints. For this reason, I don’t know which timestep each point in the output of conv corresponds to. The original data is a timeseries [(t1, x1), (t2, x2), . . . , (tN, xN)] but the output has more points than N.

I wrote the code I posted above by just guessing what conv does.

I want to reduce the amplitude of ocean tides in my time series. Because my purpose is visualization (I plot graphs), the filter doesn’t have to be sophisticated at all. But the boxcar moving average has too many side lobes in its frequency response. The Gaussian filter is perhaps ideal, but the Hann filter is close enough, and it’s nice in that its width in the physical domain is finite.

Because I plot the data with time on the horizontal axis, I need to know which time each datapoint corresponds to.

Anyway, I still need to figure out what conv does. None of your descriptions actually shows the math.

I tried to find out what exactly a vector convolution is. I haven’t found the formula. I looked at the source code of conv but it calls another function, which calls another function, . . .

I google-searched. There is a nice Matlab webpage showing vector convolutions but it doesn’t show the math either. The writer of the page assumes that the reader knows what a vector convolution is.

Convolution between two 1-d functions in an infinite domain is well known, but for a finite domain, you need to make a choice.

conv treats edges by assuming that your vectors are extended infinitely far with zeros on each side, and it only returns the parts of the theoretically infinitely long output vector that aren’t guaranteed to be zero by that extension.

So, I’ll try to convert your words to math. Suppose we have a continuous signal x(t) which is defined only in [0,T]. You are saying that we extend x(t) as

x(t) = 0 for t > T or t < 0

Right? Then, the convolution is

y(t) = integral_{-∞}^{∞} w(t - t') x(t') dt'
     = integral_{0}^{T} w(t - t') x(t') dt'

where w(t) is symmetric and has compact support (w(t) = 0 for |t| > K).

Now, I see that y(t) extends at least to t = −K and t = T + K.

That’s why the result of conv has tails on both sides and I can deduce the width of these extra datapoints.

Vectors are a bit harder to understand because the indices don’t extend to 0, -1, -2, . . . in the result of filtering.

Since you’re working in the discrete-time domain, the convolution is not an integral but a sum. The formula is here.

If you define both the input sequence and the filter response as starting at t = 0, then the output of the convolution also starts at t = 0. See the first figure here. The input sequence is x and the filter response is b. The output y is valid for all times; however, you will see a transient response until the filter taps “fill up” with input samples.

Thanks, that’s exactly what I was looking for, except that the range of n isn’t defined in the Wikipedia article:

(f * g)[n] = sum_{m = -M}^{m = M} f[n - m] g[m]

n can be any integer, positive or negative. The formula applies for arbitrary integers.

So, this alone isn’t sufficient to determine the implementation of conv.

If you define both the input sequence and the filter response as starting at t = 0 , then the output of the convolution also starts at t = 0 .

Great! Thanks. That settles it. That means that conv isn’t symmetric. (Note that the formula in the Wikipedia is symmetric in the negative and positive directions.) Its result is longer than the original signal timeseries because a tail is added at the end!

That’s what I needed to know to use conv correctly! So, the implementation I showed above was wrong. I should have chopped off only at the end. Then, the result is likely the same as the filt(window, [1], timeseries) solution.

By the way, I’ve been also wondering why the window functions aren’t normalized in DSP?

If you use DSP.Windows.hanning() with conv or filt as a lowpass filter, the amplitude increases because it’s not normalized. But a lowpass filter should leave a constant signal intact except near the edges:

julia> using DSP

julia> conv(ones(10), [1,1,1]) # amplitude increases!
12-element Vector{Float64}:
 1.0
 2.0
 3.0
 3.0
 3.0
 3.0
 3.0
 3.0
 3.0
 3.0
 2.0
 1.0

julia> conv(ones(10), [1,1,1]./3) # Window should be normalized!
12-element Vector{Float64}:
 0.33333333333333337
 0.6666666666666667
 1.0
 1.0
 0.9999999999999999
 1.0
 1.0
 1.0
 0.9999999999999999
 1.0
 0.6666666666666667
 0.3333333333333333

I’ve been also wondering why the window functions aren’t normalized in DSP

hanning() produces a “window function” and not a digital filter. It is normally not used to filter anything directly, but to crop the infinitely long impulse response of an ideal low-pass filter (such as some scaled version of sinc) to a finite-length approximation. A normal FIR low-pass filter impulse response is the sample-wise product of sinc and hanning output, as in e.g.

h = sinc.(-2:0.1:2) .* hanning(41)
h /= sum(h)

Yes; mathematically, the functions f[n] and g[n] in that formula are defined for all n. In practice, those signals have finite duration. The convolution is the sum of the product of one signal times a shift of the other, which is interesting only where the signals “overlap”, and this is what conv calculates.

And yes, in general a convolution is not symmetrical.

See this recent thread about the implementation of convolution: Optimizing Direct 1D Convolution Code
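To make the finite-domain definition fully explicit, here is a direct (unoptimized) implementation in 1-based indexing that reproduces what conv returns for vectors; this is my own reference sketch, and the real conv may instead use an FFT-based algorithm:

```julia
# y[n] = sum_m x[m] * w[n - m + 1], with x and w assumed zero outside
# their index ranges; length(y) == length(x) + length(w) - 1.
function myconv(x, w)
    y = zeros(promote_type(eltype(x), eltype(w)), length(x) + length(w) - 1)
    for n in eachindex(y)
        for m in max(1, n - length(w) + 1):min(n, length(x))
            y[n] += x[m] * w[n - m + 1]
        end
    end
    return y
end

myconv([0,0,0,0,1,0,0,0,0], [1,1,1])  # same result as conv above
```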

I’ve been testing my filters and I’ve found an important error in what I said earlier.

So, by way of summary, I first write down the problem I want to solve, and then write two solutions, one a straightforward implementation of the solution as formulated and the other using conv().

Problem: You have a timeseries x[1], x[2], . . . x[N]. You want to apply a moving average:

y[i] = x[i-1]*w[1] + x[i]*w[2] + x[i+1]*w[3]

for i = 1, . . . , N. This example uses a 3-point moving average but you can easily generalize it. The above formula uses non-existent datapoints x[0] and x[N+1], which are assumed to be zero.

Solution using conv():

function movingaverage(x, w)
  sw = size(w,1)
  @assert isodd(sw)
  m = (sw - 1) ÷ 2 # sw = 2m + 1
  a = DSP.conv(x, w)
  @assert size(a,1) == size(x,1) + 2m
  return a[(m+1):(end-m)] # discard both ends.
end
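A quick sanity check on that function (my own example, assuming DSP is loaded and movingaverage is defined as above): a unit impulse should come back as the normalized window itself, centered at the same index:

```julia
using DSP

movingaverage([0, 0, 1.0, 0, 0], [0.25, 0.5, 0.25])
# expected: [0.0, 0.25, 0.5, 0.25, 0.0]
```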

I said that conv() isn’t symmetric, but it is. It includes y[0] and y[N+1] so that you need to chop them off.

I’ve verified that the above code is correct by comparing it with the following straightforward implementation:

function movingaverage2(x, w)
  sw = size(w,1)
  nx = size(x,1)
  @assert isodd(sw)
  m = (sw - 1) ÷ 2 # sw = 2m + 1
  res = similar(x, promote_type(eltype(x), eltype(w)))
  for i = 1:nx
    s = zero(eltype(res))
    for j = -m:m
      xij = (1 <= i+j <= nx) ? x[i+j] : zero(eltype(x)) # 0 padding
      s = s + xij * w[j+m+1]
    end
    res[i] = s
  end
  return res
end

(I know this one is potentially much slower than the solution using conv() because the latter potentially uses FFT.)

Anyway, it seems that there is no Julia package that offers the common filters we use in our field:

hanning() produces a “window function” and not a digital filter. It is normally not used to filter anything directly,

It is a “window function” which, in our field, is normally used to filter things directly.

hanning() produces a “window function” and not a digital filter.

But, I suppose there is no downside if it is normalized by default. You normalize the product

h /= sum(h)

after all.

I think it’s reasonable to expect the window functions to return values according to their mathematical definition. Maybe a boolean keyword normalized that defaults to false would be a good idea.
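In the meantime, a thin wrapper gives the normalized behaviour (the name normwindow is hypothetical, not part of DSP.jl):

```julia
using DSP

# Rescale any window so its coefficients sum to 1, making it directly
# usable as a moving-average kernel with conv:
normwindow(w) = w ./ sum(w)

shn = normwindow(hanning(49))   # sum(shn) ≈ 1.0
```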