Sinc function - and sinc neural networks - and function approximation for the former

First, has anyone reimplemented sinc neural networks in Julia? They're for speaker recognition, state-of-the-art I believe, and should similarly help with speech recognition and other audio work.

So far, I’m just looking at the sinc function itself, and I’m not sure the Julia implementation is as fast as it could be.

julia> function my_sinc(x)  # this works (EDIT: for a small domain), but is a naive version for now
         sum([((-1)^n*x^(2n))/factorial(2n + 1) for n=0:3])
       end

julia> function my_sinc2(x)  # I'm looking into evalpoly, can I do something like this?
         evalpoly(x, (((-1)^n*x^(2n))/factorial(2n + 1) for n=0:3))
       end

Does anyone know if a Padé approximant or some other method is best? I tried to google a bit and didn't find an answer, but I did find:

https://www.researchgate.net/publication/229965145_Signal_sinc-interpolation_A_fast_computer_algorithm/link/5a4a7739458515f6b05b32a5/download

What’s currently done in Julia:

sinc(x::Real) = x==0 ? one(x) : isinf(x) ? zero(x) : sinpi(x)/(pi*x)

sinc(x::Number) = x==0 ? one(x)  : oftype(x,sinpi(x)/(pi*x))
sinc(x::Integer) = x==0 ? one(x) : zero(x)
sinc(x::Complex{<:AbstractFloat}) = x==0 ? one(x) : oftype(x, sinpi(x)/(pi*x))
sinc(x::Complex) = sinc(float(x))
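(A usage note on the Base definition, since it matters when comparing with my series above: Base's sinc is the normalized sin(πx)/(πx), with zeros at the nonzero integers, while my Taylor series is for the unnormalized sin(x)/x.)

sinc(0.5)          # == sinpi(0.5)/(pi*0.5) ≈ 2/π ≈ 0.6366
my_sinc(0.5 * pi)  # my series targets sin(x)/x, so feed it π·x to compare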

EDIT (the network I have in mind; more details, in my own reply to this top post):

Mirco Ravanelli (he also has a youtube video on it), Yoshua Bengio,
“Speaker Recognition from raw waveform with SincNet”.
[1808.00158] Speaker Recognition from Raw Waveform with SincNet

1 Like

You’re actually 1 week late. Julia very recently got much better functions for sinc and cosc (the derivative). If you check out master, you should see much better performance (and accuracy).

5 Likes

Great, I see it here:
https://github.com/JuliaLang/julia/blob/7ecd1f9e0da5d07e7dd1733093b246a37bc118ff/base/special/trig.jl#L1080

I should have known; even my 11-day-old master is already outdated…

This makes the following a bit outdated, though since it's just an example it's not wrong:
https://github.com/JuliaLang/julia/blob/2fc3dcdaecc0f77fee56400ee80752cc81c8dfb0/doc/src/manual/types.md

Explicitly calling factorial and computing powers of x separately for each term in a Taylor series is not a good approach. Not only is this very slow (there are a lot of redundant computations!), but it can also overflow or hit 0*Inf if you compute enough terms.

See the new implementation of the cosc function in Base for an example of how to compute a Taylor series iteratively (to arbitrary precision), with specialized cases (evalpoly with hardcoded coefficients to a fixed order) for single and double precision.
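A minimal sketch of that iterative idea (not the actual Base cosc code): each term is obtained from the previous one, so no explicit powers or factorials are computed:

function sinc_taylor(x, nterms=8)
    # Taylor series of sin(x)/x: term n is (-1)^n * x^(2n) / (2n+1)!
    x2 = x * x
    term = one(x)   # n = 0 term
    s = term
    for n in 1:nterms-1
        term *= -x2 / ((2n) * (2n + 1))   # recurrence between consecutive terms
        s += term
    end
    return s
end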

3 Likes

By “the Julia implementation” (which is why I called mine the “naive version”), I was referring to the one in the standard library (and now also to yours, after seeing your new one).

Your new version seems 2x slower, except for the tiny region around 0 (where you're papering over the 2nd-derivative issue), where yours is over 3x faster.

I was actually thinking of doing something similar, with abs (or fastabs, as you do), but over a much wider region, avoiding division (and sinpi with its range reduction, etc.).

I got curious about fastabs, and couldn't help thinking of the classic “7 minute abs”:

It turned out to be nothing special (in my case, with real numbers):

fastabs(x::Number) = abs(x)                        # for real numbers it's just abs

fastabs(z::Complex) = abs(real(z)) + abs(imag(z))  # cheap upper bound on abs(z); avoids hypot/sqrt

More off-topic (I had never heard of this Lush Lisp-variant language, or QMNIST):

Many software projects, particularly research projects, require two languages […]
The vector and matrix manipulation engine is quite powerful and efficient, making Lush ideal for heavy numerical applications and applications such as signal and image processing. […]
Lush offers an extensive library for gradient-based machine learning

where I found it:

The key difference between your version and the Base version is that your version only works for small x values. For example, my_sinc(10) is about -130 instead of the correct answer of 0. Making sinc fast and correct for the whole range pretty much requires calling sinpi which is going to be slow.

3 Likes

Yes, I know that. My question was really about getting evalpoly to work exactly (and whether anyone is actually already working on sinc [networks]), and whether a Padé approximant is appropriate. Since sinc goes to zero in the limit, a polynomial will never work without division over the whole range, but it might over a large enough region, given a long enough polynomial.

What are the options, actually? I'm up to speed on the math of Taylor series (and sinc has one without division, if you're OK with infinitely many terms…), but I'm missing some details on implementing it with evalpoly. I'm reading up on minimax, Padé, etc., but what are the options I should be looking into? I think this (function approximation in general) will involve only polynomials, i.e. one or two of them (and then a division), of sufficient length.

Is this helpful to find the coefficients:

There I see e.g.:

  • linear_king_fit(E, U) , find coefficients a and b for E[i]^2 = a + b*U^0.5
  • linear_rational_fit(x, y, p, q) finds the coefficients for rational polynomials: y[i] = (a[1] + a[2]*x[i] + ... + a[p+1]*x[i]^p) / (1 + a[p+1+1]*x[i] + ... + a[p+1+q]*x[i]^q)

I.e. the latter seems helpful; I'm not sure about the former. Processors do have sqrt in hardware, but I'm not sure it's used for function approximation; I think that and the other options in the package are aimed at other applications.

The best library for this is Remez.jl; it finds minimax polynomials or rational functions. That said, I'm not sure you'll be able to find something accurate over the entire range that isn't really slow.
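For example, something along these lines should work (a sketch assuming the ratfn_minimax interface shown in the Remez.jl README):

using Remez   # assumption: the ratfn_minimax interface from the package README

# Degree-12 minimax polynomial (denominator degree 0) for sinc on [0, 4];
# sinc is even, so evaluate at abs(x) to cover [-4, 4].
# N, D are numerator/denominator coefficients, E the worst-case error, X the extrema.
N, D, E, X = ratfn_minimax(sinc, (0, 4), 12, 0)

approx_sinc(x) = evalpoly(abs(x), Tuple(Float64.(N))) / evalpoly(abs(x), Tuple(Float64.(D)))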

2 Likes

I should have been clearer: with more coefficients it gets more accurate (it's still WIP). I'm looking for the best approach before converting it to something fast, e.g. with evalpoly:

julia> function my_sinc(x)  # works over a larger domain with more coefficients; still a naive WIP version
         sum([((-1)^n*x^(2n))/factorial(2n + 1) for n=0:7])
       end
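Something like the following is what I have in mind for the evalpoly step (a sketch: the coefficients are precomputed once, as a polynomial in x²; the series still targets the unnormalized sin(x)/x):

# Precompute the Taylor coefficients of sin(x)/x as a tuple of constants,
# then evaluate the polynomial in x^2 with evalpoly (Horner's rule, no allocations).
const SINC_COEFFS = Tuple((-1)^n / factorial(2n + 1) for n in 0:7)

my_sinc_evalpoly(x) = evalpoly(x^2, SINC_COEFFS)   # still only accurate for small |x|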

You mean from -Inf to Inf as the “entire range”? I was thinking maybe -5 to 5 (just a guess), with a fallback outside that. But I'm actually not sure what range is needed for signal processing.

I do see at:

Sometimes zero-error point constraints are included in a Modified Remez Exchange Algorithm.[10]

I'm not sure whether that should be used, for the few extremes and/or zero crossings [closest to zero]. Nor how much inaccuracy in the approximation is tolerable. For signal processing, I guess you want the frequency preserved, i.e. the zero crossings, while e.g. the extremes could be off, assuming the error is only in the vertical direction, not the horizontal one.

In case people are interested, I’m thinking in context of:

Feb 16, 2019:

  • We replaced the old “sinc_conv” with “SincConv_fast”. The latter is 50% faster.

The code is not large, just a few files (in the top directory, each smaller than this one's 304 sloc; I guess under 800-1000 lines in total):

I found an .npy (raw) file there, and wasn't sure what it is (it seems specific to Python/NumPy, though there is NPZ.jl):

Why you should always save your data as .npy instead of .csv

There are some other applications (where sinc is the activation function, unlike above, where ReLU is used):

Funnily, the name of the function comes from “cardinal sine”. The widest area of the graph resembles a cardinal with a hat.
[Picture: Assistant du Sinus Pape, membre du …]

1 Like

Convolution with a sinc function is an entirely different beast — the limiting factor for convolution seems unlikely to be evaluation of sinc itself (if it is evaluated at all — e.g. an FFT-based method won’t evaluate sinc, whereas an approximate overlap-add method using a truncated sinc would precompute the filter coefficients anyway).
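For instance (a sketch assuming DSP.jl, not code from this thread): the truncated sinc coefficients are computed once up front, so the per-sample cost is all in the convolution:

using DSP   # assumption: DSP.jl; its conv switches to FFT-based methods for long inputs

x = randn(10_000)                                  # some signal
h = [sinc(n / 4) for n in -32:32] .* hamming(65)   # truncated, windowed sinc: computed once
y = conv(x, h)                                     # the time goes here, not into sinc itself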

3 Likes

I see it used in the code. I'm not sure: are you saying there's a better way, or that this is OK and sinc just won't be the speed-critical part of the network (I know sinc is only used in the first layer)?

        for i in range(self.N_filt):
            low_pass1 = 2*filt_beg_freq[i].float()*sinc(filt_beg_freq[i].float()*self.freq_scale,t_right)
            low_pass2 = 2*filt_end_freq[i].float()*sinc(filt_end_freq[i].float()*self.freq_scale,t_right)
            band_pass=(low_pass2-low_pass1)
            band_pass=band_pass/torch.max(band_pass)
            filters[i,:]=band_pass.cuda()*window

        out=F.conv1d(x, filters.view(self.N_filt,1,self.Filt_dim))

From the paper (see top post):

The first layer of a standard CNN performs a set of time-domain convolutions between the input waveform and some Finite Impulse Response (FIR) filters […] In standard CNNs, all the L elements (taps) of each filter are learned from data. Conversely, the proposed SincNet (depicted in Fig. 1) performs the convolution with a predefined function g that depends on few learnable parameters θ only
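For illustration, a rough Julia sketch of that parameterization (a band-pass filter as the difference of two low-pass sinc kernels; the function name, window choice, and normalization here are my own assumptions, not the SincNet code):

# f1 < f2 are the two learnable cutoff frequencies (normalized, cycles/sample),
# L is an odd filter length. Only f1 and f2 are learned; the shape is fixed by sinc.
function bandpass_sinc(f1, f2, L)
    n = -(L - 1) ÷ 2 : (L - 1) ÷ 2
    low1 = 2 .* f1 .* sinc.(2 .* f1 .* n)          # low-pass kernel with cutoff f1
    low2 = 2 .* f2 .* sinc.(2 .* f2 .* n)          # low-pass kernel with cutoff f2
    window = 0.54 .+ 0.46 .* cospi.(2 .* n ./ L)   # Hamming window (one common variant)
    return (low2 .- low1) .* window
end

filt = bandpass_sinc(0.05, 0.15, 251)   # e.g. a 251-tap band-pass between the two cutoffs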

In general I do trust Julia to have the best math approximations, but it seems sinc may be an exception (I saw talk that it shouldn't be in Base at all); maybe it's not considered too important, or at least the original sinc implementation predates this paper I found, from last October: [1812.10884] A rational approximation of the sinc function based on sampling and the Fourier transforms


Figure 6 shows the absolute difference between the original sinc function sinc(πν) and its rational approximation. Despite that the sinc function is not easy to approximate [12, 16], only 2^(6−1) = 32 terms in the proposed formula (16) provide absolute difference smaller than 3.2 × 10^−3 within the range ν ∈ [−2π, 2π]. A MATLAB code validating the results based on the rational approximation (16) of the sinc function sinc(πν) is provided in Appendix A.

Maybe I shouldn't look at the code, in case it's tainted (so that I couldn't contribute to Julia)? But is it OK to look at the formulas?

As we can see now, the truncated form of this limit can be implemented for sampling in accordance with equation (1). Specifically, the sinc function can be approximated as a cosine series expansion such that […]
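The equation itself is in the paper, but the basic idea (my reading, from the identity sin(x)/x = ∫₀¹ cos(x·t) dt discretized with a midpoint rule; this is not the paper's exact formula (16)) looks something like:

# Approximate sin(x)/x by a finite sum of cosines (midpoint rule on ∫₀¹ cos(x*t) dt).
# With M = 32 terms this matches the 2^(6-1) = 32 terms quoted above.
sinc_cosine_series(x, M = 32) = sum(cos(x * (m - 0.5) / M) for m in 1:M) / M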

FYI: the paper also seems to cover some other things:

Fig. 2. The rectangular function approximations f(t) = 1/[(2t)^70 + 1] and

[I've never seen anything raised to a power this high (70th) before; I'm not sure if it's fast, and there seem to be better ways to get a rectangular function…]

Voigt function [3], can be calculated by residues in a similar way that we performed in our earlier work [2] to approximate the integral (18). Due to rapid performance and high accuracy our algorithm [2] has been implemented in current version of the atmospheric model bytran [6, 15]

Three digits of accuracy is fine for specialized applications, but the Base version needs to be accurate to close to machine precision. If your application only needs a crude sinc over a limited range, and the sinc is performance-critical, then by all means use a simple approximation.

In the code you quoted, the time is almost certainly mostly spent in the conv1d function. The time spent calling sinc while precomputing filter coefficients should be negligible, especially in Julia.

3 Likes

Maybe you will find the branchless (normalized: zeros at integers) Sinc function approximations which ImageMagick has been using “forever” useful, e.g.,

const double xx = x*x;
/*
  Maximum absolute relative error 6.3e-6 < 1/2^17.
*/
const double c0 = 0.173610016489197553621906385078711564924e-2L;
const double c1 = -0.384186115075660162081071290162149315834e-3L;
const double c2 = 0.393684603287860108352720146121813443561e-4L;
const double c3 = -0.248947210682259168029030370205389323899e-5L;
const double c4 = 0.107791837839662283066379987646635416692e-6L;
const double c5 = -0.324874073895735800961260474028013982211e-8L;
const double c6 = 0.628155216606695311524920882748052490116e-10L;
const double c7 = -0.586110644039348333520104379959307242711e-12L;
const double p =
  c0+xx*(c1+xx*(c2+xx*(c3+xx*(c4+xx*(c5+xx*(c6+xx*c7))))));
return((xx-1.0)*(xx-4.0)*(xx-9.0)*(xx-16.0)*p);

Source: ImageMagick resize.c source code
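For comparison, the same snippet transliterated to Julia with evalpoly (my translation, not from ImageMagick; coefficients copied verbatim from the C code above, so the same restriction to [-4, 4] applies):

# Normalized sinc (zeros at the integers), valid only for |x| <= 4.
function sinc_imagemagick(x)
    xx = x * x
    # Maximum absolute relative error 6.3e-6 < 1/2^17 on [-4, 4].
    p = evalpoly(xx, (
         0.173610016489197553621906385078711564924e-2,
        -0.384186115075660162081071290162149315834e-3,
         0.393684603287860108352720146121813443561e-4,
        -0.248947210682259168029030370205389323899e-5,
         0.107791837839662283066379987646635416692e-6,
        -0.324874073895735800961260474028013982211e-8,
         0.628155216606695311524920882748052490116e-10,
        -0.586110644039348333520104379959307242711e-12))
    return (xx - 1.0) * (xx - 4.0) * (xx - 9.0) * (xx - 16.0) * p
end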

2 Likes

Note that this approximation is only accurate (to ≈ 6 digits) for x ∈ [−4, +4], so it's only “branchless” if you restrict your inputs to that interval.

2 Likes

Indeed. I should have pointed this out. (In that context we rarely need to step out of [-4, 4].) The branchlessness is w.r.t. the division by 0 at the origin.
RE: the accuracy of the approximation: the >>max<< error bound is >>relative<<, not absolute, which means it's the number of >>significant<< (instead of total) digits. Also note that the linked source code includes more accurate approximations, up to max. abs. rel. error 1.2e-12 < 1/2^39. (All on [-4, 4] for the normalized Sinc (zeros at integers). If you rarely call Sinc for values outside this range, you could branch and simply compute sin(pi * x)/(pi * x) outside of it. All the ImageMagick approximations are exactly 0 at x = ±4, so you know you'd be continuous.)
P.S. A maximum >>absolute<< error of, say, 1.e-3 is likely to lead to a much larger >>relative<< error near roots, and Sinc has many. Also, for a function that stays in the range [-1, 1], the absolute error is generally smaller than the relative error.

1 Like

My package FindMinimaxPolynomial.jl can find minimax polynomials, too. It doesn’t know about Padé approximants. I used it to implement DebyeFunctions.jl.