First, has anyone reimplemented Sinc neural networks in Julia? Those are for speaker recognition (state-of-the-art, I believe) and should similarly help for speech recognition and other audio work.
So far, I’m just looking at the sinc function itself, and I’m not sure the Julia implementation is as fast as it could be.
julia> function my_sinc(x) # this works (EDIT: for small domains), but for now a naive version
sum([((-1)^n*x^(2n))/factorial(2n + 1) for n=0:3])
end
julia> function my_sinc2(x) # I'm looking into evalpoly; can I do something like this?
evalpoly(x, (((-1)^n*x^(2n))/factorial(2n + 1) for n=0:3))
end
Does anyone know if a Padé approximant or some other approach is best? I tried to google a bit and didn't find anything definitive, but found:
You’re actually 1 week late. Julia very recently got much better functions for sinc and cosc (the derivative). If you check out master, you should see much better performance (and accuracy).
Explicitly calling factorial and computing powers of x separately for each term in a Taylor series is not a good approach. Not only is this very slow (you have a lot of redundant computations!), but it can also overflow and give 0*Inf if you compute enough terms.
See, for example, the new implementation of the cosc function in Base for an example of how to compute a Taylor series iteratively (to arbitrary precision), with specialized cases (evalpoly with hardcoded coefficients to a fixed order) for single and double precision.
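A minimal sketch of that approach (not the actual Base code): the series sin(x)/x = sum of (-1)^n*x^(2n)/factorial(2n+1) is a polynomial in x^2, so the coefficients can be hardcoded once and the whole thing evaluated with evalpoly, with no factorial calls or redundant powers at runtime:

julia> const SINC_TAYLOR = ntuple(k -> (-1)^(k - 1) / factorial(2k - 1), 4)  # 1, -1/3!, 1/5!, -1/7!

julia> my_sinc_evalpoly(x) = evalpoly(x^2, SINC_TAYLOR)  # same terms as my_sinc above, evaluated via Horner's scheme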
About “the Julia implementation” (why I wrote “naive version” for mine), I was referring to the one in the standard library (and now yours too, after seeing your new one).
Your new version seems 2x slower (except in the tiny region around 0, where you're papering over the 2nd-derivative issue and yours is over 3x faster).
I was actually thinking of doing something similar, with abs (or fastabs… as you do), but over a much wider region, avoiding division (and sinpi with range reduction, etc.).
I got curious about fastabs, and couldn't help but think of the classic “7 Minute Abs”:
More off-topic (I had never heard of this Lush Lisp-variant language, or QMNIST):
Many software projects, particularly research projects, require two languages […]
The vector and matrix manipulation engine is quite powerful and efficient, making Lush ideal for heavy numerical applications and applications such as signal and image processing. […]
Lush offers an extensive library for gradient-based machine learning
The key difference between your version and the Base version is that your version only works for small x values. For example, my_sinc(10) is about -130 instead of the correct answer of 0. Making sinc fast and correct for the whole range pretty much requires calling sinpi, which is going to be slow.
Yes, I know that. My question was really about exactly how to get evalpoly to work (and whether anyone is actually working on Sinc [networks] already), and whether a Padé approximant is appropriate. Since the function decays to zero (in the limit), a polynomial will never work over the whole range without division, but it might for a large enough region, given a longer polynomial.
What are the options, actually? I'm up to speed on the math of Taylor series (and sinc has one without division, if you're ok with it being infinite…), but I'm missing some details on implementing it with evalpoly. I'm reading up on minimax, Padé, etc., but what are the options I should be looking into? I think this (function approximation in general) will involve only polynomials, i.e. one or two (with division in the latter case), of sufficient length.
Is this helpful to find the coefficients:
There I see e.g.:
linear_king_fit(E, U): finds coefficients a and b for E[i]^2 = a + b*U^0.5
linear_rational_fit(x, y, p, q) finds the coefficients for rational polynomials: y[i] = (a[1] + a[2]*x[i] + ... + a[p+1]*x[i]^p) / (1 + a[p+1+1]*x[i] + ... + a[p+1+q]*x[i]^q)
I.e. the latter seems helpful; I'm not sure about the former. Processors do have sqrt, but I'm not sure it's used for function approximations; I think that and the other options in the package are for other applications.
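A sketch of what I had in mind with linear_rational_fit (using the signature quoted above; the degrees 4 and 4 and the fitting range are just guesses): fit a rational approximation to sinc on a limited interval, then evaluate it using the coefficient layout from the docstring.

julia> using CurveFit

julia> xs = range(-4, 4; length = 2001); p, q = 4, 4;

julia> a = linear_rational_fit(xs, sinc.(xs), p, q);  # a[1:p+1] = numerator coeffs, a[p+2:end] = denominator coeffs (leading 1 implicit)

julia> rational_sinc(x) = evalpoly(x, a[1:p+1]) / evalpoly(x, [1; a[p+2:end]])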
The best library for this is called Remez.jl; it finds minimax polynomials or rational functions. That said, I'm not sure you'll be able to find something accurate over the entire range that isn't really slow.
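For reference, a sketch of how it might be used (assuming the ratfn_minimax function from its README; the interval and degrees here are just a starting point): a denominator degree of 0 gives a plain minimax polynomial, a nonzero one gives a rational (Padé-like) approximation.

julia> using Remez

julia> N, D, E, X = ratfn_minimax(sinc, (0, 4), 8, 0);  # numerator coeffs, denominator coeffs, max error, extrema

julia> minimax_sinc(x) = evalpoly(abs(x), N)  # sinc is even, so fitting on [0, 4] covers [-4, 4]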
I should have been clearer: with more coefficients it gets more accurate (it's still WIP). I'm looking for the best approach before converting it to something fast, e.g. with evalpoly:
julia> function my_sinc(x) # works over a larger domain with more coefficients; still a naive WIP version
sum([((-1)^n*x^(2n))/factorial(2n + 1) for n=0:7])
end
You mean from -Inf to Inf as the “entire range”? I was thinking maybe -5 to 5 (just a guess), then a fallback. But I'm actually not sure what the range needs to be, i.e. what's needed for signal processing.
Sometimes zero-error point constraints are included in a Modified Remez Exchange Algorithm.[10]
I'm not sure whether that should be used for the few extremes and/or zero-crossings [closest to zero], nor how much inaccuracy in the approximation is tolerable. For signal processing, I guess you want the frequency preserved, i.e. the zero crossings, but e.g. the extremes could be off, assuming the error would only be in the vertical, not horizontal, direction.
In case people are interested, I'm thinking of this in the context of:
Feb 16, 2019:
We replaced the old “sinc_conv” with “SincConv_fast”. The latter is 50% faster.
The code is not large, just a few files (in the top directory, each one smaller than this 304-sloc one; I'd guess under 800-1000 lines in total):
I found an .npy (raw) file there and wasn't sure what it is (it seems to apply only to Python/NumPy, while there is NPZ.jl):
Why you should always save your data as .npy instead of .csv
There are some other applications (where sinc is the activation function, unlike above, where ReLU is used):
Funnily, the name of the function comes from “cardinal sine”. The widest part of the graph resembles a cardinal with a hat.
[Picture: Assistant to the Sine Pope, member of the …]
Convolution with a sinc function is an entirely different beast — the limiting factor for convolution seems unlikely to be evaluation of sinc itself (if it is evaluated at all — e.g. an FFT-based method won’t evaluate sinc, whereas an approximate overlap-add method using a truncated sinc would precompute the filter coefficients anyway).
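For what it's worth, a sketch of the “precompute the filter coefficients” option (the function name, the Hamming window, and DSP.jl are my choices here, not anything from SincNet): sinc is only evaluated once, while building a truncated windowed-sinc low-pass; the per-sample cost is then the convolution itself.

julia> using DSP

julia> function windowed_sinc_lowpass(fc, L)  # fc = cutoff in cycles/sample, L = number of taps (odd)
           n = range(-(L - 1) / 2, (L - 1) / 2; length = L)
           2fc .* sinc.(2fc .* n) .* hamming(L)  # ideal low-pass, truncated and windowed
       end

julia> h = windowed_sinc_lowpass(0.1, 129);
       # y = conv(x, h)  # the convolution, not the sinc evaluations, dominates the cost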
I see it used in the code. I'm not sure: are you saying there's a better way, or that this is ok, just that sinc will not be the speed-critical part of the network (I know sinc is only used in the first layer):
for i in range(self.N_filt):
    low_pass1 = 2*filt_beg_freq[i].float()*sinc(filt_beg_freq[i].float()*self.freq_scale, t_right)
    low_pass2 = 2*filt_end_freq[i].float()*sinc(filt_end_freq[i].float()*self.freq_scale, t_right)
    band_pass = (low_pass2 - low_pass1)
    band_pass = band_pass / torch.max(band_pass)
    filters[i,:] = band_pass.cuda() * window

out = F.conv1d(x, filters.view(self.N_filt, 1, self.Filt_dim))
From the paper (see top post):
The first layer of a standard CNN performs a set of time-domain convolutions between the input waveform and some Finite Impulse Response (FIR) filters […]
In standard CNNs, all the L elements (taps) of each filter are learned from data. Conversely, the proposed SincNet (depicted in Fig. 1) performs the convolution with a predefined function g that depends on few learnable parameters θ only
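As I read the paper (and the Python snippet quoted above), that predefined function g is a band-pass built as the difference of two sinc low-passes, with only the two cutoff frequencies learned. Roughly, in Julia terms (a sketch of my reading, not the authors' code):

julia> g(n, f1, f2) = 2f2 * sinc(2f2 * n) - 2f1 * sinc(2f1 * n)  # band-pass between f1 < f2; Base's sinc is the normalized sin(pi*x)/(pi*x)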
A rational approximation of the sinc function based on sampling and the Fourier transforms
Figure 6 shows the absolute difference between the original sinc function sinc(πν) and its rational approximation. Despite that the sinc function is not easy to approximate [12, 16], only 2^(6−1) = 32 terms in the proposed formula (16) provide absolute difference smaller than 3.2 × 10^−3 within the range ν ∈ [−2π, 2π]. A MATLAB code validating the results based on the rational approximation (16) of the sinc function sinc(πν) is provided in Appendix A.
Maybe I shouldn't look at the code, in case it taints me (so that I couldn't contribute to Julia)? But is it ok to look at the formulas?
As we can see now, the truncated form of this limit can be implemented for sampling in accordance with equation (1). Specifically, the sinc function can be approximated as a cosine series expansion such that
FYI: the paper also seems to cover other things:
Fig. 2. The rectangular function approximations f(t) = 1/[(2t)^70 + 1] and […]
[I've never seen anything raised to a power this high (70th) before; I'm not sure if it's fast, and there would seem to be a better way to approximate a rectangular function…]
Voigt function [3], can be calculated by residues in a similar way that we performed in our earlier work [2] to approximate the integral (18). Due to rapid performance and high accuracy our algorithm [2] has been implemented in current version of the atmospheric model bytran [6, 15]
Three digits of accuracy is fine for specialized applications, but the Base version needs to be accurate to close to machine precision. If your application only needs a crude sinc over a limited range, and the sinc is performance-critical, then by all means use a simple approximation.
In the code you quoted almost certainly the time is mostly spent in the conv1d function. The time spent calling sinc while precomputing filter coefficients should be negligible, especially in Julia.
Maybe you will find the branchless (normalized: zeros at integers) Sinc function approximations which ImageMagick has been using “forever” useful, e.g.,
Indeed. I should have pointed this out. (In that context we rarely need to step out of [-4, 4].) The branchlessness is w.r.t. division by 0 at the origin.
RE: the accuracy of the approximation: The >>max<< error bound is >>relative<<, not absolute, which means that it’s the number of >>significant<< (instead of total) digits. Also note that the linked source code includes more accurate approximations, up to max. abs. rel. error 1.2e-12 < 1/2^39. (All on [-4,4] for the normalized Sinc (zeros at integers). If you rarely call Sinc for values outside this range, you could branch and simply compute sin(pi * x)/(pi * x) outside of it. All the ImageMagick approximations are exactly 0 at x = ±4, so you know you’d be continuous.)
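A sketch of that branching strategy (the coefficients argument is a placeholder for an even-polynomial fit in x^2, e.g. from Remez.jl or the ImageMagick tables, not actual values): evaluate the polynomial on [-4, 4] and fall back to the exact normalized sinc outside it.

julia> function sinc_branched(x, coeffs)
           ax = abs(x)
           ax <= 4 ? evalpoly(ax * ax, coeffs) : sinpi(x) / (pi * x)  # outside [-4, 4], x is never 0
       end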
P.S. A maximum >>absolute<< error of, say, 1.e-3 is likely to lead to a much larger >>relative<< error near roots, and Sinc has many. Also, for a function that stays in the range [-1,1], the absolute error is generally smaller than the relative error.
My package FindMinimaxPolynomial.jl can find minimax polynomials, too. It doesn’t know about Padé approximants. I used it to implement DebyeFunctions.jl.