I’m currently using GNU Octave to fit a simple cosine with a phase to many data sets stored in text files. For most data sets the fit works well, but for some others it doesn’t. I’m wondering whether I’d be able to get better results with Julia. I’ll explain:
The fit function I’m using, like I guess almost any other curve-fitting function, requires an initial guess for the parameters, as in LsqFit.jl. In GNU Octave, in order to guess the frequency well enough, I rely on findpeaks to calculate the average distance between the maximum peaks. Here’s an example of two measurements, in one of which findpeaks failed to find the peaks, while for the other it did so well enough:
There are a few more details about the way I used findpeaks, but that’s irrelevant because this is a Julia forum and not an Octave forum.
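For reference, the frequency guess itself is then nothing more than this (a minimal Julia sketch; peak_indices and dt are hypothetical names for the peak locations and the sampling interval):

using Statistics
## peak_indices: indices returned by a peak finder; dt: sampling interval (both hypothetical)
avg_period = mean(diff(peak_indices)) * dt   ## average spacing between consecutive maxima
f_guess = 1 / avg_period                     ## frequency guess handed to the curve fit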
Anyway, I’m not a Julia user yet, but I’ve heard good things about it, so I’m reaching out to hear about the experience of other users from the community doing tasks similar to this. To be specific, findpeaks is my main disappointment with GNU Octave, and I wonder whether somewhere in Julia’s ecosystem there’s a better findpeaks function.
Thank you @oheil for the code example and the welcome. But please note that I’m not looking for an answer to “how to do this”. I’ve also managed to write equivalent code in Octave and get a good approximation (also checked by hand) of the frequency, but my fits are still disappointing.
What I mean is that I’m interested in your experiences with Julia in the wild, where you, the Julia users, had to fit numerous data sets to the same function, with more parameters than in your ideal code example, perhaps a function such as:
A(t) = A_0 + A_1 cos(ω t + φ), where A_0 is an offset that also must be discovered by the curve fit.
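In LsqFit.jl terms (which I have only read about so far), I imagine the model would look roughly like this; p0 is a hypothetical vector of initial guesses:

using LsqFit
## p = [A0, A1, ω, φ]: offset, amplitude, angular frequency, phase
model(t, p) = p[1] .+ p[2] .* cos.(p[3] .* t .+ p[4])
p0 = [0.0, 1.0, 2π * f_guess, 0.0]        ## rough initial guesses, with f_guess from the peak spacing
fit = curve_fit(model, t, y, p0)          ## t, y: measured time and signal vectors
A0, A1, ω, φ = coef(fit)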
Apologies if this isn’t quite what you’re looking for, but if what you are trying to do is fit data specifically to sinusoids, then you might want to use a routine like harminv. Unfortunately this is a C library and there is no pure-Julia version, though if you know a bit of C you should be able to easily make a wrapper.
Regarding this specific issue with findpeaks, it seems, following this conversation, that there are three options:
You could use an FFT to get an initial guess, but I would then pass it into a least-squares fitting routine (or similar) to refine the guess.
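A minimal sketch of that coarse guess, assuming y is the signal and fs its sampling rate (hypothetical names):

using FFTW, Statistics
Y = abs.(rfft(y .- mean(y)))            ## subtract the mean so the DC bin doesn’t dominate
freqs = rfftfreq(length(y), fs)
f0 = freqs[argmax(Y)]                   ## coarse frequency estimate to seed the fit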
If you know a priori that your data consists of a single cosine (plus noise?), then you can do better (possibly far better) than a Fourier transform (whose resolution is limited by the sampling time).
Thank you all for the links and suggestions. I guess I’ll have to find out for myself how much better Julia is than GNU Octave for this task. I’ll probably have to try out all of the options you linked, iteratively, on all of my data sets, and do the comparison myself.
If anyone’s interested, here’s an example data file (more here):
In the case of a single tone it is not limited, as you can always pad with zeros.
Padding with zeros is an almost perfect interpolation in the case of a single sine.
Resolution-wise it won’t work in the case of multiple sines / cosines. In that case, for a bad SIR (signal-to-interference ratio), you’re limited by the side lobes / the width of the main lobe, which are governed by the time interval (the ratio between them isn’t; it is a function of the window).
Regarding computational efficiency, it is indeed sometimes better to get an initial guess on a coarse DFT grid and then refine it using non-linear least-squares methods.
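A sketch of the zero-padding trick for the single-tone case (same hypothetical y and fs as above):

using FFTW, Statistics
pad = 8                                                  ## evaluate the DFT on an 8x finer grid
ypad = vcat(y .- mean(y), zeros(eltype(y), (pad - 1) * length(y)))
freqs = rfftfreq(length(ypad), fs)
f0 = freqs[argmax(abs.(rfft(ypad)))]                     ## finer single-tone frequency estimate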
It looks to me like your data is actually constant for a while and then steps to a new value (there’s a flat blue line involved). This induces infinite curvature at each step location and makes it look like every location is a peak. If this is just an artifact of the plot, never mind. But if it is because you have a finite-resolution (say 4-bit or even 8-bit) A/D converter, then you might benefit from converting to a floating-point representation and then doing a convolution with a smoothing kernel to remove this infinite curvature. For example:
using DSP
newdata = conv(olddata, [sin(pi*i/6) for i in 0:6])   ## convolve with a short smoothing kernel
FWIW, I think most of this is about picking the right algorithm, which you can do equally well in Julia and Octave.
There is nothing inherently magical in Julia that will make it better for this task, aside from a general preference for programming in Julia (if applicable).
That’s an excellent observation; you have also correctly guessed that my sensor has “low” resolution. I read about convolution on Wikipedia and didn’t quite get how it relates to “smoothing” my data. However, thinking about it, I realised smoothing is exactly what I need to do before I start looking for peaks, and I’ve found:
https://www.mathworks.com/help/matlab/data_analysis/convolution-filter-to-smooth-data.html, which uses a certain convolution function, but I don’t know how to apply it to my data.
Yeah, I guess you are right. I was mostly disappointed by Octave’s findpeaks, but reading @dlakelan’s comment, I guess I can’t blame it. Smoothing my data with regdatasmooth before using findpeaks gives better results, but still not perfect for all of my data sets - false peaks of the smoothed data are still detected:
Writing a better algorithm that will make the call to findpeaks reliable is the heart of my generic fit procedure. It will probably take a long time, which I don’t currently have, though I do sense there’s some hope now, given the new knowledge I gained thanks to you all.
One thing that’s still missing for me when exploring Julia from above is a dedicated equivalent of this smooth function, which is available in Matlab / Octave:
If you are looking for Tikhonov regularization, I think that
has it implemented (as do some other packages). And there are of course a ton of other smoothing algorithms implemented in Julia packages.
Incidentally, Matlab and Octave seem to be using different algorithms here. I am not sure that picking an arbitrary (if widely used) algorithm and calling it “smoothing” is a productive approach; there are a zillion ways to “smooth” data, depending on the use case.
Basic knowledge of signal processing would have helped. Convolution with any non-negative kernel has smoothing qualities. There are tons of them; some have fancy names, some are claimed to be optimal under certain conditions. If you just choose one which is highest in the middle, symmetric, and strictly decreasing towards the edges, it will likely be good enough for your needs. Usually you would want to tune its width to your data though.
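For instance, something along these lines (a sketch only; the triangular shape and the width w are arbitrary choices to be tuned):

using DSP
w = 7                                              ## kernel half-width in samples; tune to your data
kernel = [1 - abs(i) / (w + 1) for i in -w:w]      ## symmetric, highest in the middle, decreasing
kernel ./= sum(kernel)                             ## normalize so the amplitude scale is preserved
smoothed = conv(y, kernel)                         ## y is your (hypothetical) data vector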
A convolution replaces each data point with a weighted average of its neighboring points. Richard Hamming’s book “Digital Filters” is an excellent introduction, and very inexpensive.
I just chose that as an example, but it’s called the Hann kernel and does reasonably well at hammering down high-frequency noise, which round-off error is. Yes, you might need to widen it. Hamming’s book will be invaluable.
@doronbehar the convolution of the derivative of a kernel is the same as the negative of the derivative of the convolution… So you could do:
using DSP
N = 10   ## changing this allows you to try out different widths
conv(data, [-cos(pi*i/N) for i in 0:N])
and then find where the resulting function changes sign, and it should get you a long way towards finding the peaks. You’ll see that the -cos() function forms a half cycle going from -1 to 1… so the convolution is a weighted sum of differences between function values at different separations in time. (f(x+a) - f(x-a)) kind of thing. Since you’re just looking for where the function crosses 0 because that’s where the “smooth derivative” goes to zero and hence a peak is… this should be enough.
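Concretely, something like this (a sketch; the N ÷ 2 shift compensates for the delay introduced by the kernel, and depending on the sign convention a given crossing may be a peak or a trough):

using DSP
N = 10
d = conv(data, [-cos(pi*i/N) for i in 0:N])        ## smoothed derivative of the data
candidates = [i - N ÷ 2 for i in 2:length(d) if sign(d[i-1]) != sign(d[i])]   ## zero crossings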
Thanks again to you all for the links and suggestions. My conclusion by now is that I’ll need to learn signal processing thoroughly at some point.
One last question: does Julia return a vector of the same size when you use conv? On Octave it doesn’t - if I use as the 2nd argument what I suspect is the MATLAB equivalent of:
[sin(pi*i/6) for i in 0:6]
i.e. sin((0:6)*pi/6), I get a vector that is larger by 6 elements, and then I don’t know how to plot it or do anything else with it, because the time vector still has its original size.
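If DSP.jl’s conv behaves the same way (returning the full convolution, of length length(data) + length(kernel) - 1), I suppose one workaround would be to trim the result back to the original length, e.g.:

using DSP
kernel = [sin(pi*i/6) for i in 0:6]
full = conv(data, kernel)                         ## length(data) + length(kernel) - 1 samples
shift = (length(kernel) - 1) ÷ 2                  ## half the kernel width
smoothed = full[shift+1 : shift+length(data)]     ## central part, aligned with the original time vector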