How to Determine or Approximate a Function from Scatter Plots?

Hi all,

I am using this code to plot a lot of data and connect the scatter plot:

using Plots, LaTeXStrings, Plots.PlotMeasures

x = [10/60, 20/60, 30/60, 40/60, 50/60, 60/60, 70/60, 80/60, 80/60, 90/60, 100/60, 
	110/60, 120/60, 130/60, 140/60, 150/60, 160/60, 170/60, 180/60, 190/60, 200/60, 210/60]
y = [0, 55, 57, 60, 70, 70, 70, 70, 19, 0, 59, 63, 65, 62, 0, 0, 0, 22, 38, 35, 25, 0]

plot(x,y; marker=(:circle,5), legend=:topright, label=" f(x)", 
	left_margin=5mm, size=(720, 360), tickfontsize=10)

I want to approximate the function that connects the dots in the scatter plot. How can I do this in Julia? Which package should I use?

The function could be a polynomial, or perhaps a combination of transcendental functions, possibly… we never know…


There are many such functions. Uncountably many. And almost as many ways to interpolate a set of points. The choice of method depends strongly on where the points come from.

A common choice is linear or higher-order spline interpolation, as implemented for example in JuliaMath/Interpolations.jl (fast, continuous interpolation of discrete datasets in Julia).
Maybe that’s already what you’re looking for?
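A minimal sketch with Interpolations.jl. Note that gridded interpolants require strictly increasing knots, so the duplicated x value 80/60 in your data would need to be removed or merged first; the data below is made up for illustration:

```julia
using Interpolations

# Strictly increasing knots and their values (illustrative data)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 3.0]

itp = linear_interpolation(xs, ys)          # piecewise-linear interpolant
# Cubic splines need equally spaced knots given as a range:
citp = cubic_spline_interpolation(0.0:1.0:3.0, ys)

itp(1.5)    # 3.0, halfway between ys[2] = 2.0 and ys[3] = 4.0
citp(1.5)   # smooth cubic value at the same point
```

Both objects are callable like ordinary functions, so they can be plotted directly with `plot(itp, 0, 3)`.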


How about this approximation?

using Statistics
ymean = mean(y)   # constant "fit": the mean of y
plot!(x, fill(ymean, length(x)), label="Approx of f(x)")

I mean, you should certainly state some more requirements that such an approximation has to satisfy, because the constant approximation above (it does not even have to be the mean value) does indeed satisfy everything you have declared so far.


If you want to experiment on your own, the following code creates an approximation based on least squares:

# Insert your choice of basis functions in the vector-valued function below
ϕ(x) = [1, x, x^2, x^3, sin(x), cos(x), sin(x/2), cos(x/2)]
# Standard linear least squares: each row of Φ is ϕ evaluated at one data point
Φ = reduce(hcat, ϕ.(x)) |> permutedims
b = Φ \ y
# Approximating function
f(x) = sum(ϕ(x) .* b)

To apply the code to your own data, evaluate f over your x range and overlay it on the scatter plot.

With this vector of basis functions (i.e., ϕ(x)), the resulting fit is not too impressive…

Some comments:

  1. Go ahead and experiment with your own elements of ϕ(x) – I think you can use any valid Julia function.
  2. The matrix Φ can become ill-conditioned with many elements in the vector of basis functions, making it difficult to compute b = Φ\y accurately. In that case, you can use the SVD of Φ.
  3. The simple method above is nice for experimenting with the choice of basis functions, but a proper package will probably do the same and much more, such as giving you statistics of the approximation.
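Putting the pieces together on the data from this thread, as a self-contained sketch (the `pinv` line illustrates the SVD remark in point 2 – `pinv` computes the pseudoinverse via the SVD):

```julia
using LinearAlgebra

# Data from the original post (note the duplicated 80/60 point)
x = [10, 20, 30, 40, 50, 60, 70, 80, 80, 90, 100, 110, 120,
     130, 140, 150, 160, 170, 180, 190, 200, 210] ./ 60
y = [0, 55, 57, 60, 70, 70, 70, 70, 19, 0, 59, 63, 65, 62,
     0, 0, 0, 22, 38, 35, 25, 0]

ϕ(t) = [1, t, t^2, t^3, sin(t), cos(t), sin(t/2), cos(t/2)]
Φ = reduce(hcat, ϕ.(x)) |> permutedims   # 22×8 design matrix

b  = Φ \ y          # QR-based least-squares solution
b2 = pinv(Φ) * y    # SVD-based solution, more robust if Φ is ill-conditioned

f(t) = sum(ϕ(t) .* b)
@show cond(Φ)          # conditioning of the design matrix
@show norm(Φ*b - y)    # residual of the fit
```

For a well-conditioned Φ the two solutions agree; when `cond(Φ)` grows large, the pseudoinverse (optionally with small singular values truncated) is the safer route.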

Have a look at SymbolicRegression.jl.

What do you want or need to do?

  • To interpolate the data piecewise.
  • To fit a spline.
  • To fit a function minimizing the residuals.

The third one, to fit a function minimizing the residual.

And to know which function it is for the data that I have.

I used this, and it is great. I haven’t experimented yet, but I will try other functions for ϕ(x).

As for the SVD, I will learn it after I finish Calculus – it should be in the Linear Algebra section.

Then you have several options:

If you know the theoretical or desired shape of your function, provide that function in parametrized form and optimize the parameters to minimize the residual error (common functions are already implemented in many packages).

If you are not sure about the function, you can try something like a polynomial.
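A polynomial fit is a one-liner with Polynomials.jl; a sketch with made-up data (degree 3 is an arbitrary choice here):

```julia
using Polynomials

# Illustrative data, roughly linear with slope 2
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 1.1, 1.9, 3.2, 3.9]

p = Polynomials.fit(xs, ys, 3)   # least-squares cubic fit to the data
p(1.25)                          # evaluate the fitted polynomial
```

Try several degrees and compare the residuals; a degree close to the number of points will interpolate but usually oscillates badly.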
Or use LOESS.
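LOESS is available in Loess.jl; a minimal sketch with synthetic data (the `span` value is an arbitrary choice that controls how much smoothing is applied):

```julia
using Loess, Random

Random.seed!(0)
xs = collect(0.0:0.1:6.3)
ys = sin.(xs) .+ 0.1 .* randn(length(xs))   # noisy samples of sin

model = loess(xs, ys; span = 0.3)           # local regression fit
us = range(extrema(xs)...; length = 100)
vs = predict(model, us)                     # smoothed values on a fine grid
```

Unlike the basis-function approach, LOESS gives you a smooth curve without ever committing to a closed-form expression.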
Or you can try different functions sequentially, looking for the best one, but be aware that you are then doing multiple testing and need to take that into account when interpreting the results.

You can also use a Bayesian approach with beta functions.

Or you can use decision trees or XGBoost. With proper cross-validation.

Or you can try with a black box: a neural network, deep learning… with proper cross-validation.

PS: Don’t forget to include interactions.

I’ll stick with polynomials first; I am still a beginner in mathematics, the rest will come later.