Add band around point scatter

I have two datasets and I want to emphasize that one of them has smaller “noise level” than the other one. For example:

pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

using Plots

plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y)
Plots.scatter!(pts2_x, pts2_y)

[image: scatter plot of the two datasets, one in blue and one in orange]

To emphasize that the orange points have a smaller scatter around the parabola, I want to superpose a semi-transparent orange band around them, and a wider semi-transparent blue band around the blue points.

Note that in general, I do not know the law generating the points (i.e., that there is an underlying parabola). I just have two sets of points, one of which has a narrower distribution and I want to emphasize this.

Any suggestions on how to make this more visually obvious? Thanks!

Perhaps compute the complex convex hull and then apply some smoothing to make it look nicer.
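
For reference, a convex hull can also be computed without any extra package. The following is only a minimal sketch using Andrew's monotone chain algorithm; `convex_hull` and `cross2` are hypothetical helper names, not part of Plots.jl or QHull.jl, and no smoothing is applied.

```julia
# Cross product of the vectors o->a and o->b; positive for a counter-clockwise turn
cross2(o, a, b) = (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])

# Hypothetical helper: returns the hull vertices in counter-clockwise order
function convex_hull(x, y)
    pts = sort(unique(collect(zip(x, y))))   # tuples sort lexicographically
    length(pts) <= 2 && return pts
    lower = eltype(pts)[]
    for p in pts                             # build the lower chain, left to right
        while length(lower) >= 2 && cross2(lower[end-1], lower[end], p) <= 0
            pop!(lower)
        end
        push!(lower, p)
    end
    upper = eltype(pts)[]
    for p in reverse(pts)                    # build the upper chain, right to left
        while length(upper) >= 2 && cross2(upper[end-1], upper[end], p) <= 0
            pop!(upper)
        end
        push!(upper, p)
    end
    # The last point of each chain is the first point of the other, so drop it
    return vcat(lower[1:end-1], upper[1:end-1])
end

hull = convex_hull([0.0, 1.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 1.0, 0.5])
```

With Plots loaded, something like `Plots.plot!(Plots.Shape(first.(hull), last.(hull)), fillalpha = 0.2, linealpha = 0)` should draw the hull as a semi-transparent region. Note that for a curved cloud like the parabola example, a convex hull also covers the empty area inside the curve, so it may not hug the points very well.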


You mean “convex hull”? I guess I can use QHull.jl (JuliaPolyhedra/QHull.jl on GitHub, a Julia wrapper around a PyCall wrapper around the qhull Convex Hull library) to compute it. Any suggestions on how to smooth it?

In reality I was hoping there would be a simpler way, like a magical option in Plots :P

If you have access to the mean function (the true underlying function), you can plot it and use the ribbon keyword in Plots.jl to draw the transparent bands you are looking for. See the Plots.jl attributes documentation for details:
http://docs.juliaplots.org/latest/attributes/

Another approach is to first estimate the underlying function from the points (how?), and then use ribbon.

To estimate the underlying function, I could do some moving average + interpolation. I have to think about this.
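
A rough sketch of the moving-average idea, assuming a simple sliding window over the points sorted by x is good enough. `rolling_band` and the half-width `h` are made-up names for illustration; the window mean stands in for the unknown underlying function and the window standard deviation for the local noise level.

```julia
using Statistics

# Hypothetical helper: sort by x, then take the mean and standard deviation of y
# over a window of up to 2h + 1 neighbouring points (fewer near the edges).
function rolling_band(x, y; h = 10)
    p = sortperm(x)
    xs, ys = x[p], y[p]
    n = length(xs)
    m = similar(ys, Float64)   # local mean
    s = similar(ys, Float64)   # local standard deviation
    for i in 1:n
        w = ys[max(1, i - h):min(n, i + h)]
        m[i] = mean(w)
        s[i] = std(w)
    end
    return xs, m, s
end

pts_x = [10randn() for i = 1:100]
pts_y = pts_x.^2 .+ [25randn() for i = 1:100]
xs, m, s = rolling_band(pts_x, pts_y)
```

With Plots loaded, the band could then be drawn with something like `Plots.plot!(xs, m, ribbon = 2 .* s, fillalpha = 0.2)`. A larger `h` gives a smoother but less local band, so it probably needs tuning per dataset.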

Haha yes!

It looks like a parabola, try

using Plots, Statistics
pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

A = pts1_x.^(0:2)'
k = A\pts1_y # Estimate linear model of order 2

yhat = A*k

I = sortperm(pts1_x)
plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y)
Plots.scatter!(pts2_x, pts2_y)
Plots.plot!(pts1_x[I], yhat[I], ribbon = 2std(yhat-pts1_y))

Hmm, it seems there’s some issue with the ribbon keyword, the following produces a nice plot for me

using Plots, Statistics # std comes from Statistics

pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

A = pts1_x.^(0:2)'
k = A\pts1_y # Estimate linear model of order 2

yhat = A*k

I = sortperm(pts1_x)
plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y, color=:orange)
Plots.scatter!(pts2_x, pts2_y, color=:blue)
Plots.plot!(pts1_x[I], yhat[I], fillrange = yhat[I].+2std(yhat-pts1_y), alpha=0.2, color=:orange)
Plots.plot!(pts1_x[I], yhat[I], fillrange = yhat[I].-2std(yhat-pts1_y), alpha=0.2, color=:orange)

The parabola was just an example. The real data is more complicated.

Then a heavily lowpass-filtered signal might work better. Check DSP.jl for filtering options.

FWIW I think the scatter by itself is pretty clear. When you say that one has a narrower distribution, you mean a narrower distribution around a varying local mean, right? One way to accomplish your goal would be to bin your data points in the x dimension and construct the band from the quantiles of y in those bins.
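
The binning idea could look roughly like this. It is only a sketch: `binned_band` and its keyword arguments (`nbins`, `qlo`, `qhi`, `minpts`) are hypothetical names, equal-width bins are an assumption, and bins with too few points are simply skipped.

```julia
using Statistics

# Hypothetical helper: split the x range into `nbins` equal-width bins and return
# the bin centers plus a lower quantile, the median and an upper quantile of y per bin.
function binned_band(x, y; nbins = 10, qlo = 0.1, qhi = 0.9, minpts = 3)
    xmin, xmax = extrema(x)
    width = (xmax - xmin) / nbins
    # Bin index of every point; clamp so that the maximum x lands in the last bin
    bin = clamp.(floor.(Int, (x .- xmin) ./ width) .+ 1, 1, nbins)
    centers = Float64[]; lo = Float64[]; med = Float64[]; hi = Float64[]
    for b in 1:nbins
        yb = y[bin .== b]
        length(yb) < minpts && continue   # too few points for a meaningful quantile
        push!(centers, xmin + (b - 0.5) * width)
        push!(lo, quantile(yb, qlo))
        push!(med, median(yb))
        push!(hi, quantile(yb, qhi))
    end
    return centers, lo, med, hi
end

pts_x = [10randn() for i = 1:100]
pts_y = pts_x.^2 .+ [25randn() for i = 1:100]
centers, lo, med, hi = binned_band(pts_x, pts_y)
```

With Plots loaded, an asymmetric band around the median should be possible with something like `Plots.plot!(centers, med, ribbon = (med .- lo, hi .- med), fillalpha = 0.2)`. This needs no model of the underlying function, at the cost of a coarser band when there are few points per bin.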

The parabolic scatter is just an example. I have more complicated datasets that would benefit from better visualization.