Add band around point scatter

plotting

#1

I have two datasets and I want to emphasize that one of them has a smaller “noise level” than the other. For example:

using Plots

pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y)
Plots.scatter!(pts2_x, pts2_y)

[image: scatter plot of the two datasets, one in blue and one in orange]

To emphasize that the orange points have a smaller scatter around the parabola, I want to superpose a semi-transparent orange band around them, and a wider semi-transparent blue band around the blue points.

Note that in general, I do not know the law generating the points (i.e., that there is an underlying parabola). I just have two sets of points, one of which has a narrower distribution and I want to emphasize this.

Any suggestions on how to make this more visually obvious? Thanks!


#2

Perhaps compute the complex convex hull and then apply some smoothing to make it look nicer.
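
A minimal sketch of the hull part, assuming plain 2D points and using Andrew's monotone chain rather than any particular package. The names `cross2` and `convex_hull` are made up for this example, and the smoothing step is not shown. One caveat: for a curved cloud like the parabola above, the hull also covers the empty region between the two arms, so it may not read as a band.

```julia
using Plots

# 2D cross product of (a - o) and (b - o); positive for a counter-clockwise turn.
cross2(o, a, b) = (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])

# Andrew's monotone chain; returns the hull vertices in counter-clockwise order.
function convex_hull(xs, ys)
    pts = sort(unique(collect(zip(xs, ys))))   # tuples sort lexicographically
    length(pts) <= 2 && return pts
    lower = eltype(pts)[]
    for p in pts
        while length(lower) >= 2 && cross2(lower[end-1], lower[end], p) <= 0
            pop!(lower)
        end
        push!(lower, p)
    end
    upper = eltype(pts)[]
    for p in reverse(pts)
        while length(upper) >= 2 && cross2(upper[end-1], upper[end], p) <= 0
            pop!(upper)
        end
        push!(upper, p)
    end
    # The last point of each chain repeats the first point of the other one.
    return vcat(lower[1:end-1], upper[1:end-1])
end

x = 10 .* randn(100)
y = x .^ 2 .+ 25 .* randn(100)
hull = convex_hull(x, y)

plt = scatter(x, y; color = :orange, label = "data")
# Draw the hull as a semi-transparent filled polygon.
plot!(plt, Shape(first.(hull), last.(hull)); fillalpha = 0.2, linewidth = 0, color = :orange, label = "")
```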


#3

You mean “convex hull”? I guess I can use https://github.com/JuliaPolyhedra/QHull.jl to compute it. Any suggestions on how to smooth it?

In reality I was hoping there would be a simpler way, like a magical option in Plots :stuck_out_tongue:


#4

If you have access to the mean function (the true underlying function), you can plot it and use the ribbon keyword in Plots.jl to draw the transparent bands you are looking for.
See the link below for the attributes in Plots.jl:
http://docs.juliaplots.org/latest/attributes/


#5

Another approach is to first estimate the underlying function from the points (how?), and then use ribbon.

To estimate the underlying function, I could do some moving average + interpolation. I have to think about this.
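
A rough sketch of the moving-average idea, assuming the local mean can be estimated from a window of neighbouring points after sorting by x. The helper `rolling_band` and its `halfwidth` parameter are made up for this example, and the window size is a guess that would need tuning. No interpolation is done; the band is only evaluated at the data's own x values.

```julia
using Plots, Statistics

# Sort by x, then take the mean and standard deviation of y over a sliding
# window of up to 2 * halfwidth + 1 neighbouring points (shorter at the edges).
function rolling_band(x, y; halfwidth = 10)
    order = sortperm(x)
    xs, ys = x[order], y[order]
    n = length(xs)
    center = zeros(n)
    spread = zeros(n)
    for i in 1:n
        window = ys[max(1, i - halfwidth):min(n, i + halfwidth)]
        center[i] = mean(window)
        spread[i] = std(window)
    end
    return xs, center, spread
end

x1 = 10 .* randn(100); y1 = x1 .^ 2 .+ 100 .* randn(100)
x2 = 10 .* randn(100); y2 = x2 .^ 2 .+ 25 .* randn(100)

plt = scatter(x1, y1; color = 1, label = "data 1")
scatter!(plt, x2, y2; color = 2, label = "data 2")
for (x, y, c) in ((x1, y1, 1), (x2, y2, 2))
    xs, center, spread = rolling_band(x, y)
    # Band of two local standard deviations around the local mean.
    plot!(plt, xs, center; ribbon = 2 .* spread, fillalpha = 0.2, color = c, label = "")
end
```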


#6

Haha yes!


#7

It looks like a parabola. Try:

using Plots, Statistics
pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

A = pts1_x.^(0:2)'
k = A\pts1_y # Estimate linear model of order 2

yhat = A*k

I = sortperm(pts1_x)
plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y)
Plots.scatter!(pts2_x, pts2_y)
Plots.plot!(pts1_x[I], yhat[I], ribbon = 2std(yhat-pts1_y))

#8

Hmm, there seems to be some issue with the ribbon keyword. The following produces a nice plot for me:

using Plots, Statistics

pts1_x = [10randn() for i = 1 : 100]
pts1_y = pts1_x.^2 .+ [100randn() for i = 1:100]
pts2_x = [10randn() for i = 1 : 100]
pts2_y = pts2_x.^2 .+ [25randn() for i = 1:100]

A = pts1_x.^(0:2)'
k = A\pts1_y # Estimate linear model of order 2

yhat = A*k

I = sortperm(pts1_x)
plt = Plots.plot()
Plots.scatter!(pts1_x, pts1_y, color=:orange)
Plots.scatter!(pts2_x, pts2_y, color=:blue)
Plots.plot!(pts1_x[I], yhat[I], fillrange = yhat[I].+2std(yhat-pts1_y), alpha=0.2, color=:orange)
Plots.plot!(pts1_x[I], yhat[I], fillrange = yhat[I].-2std(yhat-pts1_y), alpha=0.2, color=:orange)

#9

The parabola was just an example. The real data is more complicated.


#10

Then a heavily lowpass-filtered signal might work better. Check DSP.jl for filtering options.
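
A possible sketch with DSP.jl, assuming a second-order Butterworth low-pass applied with `filtfilt` to the y values after sorting by x. The cutoff of 0.05 (as a fraction of the Nyquist frequency) is an arbitrary guess. The filter treats the points as evenly spaced, which scattered x values are not, so this is only a rough trend estimate and the ends may show edge effects.

```julia
using DSP, Plots, Statistics

x = 10 .* randn(200)
y = x .^ 2 .+ 25 .* randn(200)

# Filters act on a sequence, so order the points by x first.
order = sortperm(x)
xs, ys = x[order], y[order]

# Heavily low-pass filter the sorted y values (zero-phase, forward and backward).
lp = digitalfilter(Lowpass(0.05), Butterworth(2))
trend = filtfilt(lp, ys)

# Use one global spread of the residuals as the band half-width.
halfwidth = 2 * std(ys .- trend)

plt = scatter(x, y; color = :orange, label = "data")
plot!(plt, xs, trend; ribbon = halfwidth, fillalpha = 0.2, color = :orange, label = "")
```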


#11

FWIW I think the scatter by itself is pretty clear. When you say that one has a narrower distribution, you mean a narrower distribution around a varying local mean, right? One way to accomplish your goal would be to bin your data points in the x dimension, and construct the band based on the quantiles in those bins.
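
A minimal sketch of that binning idea, assuming equal-width bins in x and the 10% and 90% quantiles of y in each bin as the band edges. The helper `binned_quantile_band` and its parameters `nbins`, `lo` and `hi` are made up for this example, and the defaults are guesses.

```julia
using Plots, Statistics

# Split the x range into equal-width bins and return, for every bin with at
# least two points, the bin center and the lo/hi quantiles of y in that bin.
function binned_quantile_band(x, y; nbins = 10, lo = 0.1, hi = 0.9)
    edges = range(minimum(x), maximum(x); length = nbins + 1)
    centers, lower, upper = Float64[], Float64[], Float64[]
    for b in 1:nbins
        # The last bin also takes the right-most point.
        inbin = [edges[b] <= xi && (xi < edges[b+1] || b == nbins) for xi in x]
        count(inbin) < 2 && continue
        push!(centers, (edges[b] + edges[b+1]) / 2)
        push!(lower, quantile(y[inbin], lo))
        push!(upper, quantile(y[inbin], hi))
    end
    return centers, lower, upper
end

x1 = 10 .* randn(100); y1 = x1 .^ 2 .+ 100 .* randn(100)
x2 = 10 .* randn(100); y2 = x2 .^ 2 .+ 25 .* randn(100)

plt = scatter(x1, y1; color = 1, label = "data 1")
scatter!(plt, x2, y2; color = 2, label = "data 2")
for (x, y, c) in ((x1, y1, 1), (x2, y2, 2))
    centers, lower, upper = binned_quantile_band(x, y)
    # Fill between the lower and upper quantile curves, without drawing the line.
    plot!(plt, centers, lower; fillrange = upper, fillalpha = 0.2, linealpha = 0, color = c, label = "")
end
```

With few points per bin the quantiles get noisy, so fewer, wider bins may look better on small datasets.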


#12

The parabolic scatter is just an example. I have more complicated datasets that would benefit from better visualization.