PyPlot versus GR scatter plot appearance

juliohm · December 20, 2016, 2:47pm

I have a few million points to plot in 3D and would like to use the GR backend of Plots.jl to speed up exploration. The PyPlot backend gives me a desirable output:

but all the details are missing with GR:

Is there any luminance option that I can set in GR to get a similar level of detail without compromising performance? It also looks like the markersize option in Plots.jl is not passed to the GR backend.

dpsanders · December 20, 2016, 2:58pm

Please provide working example code.

juliohm · December 20, 2016, 3:35pm

I cannot provide the data, but below you can find the code with random points:

using Plots
# pyplot()
gr(format="png")

N = 1000000

x = randn(N)
y = randn(N)
z = sin.(x.*y)

scatter(x,y,z, markersize=.1, legend=false)

Notice I am using the PNG format in GR because it crashes with SVG when there are 1 million points in the cloud: https://github.com/jheinen/GR.jl/issues/51

dpsanders · December 20, 2016, 8:41pm

Is the solution contained in that issue or is there still a problem?

juliohm · December 20, 2016, 10:17pm

@dpsanders my question is about the appearance of the plots in the illustrations; it is unrelated to the issue on GitHub. I am just trying to make the GR plot look like the PyPlot one.

jheinen · December 21, 2016, 9:09am

We’d like to help you, but we need the original (or similar) data to reproduce the PyPlot result and see what happens. It’s unclear to me why there is a luminance effect …

juliohm · December 21, 2016, 3:39pm

@jheinen I am not sure luminance is the right term. I am referring to the “realistic look” of the PyPlot surface compared to the one generated with GR. I can try to generate a fake dataset if it is really needed. What I am trying to say is that the cavities in the surface are clearly visible in the PyPlot output, whereas I cannot see the details of the surface in the GR plot. Perhaps it has to do with the default view angle and lighting?

juliohm · December 21, 2016, 3:48pm

@jheinen, I tried with a simple sinusoidal surface:

using Plots
# pyplot()
gr(format="png")

N = 500

xs = range(0, stop=100, length=N)  # linspace(0,100,N) on Julia 0.5; linspace was removed in Julia 1.0
points = zeros(N*N, 3)
i = 1
for x in xs, y in xs
    z = sin(x)+sin(y) + x/10
    points[i,:] = [x,y,z]
    i += 1
end

scatter(points[:,1], points[:,2], points[:,3], markersize=.1, zlims=(-10,10))

This time, I don’t like the PyPlot output. All the sine waves are overlapping with some transparency:

The GR plot works better:

So clearly, there is a difference in the lighting/shadowing scheme of the two backends. I would like to understand whether there is a solution or a workaround so that I get GR speed and surfaces with good detail.

dpsanders · December 21, 2016, 4:05pm

I don’t think you can expect a point cloud to behave nicely. What happens if you actually draw the surface instead?

dpsanders · December 21, 2016, 4:05pm

Also, have you tried PlotlyJS, which has interactivity?

juliohm · December 21, 2016, 7:16pm

Good point @dpsanders, I will give PlotlyJS a try. I will also check whether I can afford the surface plot instead of the scatter plot.

juliohm · December 21, 2016, 8:50pm

Surface plots present the same issue, and PlotlyJS does not apply because I have many of these plots. I don’t know what the root of the issue is here; I just want GR to produce all the details that PyPlot produces when plotting these surfaces.

mkborregaard · December 21, 2016, 10:37pm

I think as a general rule you can expect the different backends to Plots to have different visual qualities and different attributes such as speed. Those differences are the reason that there is more than one backend. It is just a question of finding the right tradeoff. You can try to tweak some settings, e.g. set markerstrokewidth = 0; you don’t need the marker stroke at that markersize.

jheinen · December 22, 2016, 7:26am

Plotting millions of points in 3D requires some pre-processing and/or re-binning. In principle, PyPlot and GR are doing the “same” thing, but you get different results due to different nominal marker sizes or due to the fact that the PNG output is “smoothed”.

If your data is on a regular grid, you should consider using a surface representation (as already mentioned by @dpsanders).

This is a zoomed sub-view of the last example (N=500):

As you can see, for millions of points, such a representation would not make sense …

lwabeke · December 22, 2016, 7:49am

I think that is the point: You are trying to draw surfaces, but using point clouds.

Due to a number of factors (your point cloud sampling, the general trends in your data set, and the axis view angle), your original data just happened to look like it had nice shading and highlighting (on the flat areas relative to the camera) in the PyPlot graph. In reality it was just points: on the flat areas you happen to see in between the samples through to the background, whereas on the steep areas the points appear closer together due to the view angle.

Think of another example: points on a sphere. With just points, you only end up with a circle of dots; there is no information to indicate whether it is flat, the top half, the bottom half, etc. You need extra information. You need to color code the points somehow (shades based on distance from the viewport). But even then, you will see the points on both the front and the back side. You need a surface to hide the stuff at the back.
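As a rough sketch of the color-coding idea (not something anyone in this thread ran), one could compute each point's distance from an assumed camera position and use it as the marker color. The camera location and variable names here are made up for illustration, and this does nothing to hide the back side:

```julia
using LinearAlgebra

# Random points on the unit sphere.
n = 2000
θ = 2π .* rand(n)            # azimuth angle
u = 2 .* rand(n) .- 1        # cosine of the polar angle, uniform in [-1, 1]
r = sqrt.(1 .- u .^ 2)
px, py, pz = r .* cos.(θ), r .* sin.(θ), u

# Assumed viewpoint; the real camera position depends on the backend and view angle.
camera = [3.0, 3.0, 3.0]

# Distance of every point from the assumed camera.
depth = [norm([px[i], py[i], pz[i]] .- camera) for i in 1:n]

# With Plots loaded, the distances could be passed as the marker color, e.g.
# scatter(px, py, pz, marker_z=depth, markersize=1, markerstrokewidth=0, legend=false)
```

Near points and far points then get different shades, which gives some depth cue even though both sides of the sphere remain visible.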

Nevertheless, it remains a nice coincidental effect in your first PyPlot plot that makes it appear as if other things are happening.

mkborregaard · December 22, 2016, 8:01am

I don’t think it is coincidental as such, but you are right that it is an effect that occurs because of overlapping points. I think the differences between the first two are: 1) pyplot() points are smaller, 2) pyplot() points are partly transparent, and 3) the striped patterns in the first GR plot have to do with the way the marker stroke around each point is plotted at this scale (msw = 0 really changes the look, at least for the random ones).

juliohm · December 22, 2016, 11:27am

@jheinen unfortunately the grid is not regular. I tried using surfaces anyway (st=:surface), but the visualization didn’t improve much.

@mkborregaard I tried the markerstrokewidth=0 option. It removed some of the artifacts, but the result is still not that great with my dataset.

Perhaps I should mark this topic as solved. I will pick one of the answers, but thank you all for your help. I can proceed with PyPlot for the moment; it is not a big deal.

P.S.: How do I mark the topic as solved in Discourse? Sometimes I see the checkbox near the answer and sometimes I don’t.

dpsanders · December 22, 2016, 1:49pm

You should also consider GLVisualize for plotting very large point clouds.

juliohm · December 22, 2016, 2:19pm

@dpsanders I would love to, it is pretty cool. It is not officially supported by Plots.jl yet; I will check it out as soon as it is released.

Palli · December 22, 2016, 3:50pm

What do you mean? This?:

I thought you could pre-process by throwing out random points, or say every tenth? Maybe that’s helpful for speed. Do you really need to see all the points?

I’m just following this thread and skimmed to here; I have no need for these plots and have never used them, so maybe this idea is bad…
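For what it's worth, the thinning idea could be sketched like this, reusing the random-point example from earlier in the thread. The helper names thin_every and thin_random are hypothetical (they are not part of Plots.jl or GR), and whether the thinned cloud still shows the detail of interest is untested:

```julia
using Random

# Hypothetical helper: keep every `step`-th point.
thin_every(x, y, z; step=10) = (x[1:step:end], y[1:step:end], z[1:step:end])

# Hypothetical helper: keep `n` points chosen uniformly at random, without replacement.
function thin_random(x, y, z, n; rng=Random.default_rng())
    idx = randperm(rng, length(x))[1:n]
    return x[idx], y[idx], z[idx]
end

N = 1_000_000
x = randn(N)
y = randn(N)
z = sin.(x .* y)

xs, ys, zs = thin_every(x, y, z; step=10)     # every tenth point
xr, yr, zr = thin_random(x, y, z, 100_000)    # random subset of the same size

# With Plots loaded, either subset can be plotted as before, e.g.
# scatter(xs, ys, zs, markersize=0.1, markerstrokewidth=0, legend=false)
```

Taking every tenth point is only safe if the points are not stored in some spatial order; the random subset avoids that assumption at the cost of one permutation.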
