[Plots] Efficient scatter plot of a 2×N Array

When we want to plot a series of points we simply write:

points = rand(10000)
scatter(points)

This treats points as the y values and assumes x = 1:10000. From the official documentation, under Argument Passing Behavior:

Julia function arguments follow a convention sometimes called “pass-by-sharing”, which means that values are not copied when they are passed to functions.

I know that in this case the variable points is not copied when passed to scatter.

Now consider having a 2×500 Array of points, i.e., 500 points in a 2D space. For this case, the scatter plot requires you to pass both x and y series separately. We usually do this by writing

scatter(points[1, :], points[2, :])

In this case points[1, :] and points[2, :] are copies of each row of points that are created before being passed to the function. A way to prevent this is

scatter(view(points, 1, :), view(points, 2, :))

but it is a bit verbose. Is there a way to achieve the same behavior without calling view?


This still calls view but is already less verbose:

@views scatter(points[1,:], points[2,:])

You can also write scatter(eachrow(points)...), which passes the same views but is a bit shorter.
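
To make the difference between the three spellings concrete, here is a minimal sketch that uses only Base (no Plots call), so it checks what would be passed to scatter rather than what scatter does with it. The variable names are just for illustration.

```julia
points = rand(2, 500)

row_copy = points[1, :]          # plain slicing allocates a new Vector
row_view = view(points, 1, :)    # view wraps the parent array without copying

@assert row_copy isa Vector{Float64}
@assert row_view isa SubArray

# @views rewrites the slicing syntax in the expression into view calls
row_macro = @views points[1, :]
@assert row_macro isa SubArray

# eachrow yields one view per row; splatting it into a call passes
# them as separate positional arguments
xs, ys = eachrow(points)
@assert xs isa SubArray && ys isa SubArray

# Writing to the parent is visible through the views, not through the copy
points[1, 1] = -1.0
@assert row_view[1] == -1.0 && xs[1] == -1.0
@assert row_copy[1] != -1.0
```

Note that the eachrow splat only lines up with scatter(x, y) because the array has exactly two rows.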


Did you benchmark this?
I doubt that the copying would take any noticeable amount of time with a 2×500 array compared to the plotting. And I suspect that if the array is actually large enough for this to matter, you want a 2D histogram rather than a scatter plot.
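
A rough way to check the copying cost in isolation, as a sketch only: it uses Base's @allocated and @elapsed so it runs without extra packages (BenchmarkTools is the usual choice for careful timing), and it does not measure the plotting call itself. The helper names copy_rows, view_rows, and time_calls are hypothetical, made up for this example.

```julia
points = rand(2, 500)

# Hypothetical helpers: build the two arguments either as copies or as views
copy_rows(p) = (p[1, :], p[2, :])
view_rows(p) = (view(p, 1, :), view(p, 2, :))

# Call each once first so compilation is not included in the measurements
copy_rows(points)
view_rows(points)

copy_bytes = @allocated copy_rows(points)
view_bytes = @allocated view_rows(points)

# Hypothetical helper: total wall time of n calls to f(p)
function time_calls(f, p, n)
    f(p)  # warm-up for this call path
    return @elapsed begin
        for _ in 1:n
            f(p)
        end
    end
end

copy_time = time_calls(copy_rows, points, 10_000)
view_time = time_calls(view_rows, points, 10_000)

println("copies: ", copy_bytes, " bytes per call, ", copy_time, " s for 10000 calls")
println("views:  ", view_bytes, " bytes per call, ", view_time, " s for 10000 calls")
```

The numbers are machine dependent; the point is to compare them against the time a single scatter call takes on the same data.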


I used 500 for the sake of example, but yes, in practice it might not be relevant. I asked out of curiosity and still managed to learn one or two things from the answers.


It is very likely that plotting time will also scale at least linearly with the number of points, or worse when a renderer has to figure out overlap in the case of overplotting.

Also, see Performance Tips · The Julia Language:

The key to efficient Julia code is to be aware of various performance pitfalls and optimization techniques, then benchmark and profile, and apply them where needed.


One of the first things Plots does is copy the input. So copying manually first should introduce a constant but quite small increase in time.
