Ignoring missing data when plotting

Kryohi · August 27, 2018, 12:04pm

Kind of a newbie question, so if this isn’t the appropriate section I’ll move it.

I was trying to plot a column of a dataframe with some elements that cannot be calculated.

At first I thought of initializing them to 0 and then picking only the nonzero values, but that’s kind of contrived, so I was wondering whether Plots.jl would simply ignore y-values of type Missing.

As it turns out, it can’t: plotting an array of type Array{Union{Float64,Missing}} throws TypeError: non-boolean (Missing) used in boolean context.
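A minimal sketch of where that error message comes from in Base Julia. It assumes (not confirmed in this thread) that some code path inside Plots branches on a comparison involving a data value: comparing with missing returns missing rather than true/false, and using missing as an if-condition throws exactly this TypeError.

```julia
# Comparisons involving `missing` follow three-valued logic and
# return `missing`, not `true`/`false`.
ys = Union{Float64,Missing}[1.0, missing, 3.0]
result = ys[2] > 0.0        # missing

# Using that result as a branch condition is what throws
# "TypeError: non-boolean (Missing) used in boolean context".
err = try
    if ys[2] > 0.0
        println("positive")
    end
    nothing
catch e
    e
end
println(typeof(err))        # TypeError
```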

Am I missing something? I guessed this was exactly the kind of use case for the new missing type… Of course it’s not a big problem; I can use findall or similar functions to trim the unwanted values. But if it’s possible, wouldn’t it make sense to be able to plot an array with missing elements?
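For reference, the findall route mentioned above might look like this. It is a sketch with made-up sample data; note that the same indices must be applied to the x-values so the points stay paired.

```julia
xs = [1.0, 2.0, 3.0, 4.0]
ys = Union{Float64,Missing}[0.5, missing, 2.5, missing]

# Indices of the entries that are present.
idx = findall(!ismissing, ys)     # [1, 3]

# Index both coordinates with the same indices so points stay aligned.
xs_trim = xs[idx]
ys_trim = Float64.(ys[idx])       # narrow the element type to Float64
```

The trimmed vectors can then be passed to plot as usual.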

mkborregaard · August 27, 2018, 12:17pm

Plots isn’t really Missings-compliant yet… The solution is to use StatPlots and plot directly from the DataFrame; that works. E.g.

using StatPlots, DataFrames
mydata = DataFrame(a = [1,missing,3], b = [2,3,4])
@df mydata scatter(:a, :b)

baggepinnen · August 27, 2018, 2:30pm

replace(my_array, missing=>NaN) maybe works?
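A small sketch of the NaN replacement using only Base functions. replace returns a copy with each missing swapped for NaN. Broadcasting coalesce (which returns its first non-missing argument) is an alternative that here yields a plain Vector{Float64}.

```julia
ys = Union{Float64,Missing}[1.0, missing, 3.0]

# Copy of `ys` with every `missing` replaced by NaN.
ys_nan = replace(ys, missing => NaN)

# Alternative: elementwise `coalesce`, which picks the first non-missing
# argument; every result is a Float64, so the vector is Vector{Float64}.
ys_nan2 = coalesce.(ys, NaN)
```

Either vector can then be passed to plot, which leaves gaps at the NaN entries.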

Tamas_Papp · August 27, 2018, 3:16pm

PGFPlotsX handles missing (by emitting a nan for pgfplots):

using PGFPlotsX
@pgf Plot({only_marks}, Table(x = [0, 1, missing, 3], y = [0, missing, 2, 3]))


mkborregaard · August 27, 2018, 3:42pm

Yes, replacing missing values with NaN works. The only reason that’s not just done internally is that Plots accepts many input types other than Float64.
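To illustrate why NaN is not a universal stand-in: it is a floating-point value, so a container with an integer element type cannot hold it. A minimal Base-only sketch:

```julia
ints = [1, 2, 3]                 # Vector{Int}

# Storing NaN would require converting it to Int, which is impossible.
err = try
    ints[2] = NaN
    nothing
catch e
    e
end
println(typeof(err))             # InexactError

# A floating-point copy can hold NaN.
floats = float(ints)             # Vector{Float64}
floats[2] = NaN
```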

Kryohi · August 27, 2018, 5:48pm

Using NaNs did the trick, thanks everyone!

I will also try the other packages mentioned.

kevbonham · August 28, 2018, 2:24am

A bit of a hack that I sometimes use with Plots is to map !ismissing over your array and then index with the result. E.g.:

keep = map(!ismissing, my_ys)
plot(my_xs[keep], my_ys[keep])

mkborregaard · August 28, 2018, 6:14am

Then make sure you don’t have missings in my_xs!
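If either coordinate can be missing, one option is to build the mask from both vectors so that a point is kept only when x and y are both present. A sketch with made-up sample data:

```julia
xs = Union{Float64,Missing}[1.0, missing, 3.0, 4.0]
ys = Union{Float64,Missing}[2.0, 5.0, missing, 8.0]

# Elementwise: true only where neither coordinate is missing.
keep = .!ismissing.(xs) .& .!ismissing.(ys)   # BitVector

xs_keep = xs[keep]
ys_keep = ys[keep]
```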

piever · August 28, 2018, 7:38am

OTOH, we could do the same as StatPlots’ @df does: replace missing with an appropriate value when we know how (for strings and floats), and throw an error on a missing otherwise.
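One way that type-dependent replacement could be expressed is through dispatch on the element type. This is only a sketch: placeholder and fill_missing are hypothetical names, not functions from Plots.jl or StatPlots, and the choice of "" for strings simply follows the suggestion in this thread.

```julia
# Hypothetical helpers (not part of Plots.jl or StatPlots):
# pick a stand-in for `missing` based on the non-missing element type.
placeholder(::Type{T}) where {T<:AbstractFloat} = T(NaN)
placeholder(::Type{T}) where {T<:AbstractString} = ""
placeholder(::Type{T}) where {T} =
    throw(ArgumentError("don't know how to replace missing for element type $T"))

function fill_missing(v::AbstractVector)
    T = Base.nonmissingtype(eltype(v))   # e.g. Union{Float64,Missing} -> Float64
    p = placeholder(T)                   # errors for unsupported types
    return [ismissing(x) ? p : x for x in v]
end
```

Dispatch keeps each rule separate, and anything without a rule fails loudly instead of silently producing a wrong plot.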

mkborregaard · August 28, 2018, 7:43am

Yes, that’s easy enough for the input data, because we already copy it here: Plots.jl/series.jl at commit 602dbdf1d260ef07daa2a2bdae558d82299e3d96 (JuliaPlots/Plots.jl on GitHub).

But that wouldn’t work for input passed as keyword arguments (the “attributes”). Frankly, I think a cleaner approach is to provide first-class support for missing by wrapping internal calls in Plots that may return missing in skipmissing where relevant. It’s just a bit more work.
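As a rough illustration of the skipmissing idea (not Plots’ actual internals), consider a reduction such as computing axis limits. Over the raw vector, missing propagates; wrapped in skipmissing, the missing entries are simply ignored.

```julia
ys = Union{Float64,Missing}[2.0, missing, -1.0, 5.0]

# Reductions over the raw vector propagate `missing`
# (for example, `sum(ys)` is `missing`), which is useless for an axis limit.
# Iterating through `skipmissing` ignores those entries instead.
lo = minimum(skipmissing(ys))   # -1.0
hi = maximum(skipmissing(ys))   #  5.0
```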

stephancb · August 28, 2018, 10:49am

Plotly and Matlab, for example, break the line in line/scatter plots wherever NaNs are present. This makes the missing data visible without disturbing the visualisation of the good data too much. It can be useful, particularly with real (not simulated or modeled) data, where some error condition may be flagged. So yes, it would be good if missing/nothing data could be handled within Plots in a way that takes this backend capability into account.
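To make “breaking the line at NaN” concrete, the sketch below computes which index ranges would be drawn as separate line pieces. nan_segments is a hypothetical helper written for illustration; it is not a function from Plots, Plotly, or Matlab, which handle this internally.

```julia
# Hypothetical helper: split a series into runs of consecutive
# non-NaN positions, i.e. the separate pieces a line plot would draw.
function nan_segments(ys::AbstractVector{<:Real})
    segments = Vector{UnitRange{Int}}()
    start = 0                        # 0 means "not inside a run"
    for (i, y) in enumerate(ys)
        if isnan(y)
            start > 0 && push!(segments, start:i-1)   # close the current run
            start = 0
        elseif start == 0
            start = i                # open a new run
        end
    end
    start > 0 && push!(segments, start:length(ys))     # close a trailing run
    return segments
end
```

For example, [1.0, 2.0, NaN, 4.0, 5.0, NaN] splits into the ranges 1:2 and 4:5.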

mkborregaard · August 28, 2018, 11:48am

Plots already handles NaN that way.

It’s just that it’s not straightforward to convert missing to NaN, since only floating-point vectors (such as Vector{Float64}) can hold NaN values. It could be done, as @piever said above, by replacing with NaN on Float64 input, “” on String input, etc. But I’d say that, given that missing is defined in Base, the cleanest approach is to propagate the missing values to the backend plotting package and let it decide.

Maybe worth discussing in an issue.

https://github.com/JuliaPlots/Plots.jl/issues/1706

AlexisRenchon · November 27, 2019, 10:11pm

Plots skips NaNs when drawing lines, but when I use scatter with the marker_z attribute to color the dots, any NaN elements in z are still plotted as black dots.
One workaround is to set the corresponding x or y to NaN, but that doesn’t seem very elegant.

Is there a way to make NaN in marker_z to not plot the dot?
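An alternative to setting x or y to NaN is to filter all three vectors with the same mask before plotting, so points without a color value are never passed in. A sketch with made-up data; the scatter call in the comment assumes Plots is loaded.

```julia
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 9.0, 16.0]
zs = [0.1, NaN, 0.3, 0.4]

# Keep only points whose color value is defined.
keep = .!isnan.(zs)
xs_k, ys_k, zs_k = xs[keep], ys[keep], zs[keep]

# With Plots loaded: scatter(xs_k, ys_k, marker_z = zs_k)
```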

Related topics

Topic	Category / tags	Replies	Views	Last activity
Plotting options for DataFrames with NullableArray	Visualization: question	9	1591	January 20, 2017
How to connect gaps when Plotting	Visualization: question, plotting	2	492	September 21, 2020
How to handle missings in CairoMakie	Visualization: makie	4	519	May 21, 2021
Fixing missing marker in plot legend when first value in series is NaN	General Usage: bug, plots, potential-bug	2	663	September 6, 2022
Create histogram with missing values	General Usage: macros, dataframes, plots, recipe	6	2054	January 15, 2022