Ignoring missing data when plotting

Kind of a newbie question, so if this isn’t the appropriate section I’ll move it.

I was trying to plot a column of a dataframe with some elements that cannot be calculated.

At first I thought of initializing them to 0 and then picking only the nonzero values, but that’s kind of contrived, so I was wondering whether Plots.jl would just ignore y-values that are of type Missing.

As it turns out it can’t, since plotting an array of the type Array{Union{Float64,Missing}} gives TypeError: non-boolean (Missing) used in boolean context.

Am I missing something? I guessed this was the kind of use case for the new missing type… Of course it’s not a big problem, I can use findall or similar functions to trim unwanted values, but if it’s possible, wouldn’t it make sense to be able to plot an array with missing elements?

Plots isn’t really Missings-compliant yet… The solution is to use StatPlots (since renamed StatsPlots) and plot directly from the DataFrame, which works. E.g.

using StatPlots, DataFrames
mydata = DataFrame(a = [1,missing,3], b = [2,3,4])
@df mydata scatter(:a, :b)

replace(my_array, missing=>NaN) maybe works?

PGFPlotsX handles missing (by emitting a nan for pgfplots):

using PGFPlotsX
@pgf Plot({only_marks}, Table(x = [0, 1, missing, 3], y = [0, missing, 2, 3]))


Yes, replacing missing values with NaN works. The only reason that’s not just done internally is that Plots accepts many other input types apart from Float64.
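
A minimal sketch of the NaN replacement for floating-point data, using only Base functions. The variable names are made up for illustration, and the plot call is left as a comment because it assumes Plots is loaded.

```julia
ys = [1.0, missing, 3.0, 4.0]      # Vector{Union{Missing, Float64}}

# Broadcasting coalesce swaps each missing for NaN and narrows the
# element type to Float64.
ys_nan = coalesce.(ys, NaN)

# plot(1:4, ys_nan)   # assumes `using Plots`
```

For Float64 data, replace(ys, missing => NaN) as suggested earlier in the thread should give the same result.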

Using NaNs did the trick, thanks everyone!

I will also try the other packages mentioned.

A bit of a hack that I use sometimes with Plots is to map !ismissing over your array, and then index with the result. Eg:

keep = map(!ismissing, my_ys)
plot(my_xs[keep], my_ys[keep])

Then make sure you don’t have missings in my_xs!
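
If my_xs may also contain missings, one way to extend the same trick is to build the mask from both vectors. This is only a sketch with made-up data, and the plot call is commented out because it assumes Plots is loaded.

```julia
my_xs = [1.0, 2.0, missing, 4.0]
my_ys = [10.0, missing, 30.0, 40.0]

# Keep a point only if neither coordinate is missing.
keep = map((x, y) -> !ismissing(x) && !ismissing(y), my_xs, my_ys)

# identity.(...) re-collects the kept values so the element type
# narrows from Union{Missing, Float64} to Float64.
xs = identity.(my_xs[keep])
ys = identity.(my_ys[keep])

# plot(xs, ys)   # assumes `using Plots`
```

Note that this drops the points entirely, so a line plot will connect the neighbours across the gap instead of breaking.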

OTOH we could do the same as the StatPlots @df macro does: replace missing with an appropriate value when we know how to (string and float), and error on a missing otherwise.

Yes, easily enough, on the input data - because we already copy them here https://github.com/JuliaPlots/Plots.jl/blob/602dbdf1d260ef07daa2a2bdae558d82299e3d96/src/series.jl#L75-L88

But that wouldn’t work for any of the input passed as keyword arguments (the “attributes”). And frankly I think a cleaner approach is to provide first-class missings support by wrapping internal calls in Plots that may return missing in skipmissing where relevant. It’s just a bit more work.
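
To illustrate what wrapping a call in skipmissing buys you, here is a small Base-only sketch. Computing axis limits is just an assumed example of an internal call; it is not taken from the Plots source.

```julia
ys = [3.0, missing, 1.0, 7.0]

# A plain reduction propagates missing...
raw_max = maximum(ys)

# ...while skipmissing iterates over the non-missing values only,
# so the limits come out as ordinary numbers.
lo, hi = extrema(skipmissing(ys))
```

Here raw_max is missing, which is the kind of value that later ends up in a boolean context and triggers the TypeError from the first post.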

Plotly and Matlab, for example, break the line in line/scatter plots where NaNs are present. This makes the missing data visible without disturbing the visualization of the good data too much. It can be useful, particularly when working with real (not simulated or modeled) data, which can have some error condition flagged. Therefore, yes, it would be good if missing/nothing data could be handled within Plots, taking this capability of the backend into account.

Plots already handles NaN that way.

It’s just that it’s not straightforward to convert missing to NaN, since only floating-point vectors (e.g. Float64) support NaN values. It could be done, as @piever said above, by replacing with NaN on Float64 input, “” on String input etc. But I’d say that given that missing is defined in Base, the cleanest approach is to propagate the missing values on to the backend plotting package and let that decide.

Maybe worth discussing in an issue.
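
For reference, the per-type replacement idea could look roughly like the sketch below. placeholder and fill_missing are hypothetical names invented for this example; they are not part of Plots or StatPlots.

```julia
# Hypothetical helpers, not part of any package.
# Pick a stand-in value for missing based on the non-missing element type.
placeholder(::Type{T}) where {T<:AbstractFloat} = T(NaN)
placeholder(::Type{<:AbstractString}) = ""
placeholder(T::Type) = error("no placeholder known for element type $T")

function fill_missing(v::AbstractVector)
    T = nonmissingtype(eltype(v))
    # Nothing to do if the element type cannot hold missing.
    T === eltype(v) && return v
    return coalesce.(v, placeholder(T))
end

floats  = fill_missing([1.0, missing, 3.0])
strings = fill_missing(["a", missing, "c"])
```

With an integer vector such as [1, missing] this errors instead of guessing, which matches the "error on a missing otherwise" behaviour mentioned above.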

Plots ignores NaNs when plotting lines, but when I use scatter with the attribute marker_z = z to color the dots and z has NaN elements, it still plots a black dot for those points.
One way around this is to set x or y to NaN at those points, but that doesn’t seem very elegant.

Is there a way to make NaN in marker_z to not plot the dot?
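
One possible workaround, not confirmed anywhere in this thread, is to drop the points whose z is NaN before calling scatter. A sketch with made-up data, with the plot call commented out because it assumes Plots is loaded:

```julia
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [0.1, NaN, 0.3, NaN]

# Mask of points whose color value is usable.
keep = .!isnan.(z)

# scatter(x[keep], y[keep], marker_z = z[keep])   # assumes `using Plots`
```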