I want to make a histogram with the data in a column of a data frame; the column has a lot of missing values. I don’t want to drop the missing values in the data frame or create a new array without missing values. How do I program a macro to create the histogram?
You can do so using skipmissing:
using StatsPlots
x = [4, 5, 2, 7, missing, 8]
histogram(collect(skipmissing(x)))
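To see what skipmissing does and does not copy, here is a small sketch using only Base (the names sm and y are just for illustration). As far as I know, skipmissing returns a lazy wrapper around the original vector, and it is the collect call that allocates a new array:

```julia
x = [4, 5, 2, 7, missing, 8]

# Lazy wrapper: iterates over x and skips missing entries, without copying x.
sm = skipmissing(x)

# Reductions can consume the wrapper directly.
total = sum(sm)

# collect is the step that materializes a new vector without missings.
y = collect(sm)

@show total
@show y
@show x   # the original vector still contains the missing value
```

So the data frame column itself is left untouched, but a temporary array is still created for the plot.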
But does any of the above comply with the OP's request not to create a new array without the missing values?
The handling of missings is a real mystery to me. How are vectors and matrices containing them dealt with? We cannot have a matrix with missing nodes, however:
sizeof(missing)
0
which is obviously impossible. Even neutrinos have mass, and anything carrying information cannot be stored … and occupy no space.
Yes, maybe. But the addresses of the data's memory chunks must be stored somewhere, because down in the basement we cannot have an array stored as
[8bytes, 8bytes, 0, 8bytes, …].
And I confess that what puzzles me more is how an image with missings can be stored as a 3D array in which some nodes occupy zero space.
Thanks
I guess sizeof doesn’t include the type tag. Since there is only one value of type Missing, there is no need to store any more information about it once we know the type.
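A quick check in the REPL seems to support this. The sketch below uses only Base; Base.isbitsunion and Base.summarysize are not exported, so treat their exact behavior as an assumption that may vary between Julia versions. My understanding is that for a small Union like Union{Missing, Float64} every array slot is as wide as the largest member, and a separate type tag per element records which member is stored there, so no slot actually occupies zero space:

```julia
# Missing is a singleton type: no fields and exactly one instance,
# so the value itself carries no payload.
@show sizeof(missing)
@show fieldcount(Missing)
@show Missing() === missing

# A vector that can hold either Float64 or missing.
v = Union{Missing, Float64}[1.0, missing, 3.0]

# True when the element type is a union of plain-bits types that can be stored inline.
@show Base.isbitsunion(eltype(v))

# Compare the total memory footprint with a plain Float64 vector of the same length.
@show Base.summarysize(v)
@show Base.summarysize([1.0, 2.0, 3.0])
```

The same reasoning would apply to a 3D image array with missings: the array stays a regular block of equally sized slots.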
Defining Plots recipes can help to handle histograms with missing values, especially type recipes:
using DataFrames, Plots
df = DataFrame(a = [randn(99); missing; randn(99) .+ 2; missing; randn(99) .+ 4])
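@recipe function f(v::Vector{Union{Missing, Float64}})
    seriestype --> :hist          # default to a histogram unless the caller overrides it
    y = filter(!ismissing, v)     # drop the missing entries (this allocates a filtered copy)
    1:length(y), y
end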
histogram(df.a) # it should now work for Vectors with missings