It seems that StatPlots doesn’t support making a heatmap of an all-numeric dataframe. Would that be a reasonable addition to the package?
At first glance, I would say the heatmap plot fits better in PlotRecipes.jl than in StatsRecipes.jl. I’ve asked @mkborregaard about it in this thread:
What would be the use case for this?
I’m not sure what kind of answer you expect, but the particular thing
I’m trying to do is display the pattern of missing values in a large set
of numeric data, which is in the form of a dataframe. It seems to me
the heatmap is the best kind of display for this (although in this use
case it would have only two levels).
Ah, I see. That would involve converting the DataFrame to a Matrix of Float64, replacing the NAs with NaN. For a Matrix, your specific use case would actually fit spy better than heatmap. A dataframe_spy recipe in StatPlots that ran isna and returned the resulting Matrix{Bool} wrapped in a Plots.Spy object would be a nice addition, I think.
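A rough sketch of the mask such a recipe would build, written for current Julia, where `missing` and `ismissing` have taken over from `NA` and `isna`. The table is a plain NamedTuple of column vectors standing in for the DataFrame, and `missing_mask` is a hypothetical helper name, not something StatPlots or Plots provides:

```julia
# Stand-in for the DataFrame: a NamedTuple of equally long column vectors.
tbl = (a = [1, 2, missing, 4, 5],
       b = [1, 2, missing, 4, 5],
       c = [1, 2, 3, 4, 5])

# Hypothetical helper: true wherever a value is missing,
# with one matrix column per table column.
function missing_mask(columns)
    cols = collect(values(columns))
    mask = falses(length(first(cols)), length(cols))
    for (j, col) in enumerate(cols)
        mask[:, j] .= ismissing.(col)
    end
    return Matrix{Bool}(mask)
end

mask = missing_mask(tbl)
labels = collect(string.(keys(tbl)))   # column names, usable as x tick labels
```

The resulting `mask` is the kind of two-level matrix that spy is meant for, and `labels` keeps the column names that would otherwise be lost in the conversion.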
Ok, thanks, I found the spy function in Plot/recipes/. Before I posted
my question, I had thought of converting the dataframe to a matrix.
The only drawback I saw was that I would lose the column labels of the
dataframe, which would be useful in understanding the pattern, be it
NAs/NaNs or non-0 elements.
Good point - the names could be used as xticks in the recipe. The conversion could be as simple as
co, ro = Int[], Int[]
for j in 1:ncol(df)
    for i in 1:nrow(df)
        if isna(df[i, j])
            push!(co, j)
            push!(ro, i)
        end
    end
end
# pass the dimensions so trailing rows and columns without NAs are kept
sparse(ro, co, true, nrow(df), ncol(df))
I just saw a post by you on StackExchange on the same subject, suggesting the use of ‘convert’:
Perfect! Happy to help. One note: If you have Null / NA values, you can give convert a third argument to replace them (as they are not allowed in an Array), e.g. convert(Array, iris[1:3], 0.).
– Michael K. Borregaard Mar 22 at 17:47
This seems easier, is there any reason not to use this solution?
However, convert(), in 0.5.1 at least, does not accept a third argument.
Because I thought it was faster to generate a sparse Boolean Matrix immediately, as that’s what spy needs. I just benchmarked, and it isn’t, in fact.
The three-argument conversion method only works for DataArrays, as the third argument always has a type (NaN is Float64), but you don’t expect different DataFrame columns to have the same type. So you have to convert the columns one by one:
ar = hcat([convert(Array, df[i], NaN) for i in 1:ncol(df)]...)
# then return the Spy object that will automatically dispatch to the spy recipe
Spy(ar)
Am I doing something wrong here?
julia> df
5×3 DataFrames.DataFrame
│ Row │ a │ b │ c │
├─────┼────┼────┼───┤
│ 1 │ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │ 2 │
│ 3 │ NA │ NA │ 3 │
│ 4 │ 4 │ 4 │ 4 │
│ 5 │ 5 │ 5 │ 5 │
julia> convert(Array, df[1], NaN)
ERROR: InexactError()
in convert(::Type{Array{Int64,1}}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:391
in convert(::Type{Array}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:412
No, that’s just a really annoying aspect of the code. NaN is a Float64, but the convert function recognises the eltype of df[:a] as Int, and the InexactError() tells you that you cannot convert NaN to Int. Sorry, when you said “all-numeric” I thought “all-floats”. The Bool method works though. Or you can do convert(Array, DataArray{Float64}(dd), NaN).
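To illustrate the point about the element type, here is a minimal sketch in current Julia, using `missing` in place of `NA` and a plain vector in place of the DataArray column (both are assumptions, not the 0.5 API used in this thread). It shows that NaN cannot be turned into an Int, and that promoting the column to Float64 first avoids the error:

```julia
col = [1, 2, missing, 4, 5]   # integer column with one missing entry

# NaN has no integer representation, so this conversion cannot succeed.
threw = try
    convert(Int, NaN)
    false
catch err
    err isa InexactError
end

# Promote to Float64 while substituting NaN for the missing entries.
as_float = Float64[ismissing(x) ? NaN : x for x in col]
```

Doing the same for every column and concatenating the results with hcat would give the all-Float64 matrix discussed above.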
Ok, thanks for your help. My last problem is that I can’t eliminate the colorbar from the heatmap. Neither legend=false, nor colorbar=false, nor both do the trick.
Sounds like a bug. You could open an issue, specifying your backend and the output of versioninfo().