Heatmap of a dataframe

It seems that StatPlots doesn’t support making a heatmap of an all-numeric dataframe. Would that be a reasonable addition to the package?

At first glance, I would say that the heatmap plot fits better in PlotRecipes.jl than in StatsRecipes.jl. I’ve asked @mkborregaard about it in this thread:

What would be the use case for this?

I’m not sure what kind of answer you expect, but the particular thing
I’m trying to do is display the pattern of missing values in a large set
of numeric data, which is in the form of a dataframe. It seems to me
the heatmap is the best kind of display for this (although in this use
case it would have only two levels).

Ah, I see. That would involve converting the DataFrame to a Matrix of Float64, replacing the NAs with NaN. For a Matrix, your specific use case would actually fit spy better than heatmap. A dataframe_spy recipe in StatPlots that ran isna and wrapped the resulting Matrix{Bool} in a Plots.Spy object would be a nice addition, I think.


Ok, thanks, I found the spy function in Plot/recipes/. Before I posted
my question, I had thought of converting the dataframe to a matrix.
The only drawback I saw was that I would lose the column labels of the
dataframe, which would be useful in understanding the pattern, be it
NAs/NaNs or non-0 elements.

Good point - the names could be used as xticks in the recipe. The conversion could be as simple as

co, ro = Int[], Int[]
for j in 1:ncol(df)
   for i in 1:nrow(df)
      if isna(df[i,j])
         push!(co, j)
         push!(ro, i)
      end
   end
end
# pass the dimensions so trailing rows/columns without NAs are not dropped
sparse(ro, co, true, nrow(df), ncol(df))
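
For comparison, a dense version of the same idea fits in a few lines. This is only a sketch under stated assumptions: it targets a recent Julia (1.x), where `missing` and `ismissing` play the role that NA and isna play in this thread, and it uses a plain vector of columns (`columns` and `colnames`, both hypothetical names) instead of a real DataFrame so that it runs without any packages.

```julia
# Hypothetical stand-in for an all-numeric dataframe with missing values.
colnames = ["a", "b", "c"]
columns = [
    [1, 2, missing, 4, 5],
    [1, 2, missing, 4, 5],
    [1, 2, 3, 4, 5],
]

# One Bool column per data column; hcat glues them into a rows × columns mask.
mask = hcat([ismissing.(col) for col in columns]...)

# Tick positions and labels that a recipe could pass on as xticks,
# so the column names are not lost in the conversion.
ticks = (1:length(colnames), colnames)
```

Plots normally takes custom axis labels as a (positions, labels) tuple in the xticks attribute, which is what `ticks` is shaped for. Whether the spy recipe honours it on a given backend is not verified here.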

I just saw a post by you on StackExchange on the same subject, suggesting the use of ‘convert’:

Perfect! Happy to help. One note: If you have Null / NA values, you can give convert a third argument to replace them (as they are not allowed in an Array), e.g. convert(Array, iris[1:3], 0.).
– Michael K. Borregaard Mar 22 at 17:47

This seems easier; is there any reason not to use this solution?
However, convert(), in 0.5.1 at least, does not accept a third argument.

Because I thought it was faster to generate a sparse Boolean Matrix immediately, as that’s what spy needs. I just benchmarked, and it isn’t, in fact.

The three-argument conversion method only works for DataArrays, as the third argument always has a type (NaN is Float64), but you don’t expect different DataFrame columns to have the same type. So you have to convert the columns one by one:

ar = hcat([convert(Array, df[i], NaN) for i in 1:ncol(df)]...)
# then return the Spy object that will automatically dispatch to the spy recipe
Spy(ar)

Am I doing something wrong here?

julia> df
5×3 DataFrames.DataFrame
│ Row │ a  │ b  │ c │
├─────┼────┼────┼───┤
│ 1   │ 1  │ 1  │ 1 │
│ 2   │ 2  │ 2  │ 2 │
│ 3   │ NA │ NA │ 3 │
│ 4   │ 4  │ 4  │ 4 │
│ 5   │ 5  │ 5  │ 5 │

julia> convert(Array, df[1], NaN)
ERROR: InexactError()
in convert(::Type{Array{Int64,1}}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:391
in convert(::Type{Array}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:412

julia>

No, that’s just a really annoying aspect of the code. NaN is a Float64, but the convert function recognises the eltype of df[:a] as Int, and the InexactError() tells you that NaN cannot be converted to Int. Sorry, when you said “all-numeric” I thought “all-floats”. The Bool method works though. Or you can do convert(Array, DataArray{Float64}(dd), NaN), where dd is the column in question.
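
The type mismatch described above can be reproduced without DataArrays. The sketch below assumes a recent Julia (1.x) and uses `missing` as a stand-in for NA; `col` and `filled` are hypothetical names, not part of any package.

```julia
# NaN is a Float64, so it has no exact integer representation.
err = try
    convert(Int, NaN)
catch e
    e
end
println(isa(err, InexactError))

# Hypothetical integer column with a gap, like column a in the example.
col = [1, 2, missing, 4, 5]

# Promote each value to Float64 first; only then is NaN a valid fill value.
filled = [ismissing(x) ? NaN : Float64(x) for x in col]
```

This mirrors the suggested workaround: the element type is widened to Float64 before the gaps are replaced, so the fill value and the column agree on type.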

Ok, thanks for your help. My last problem is that I can’t eliminate the colorbar from the heatmap. Neither legend=false, nor colorbar=false, nor both do the trick.

Sounds like a bug. You could open an issue, specifying your backend and the output of versioninfo().