Heatmap of a dataframe


#1

It seems that StatPlots doesn’t support making a heatmap of an all-numeric dataframe. Would that be a reasonable addition to the package?


#2

At first glance, I would say that the heatmap plot fits better in PlotRecipes.jl than in StatPlots.jl. I’ve asked @mkborregaard about it in this thread:


#3

What would be the use case for this?


#4

I’m not sure what kind of answer you expect, but the particular thing
I’m trying to do is display the pattern of missing values in a large set
of numeric data, which is in the form of a dataframe. It seems to me
the heatmap is the best kind of display for this (although in this use
case it would have only two levels).


#5

Ah, I see. That would involve converting the DataFrame to a Matrix of Float64, replacing the NAs with NaN. For a Matrix, your specific use case would actually fit spy better than heatmap. A dataframe_spy recipe in StatPlots that ran isna and returned the resulting Matrix{Bool} wrapped in a Plots.Spy object would be a nice addition, I think.


#6

Ok, thanks, I found the spy function in Plot/recipes/. Before I posted
my question, I had thought of converting the dataframe to a matrix.
The only drawback I saw was that I would lose the column labels of the
dataframe, which would be useful in understanding the pattern, be it
NAs/NaNs or non-0 elements.


#7

Good point - the names could be used as xticks in the recipe. The conversion could be as simple as

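# row and column indices of the NA entries
co, ro = Int[], Int[]
for j in 1:ncol(df)
   for i in 1:nrow(df)
      if isna(df[i,j])
         push!(co, j)
         push!(ro, i)
      end
   end
end
# pass the dimensions so that trailing rows/columns without NAs are kept
sparse(ro, co, true, nrow(df), ncol(df))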
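
For readers on current Julia, where `missing`/`ismissing` have replaced `NA`/`isna`, the same idea can be sketched without any packages beyond the `SparseArrays` standard library. This is only an illustration under assumptions: the named tuple `cols` is a hypothetical stand-in for the data frame, and `missing_coords` is a made-up helper name, not part of any package.

```julia
using SparseArrays

# Hypothetical stand-in for a data frame: named columns that may hold `missing`.
cols = (a = [1, 2, missing, 4, 5],
        b = [1, 2, missing, 4, 5],
        c = [1, 2, 3, 4, 5])

# Collect the (row, column) coordinates of the missing entries.
function missing_coords(cols)
    ro, co = Int[], Int[]
    for (j, col) in enumerate(cols)
        for i in eachindex(col)
            if ismissing(col[i])
                push!(ro, i)
                push!(co, j)
            end
        end
    end
    return ro, co
end

ro, co = missing_coords(cols)
nrows, ncols = length(first(cols)), length(cols)

# Give the dimensions explicitly so rows/columns without missing values are kept.
mask = sparse(ro, co, fill(true, length(ro)), nrows, ncols)
```

The resulting `mask` has one stored entry per missing value, and the column names (`keys(cols)` here) remain available to label the x axis.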

#8

I just saw a post by you on StackExchange on the same subject, suggesting the use of ‘convert’:

Perfect! :) Happy to help. One note: If you have Null / NA values, you can give convert a third argument to replace them (as they are not allowed in an Array), e.g. convert(Array, iris[1:3], 0.).
– Michael K. Borregaard Mar 22 at 17:47

This seems easier; is there any reason not to use this solution?
However, convert(), in 0.5.1 at least, does not accept a third argument.


#9

Because I thought it was faster to generate a sparse Boolean Matrix immediately, as that’s what spy needs. I just benchmarked, and it isn’t, in fact.

The three-argument conversion method works only on DataArrays, not on a whole DataFrame: the replacement value has a single type (NaN is a Float64), whereas the columns of a DataFrame are not expected to share one type. So you have to convert the columns one by one:

ar = hcat([convert(Array, df[i], NaN) for i in 1:ncol(df)]...)
# then return the Spy object that will automatically dispatch to the spy recipe
Spy(ar)
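
A rough equivalent of the column-by-column conversion in current Julia, again only a sketch under assumptions: the named tuple `cols` is a hypothetical stand-in for the data frame, `tofloat` is a made-up helper, and `missing` plays the role of `NA`. No plotting call is included.

```julia
# Hypothetical stand-in for a data frame with columns of different element types.
cols = (a = [1, 2, missing, 4, 5],
        c = [1.0, 2.0, 3.0, 4.0, 5.0])

# Convert one column to Float64, replacing missing entries with NaN.
tofloat(col) = Float64[ismissing(x) ? NaN : x for x in col]

# Convert each column separately, then glue the results into one matrix.
ar = reduce(hcat, [tofloat(col) for col in cols])
```

Converting each column to `Float64` first is what lets integer columns take part, since `NaN` has no integer representation.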

#10

Am I doing something wrong here?

julia> df
5×3 DataFrames.DataFrame
│ Row │ a  │ b  │ c │
├─────┼────┼────┼───┤
│ 1   │ 1  │ 1  │ 1 │
│ 2   │ 2  │ 2  │ 2 │
│ 3   │ NA │ NA │ 3 │
│ 4   │ 4  │ 4  │ 4 │
│ 5   │ 5  │ 5  │ 5 │

julia> convert(Array, df[1], NaN)
ERROR: InexactError()
in convert(::Type{Array{Int64,1}}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:391
in convert(::Type{Array}, ::DataArrays.DataArray{Int64,1}, ::Float64) at /usr/home/ko/.julia/v0.5/DataArrays/src/dataarray.jl:412

julia>


#11

No, that’s just a really annoying aspect of the code. NaN is a Float64, but the convert function recognises the eltype of df[:a] as Int, and the InexactError() tells you that NaN cannot be converted to Int. Sorry, when you said “all-numeric” I thought “all-floats”. The Bool method works, though. Or you can do convert(Array, DataArray{Float64}(dd), NaN), with dd being the integer column (e.g. df[:a]).
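
The underlying limitation can be reproduced in plain Julia without DataArrays. A minimal sketch (the variable names are illustrative only): storing `NaN` in an integer array fails with an `InexactError`, while widening to `Float64` first succeeds.

```julia
ints = [1, 2, 3]              # Vector{Int}

# Writing NaN into an Int array requires converting NaN to Int, which fails.
err = try
    ints[3] = NaN
    nothing
catch e
    e
end

# Widening the element type to Float64 first makes room for NaN.
floats = Float64.(ints)
floats[3] = NaN
```

Here `err` holds the caught exception, and `ints` is left unchanged because the failed assignment never stores anything.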


#12

Ok, thanks for your help. My last problem is that I can’t eliminate the colorbar from the heatmap. Neither legend=false, nor colorbar=false, nor both do the trick.


#13

Sounds like a bug. You could open an issue, specifying your backend and the output of versioninfo().