Proportion plot in CairoMakie

I am trying to make a proportion plot, where the x-axis is the year (2000 - 2004) and the y-axis is from 0 - 1. Each bar has a height of 1 and should be stacked to show the proportion of the different health score categories (1 - 6) per year.

I have been using CairoMakie for other visualisations, but I can't find a way to make a proportion plot that matches my description. A CairoMakie solution would be preferable, but other packages are also fine.

Also, is there a way to derive the proportions column without so many steps?

using DataFrames

years = [2000, 2000, 2000, 2000, 2000, 
        2001, 2001, 2001, 2001, 2001, 2001,
        2002, 2002, 2002, 2002, 2002, 2002, 2002,
        2003, 2003, 2003, 2003, 2003, 
        2004, 2004, 2004, 2004, 2004, 2004]

health_scores = [3, 2, 3, 5, 3, 
                2, 5, 1, 6, 5, 6,
                3, 2, 3, 4, 5, 1, 1,
                3, 1, 4, 6, 2, 
                4, 3, 3, 2, 2, 3]

df = DataFrame(year = years, health = health_scores)

# Adding column for the number of observations in each year
df2 = combine(groupby(df, :year), :health, :year => length)

# Adding the count per health score category per year
df3 = combine(groupby(df2, [:year, :health]), :health => length, :year_length)

# Adding the proportions of health score categories per year
proportions = df3.health_length./df3.year_length
df4 = hcat(df3, proportions)
rename!(df4, :x1 => :proportions)

> df4
29×5 DataFrame
 Row │ year   health  health_length  year_length  proportions
     │ Int64  Int64   Int64          Int64        Float64
─────┼─────────────────────────────────────────────────────────
   1 │  2000       2              1            5     0.2
   2 │  2000       3              3            5     0.6
   3 │  2000       3              3            5     0.6
  ⋮  │   ⋮      ⋮           ⋮             ⋮            ⋮
  28 │  2004       3              3            6     0.5
  29 │  2004       4              1            6     0.166667

DataFrames has a function proprow to get the proportion of rows in each group. In your case, you want the proportion of each health group, for each year group, so you need to call groupby on each year group. It's probably best to define a helper function for that:

using DataFrames
using CairoMakie

year = [2000, 2000, 2000, 2000, 2000, 
        2001, 2001, 2001, 2001, 2001, 2001,
        2002, 2002, 2002, 2002, 2002, 2002, 2002,
        2003, 2003, 2003, 2003, 2003, 
        2004, 2004, 2004, 2004, 2004, 2004]

health = [3, 2, 3, 5, 3, 
          2, 5, 1, 6, 5, 6,
          3, 2, 3, 4, 5, 1, 1,
          3, 1, 4, 6, 2, 
          4, 3, 3, 2, 2, 3]

df = DataFrame(; year, health)

proportions(tbl) = combine(groupby(DataFrame(tbl), :health), proprow => :proportion)

df2 = combine(groupby(df, :year), AsTable(:) => proportions => AsTable)

Result:

20×3 DataFrame
 Row │ year   health  proportion
     │ Int64  Int64   Float64
─────┼───────────────────────────
   1 │  2000       2    0.2
   2 │  2000       3    0.6
   3 │  2000       5    0.2
   4 │  2001       1    0.166667
...

(Instead of proprow you could also use proportions or proportionmap from StatsBase.jl)
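
If you would rather avoid the helper function, one possible alternative is to count rows per (year, health) pair first and then normalise within each year. This is only a sketch using the data from the question. The names counts and props are my own, and it relies only on nrow, combine, transform and groupby from DataFrames:

```julia
using DataFrames

year = [2000, 2000, 2000, 2000, 2000,
        2001, 2001, 2001, 2001, 2001, 2001,
        2002, 2002, 2002, 2002, 2002, 2002, 2002,
        2003, 2003, 2003, 2003, 2003,
        2004, 2004, 2004, 2004, 2004, 2004]

health = [3, 2, 3, 5, 3,
          2, 5, 1, 6, 5, 6,
          3, 2, 3, 4, 5, 1, 1,
          3, 1, 4, 6, 2,
          4, 3, 3, 2, 2, 3]

df = DataFrame(; year, health)

# Step 1: number of rows in every (year, health) combination
counts = combine(groupby(df, [:year, :health]), nrow => :count)

# Step 2: within each year, divide each count by that year's total
props = transform(groupby(counts, :year),
                  :count => (c -> c ./ sum(c)) => :proportion)
```

This keeps the intermediate count column, which can be handy for checking that the proportions in each year sum to 1.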

For plotting the bars it's easy if you don't need a legend:

barplot(df2.year, df2.proportion, stack=df2.health, color=df2.health)

Getting a legend, however, seems more complicated. The following is based on an example from the documentation:

colors = Makie.wong_colors()
fig, = barplot(df2.year, df2.proportion, stack=df2.health, color=colors[df2.health])
labels = string.(1:maximum(df2.health))
elements = [PolyElement(polycolor=colors[i]) for i in 1:length(labels)]
Legend(fig[1,2], elements, labels, "Health")
fig
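
A small variation, in case some categories never occur in the plotted data: the legend entries can be built from the categories that are actually present rather than from 1:maximum. This is a rough sketch, not an official Makie recipe. It uses only the first two years of the result table above so that it is self-contained, and the name cats is my own:

```julia
using CairoMakie

# First two years of the proportion table shown above
year       = [2000, 2000, 2000, 2001, 2001, 2001, 2001]
health     = [2, 3, 5, 1, 2, 5, 6]
proportion = [0.2, 0.6, 0.2, 1/6, 1/6, 2/6, 2/6]

colors = Makie.wong_colors()   # 7 colours, enough for scores 1 to 6

fig, ax, plt = barplot(year, proportion; stack = health, color = colors[health])

# One legend entry per category that actually appears in the data
cats = sort(unique(health))
elements = [PolyElement(polycolor = colors[c]) for c in cats]
Legend(fig[1, 2], elements, string.(cats), "Health")
fig
```

Indexing colors by the health score itself keeps the bar colours and the legend colours in sync, even when a score (here 4) is missing.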

Maybe @jules or @sdanisch can say if there's a simpler way to get the legend.

Not really, yet. In the future we'll have a stricter categorical colormap type with which it will make sense to add legend overloads. Right now this is too intermingled with how continuous color works.

In case you are interested in solutions from other packages, here is one using Deneb.jl with your original df dataframe:

Data(df) * Mark(:bar) * Encoding(
    x=:year, 
    y=field("count(health)", stack=:normalize), 
    color=:health
)

See also this example in the documentation gallery.

Thank you! These were very helpful!