I am trying to make a proportion plot (a 100% stacked bar chart) where the x-axis is the year (2000 to 2004) and the y-axis runs from 0 to 1. Each bar has a total height of 1 and is split into stacked segments showing the proportion of each health score category (1 to 6) in that year.
I have been using CairoMakie for other visualisations, but I can't find a way to make the kind of proportion plot I described. If there is a way to do it in CairoMakie, that would be preferable, but other packages are fine too.
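Makie's `barplot` accepts a `stack` keyword that stacks bars sharing the same x position by group, so one possible CairoMakie sketch of the plot described above is the following. It assumes a recent Makie version and one row per (year, health) pair; the colour vector `score_colors` and the output file name are arbitrary, hypothetical choices.

```julia
using DataFrames, CairoMakie

years = [2000, 2000, 2000, 2000, 2000,
         2001, 2001, 2001, 2001, 2001, 2001,
         2002, 2002, 2002, 2002, 2002, 2002, 2002,
         2003, 2003, 2003, 2003, 2003,
         2004, 2004, 2004, 2004, 2004, 2004]
health_scores = [3, 2, 3, 5, 3,
                 2, 5, 1, 6, 5, 6,
                 3, 2, 3, 4, 5, 1, 1,
                 3, 1, 4, 6, 2,
                 4, 3, 3, 2, 2, 3]
df = DataFrame(year = years, health = health_scores)

# One row per (year, health) pair with its share of that year's observations
counts = combine(groupby(df, [:year, :health]), nrow => :count)
props = transform(groupby(counts, :year), :count => (c -> c ./ sum(c)) => :proportion)

# Hypothetical colour choice: one colour per health score 1-6
score_colors = [:firebrick, :darkorange, :gold, :seagreen, :steelblue, :purple]

fig = Figure()
ax = Axis(fig[1, 1]; xlabel = "Year", ylabel = "Proportion", xticks = 2000:2004)
# `stack` piles bars at the same x on top of each other, grouped by health score,
# so each year's segments add up to a bar of height 1
barplot!(ax, props.year, props.proportion;
         stack = props.health, color = score_colors[props.health])
ylims!(ax, 0, 1)

# Manual legend: one coloured patch per health score
Legend(fig[1, 2], [PolyElement(color = c) for c in score_colors],
       string.(1:6), "Health score")

save("health_proportions.png", fig)
```

Because colours are looked up by health score rather than by position, a category that is absent in some year (e.g. score 1 in 2000) simply has no segment there and the colours stay consistent across bars.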
Also, is there a way to derive the proportions column without so many steps?
```julia
using DataFrames

years = [2000, 2000, 2000, 2000, 2000,
         2001, 2001, 2001, 2001, 2001, 2001,
         2002, 2002, 2002, 2002, 2002, 2002, 2002,
         2003, 2003, 2003, 2003, 2003,
         2004, 2004, 2004, 2004, 2004, 2004]
health_scores = [3, 2, 3, 5, 3,
                 2, 5, 1, 6, 5, 6,
                 3, 2, 3, 4, 5, 1, 1,
                 3, 1, 4, 6, 2,
                 4, 3, 3, 2, 2, 3]

df = DataFrame(year = years, health = health_scores)

# Adding a column for the number of observations in each year
df2 = combine(groupby(df, :year), :health, :year => length)

# Adding the count of each health score category per year
df3 = combine(groupby(df2, [:year, :health]), :health => length, :year_length)

# Adding the proportions of health score categories per year
# (assigning a new column directly; hcat with a plain vector is not supported in DataFrames 1.x)
df4 = copy(df3)
df4.proportions = df3.health_length ./ df3.year_length
```
```
julia> df4
29×5 DataFrame
 Row │ year   health  health_length  year_length  proportions
     │ Int64  Int64   Int64          Int64        Float64
─────┼────────────────────────────────────────────────────────
   1 │  2000       2              1            5     0.2
   2 │  2000       3              3            5     0.6
   3 │  2000       3              3            5     0.6
  ⋮  │   ⋮       ⋮          ⋮             ⋮           ⋮
  28 │  2004       3              3            6     0.5
  29 │  2004       4              1            6     0.166667
```
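A shorter route to the proportions, sketched under the assumption of DataFrames.jl 1.x, is to count rows per (year, health) pair with `nrow` and then divide each count by its year's total inside a grouped `transform`. Unlike the 29-row `df4` above, where a category with three observations appears three times, this yields exactly one row per pair, which is also the shape a stacked bar chart needs (duplicate rows would stack the same segment several times).

```julia
using DataFrames

years = [2000, 2000, 2000, 2000, 2000,
         2001, 2001, 2001, 2001, 2001, 2001,
         2002, 2002, 2002, 2002, 2002, 2002, 2002,
         2003, 2003, 2003, 2003, 2003,
         2004, 2004, 2004, 2004, 2004, 2004]
health_scores = [3, 2, 3, 5, 3,
                 2, 5, 1, 6, 5, 6,
                 3, 2, 3, 4, 5, 1, 1,
                 3, 1, 4, 6, 2,
                 4, 3, 3, 2, 2, 3]
df = DataFrame(year = years, health = health_scores)

# One row per (year, health) pair, holding the number of observations in it
counts = combine(groupby(df, [:year, :health]), nrow => :count)

# Within each year, divide each category's count by that year's total count
props = transform(groupby(counts, :year),
                  :count => (c -> c ./ sum(c)) => :proportion)
```

The grouped `transform` keeps every row of `counts` and adds the `proportion` column, so the year totals never need to be stored as a separate column.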