Happy to do so. First, let me thank everyone for their thoughtful responses to my query. I’m new to Julia and so a welcoming community is important to me. I come from 20+ years of working with Mathematica, some work with R, and am trying to diversify my language portfolio.
Let me tell you the context in which my request arose.
I do a lot of research on the Affordable Care Act. I have a dataframe in which we have columns such as (1) the year, (2) the geographic rating area, (3) the age of a person living in that geographic rating area, (4) the income of a person living in that geographic rating area and (5) the percentage of that person’s income they would need to contribute in order to purchase the second cheapest “Silver” plan sold in that geographic rating area.
I want to produce a graphic that basically compares the distribution of Column 5 among different values of Column 1. For what it’s worth, I particularly want to compare the distribution in 2014 and the distribution in 2017. I want to do so for various age-income subsets of the population. An example would be people who are 60 years old and whose income is 4.5 times the federal poverty level.
I’m now going to show you the graphic as Mathematica produces it. I do so not because I think that Julia has to replicate every feature of Mathematica but to suggest that some serious people think what I want to do is legitimate. (I’ll also show the code just because I think it’s interesting to see a lot of similarities between Julia Plots and Mathematica.)
    Histogram[
      Query[Select[#age == 60 && #fpl == 4.5 &] /* GroupBy[#year &],
        All, #"contribution_pct" &][df],
      Automatic, "Probability",
      ChartLegends -> Automatic,
      Frame -> True,
      FrameLabel -> {"gross premium\nas fraction of income",
        "fraction of rating areas"},
      PlotLabel -> "Fay: A 60 year old earning 450% FPL",
      PlotTheme -> "Detailed"]
The problem with using absolute counts on the y-axis is that there were fewer plans sold in 2017 than in 2014. Therefore, in my opinion, a graphic that uses absolute counts would obscure how far the distribution in question has shifted to the right between 2014 and 2017.
I also don’t think normalize=true will communicate much to my audience, because the y-axis values will mean little to them. Because the x-domain is small, the bars are narrow and the y-axis ends up showing density values like 15 and 20. Those values have no clear meaning in this context to economists and policy makers.
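To make the point about the y-axis concrete, here is a minimal sketch in plain Python (not Julia or Plots code). The sample values, the 0.01 bin width and the helper names are all made up for illustration. It computes bar heights for the same data two ways: as a density, where the bar areas sum to 1, and as a probability, where the bar heights sum to 1.

```python
import math

# Illustrative only: hypothetical helpers and made-up data.

def bin_counts(values, start, width, nbins):
    """Count the values falling in each of `nbins` equal-width bins."""
    counts = [0] * nbins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    return counts

def probability_heights(counts):
    """Bar heights that sum to 1 (fraction of observations per bin)."""
    total = sum(counts)
    return [c / total for c in counts]

def density_heights(counts, width):
    """Bar heights whose areas (height * width) sum to 1."""
    total = sum(counts)
    return [c / (total * width) for c in counts]

# Made-up contributions as a fraction of income, chosen mid-bin
# so that floating-point rounding cannot move a value across an edge.
sample = [0.105] * 2 + [0.115] * 3 + [0.125] * 4 + [0.135] * 8 + [0.145] * 3
width = 0.01

counts = bin_counts(sample, start=0.10, width=width, nbins=5)
prob = probability_heights(counts)
dens = density_heights(counts, width)

for k, (p, d) in enumerate(zip(prob, dens)):
    left_edge = 0.10 + k * width
    print(f"bin starting at {left_edge:.2f}: probability {p:.2f}, density {d:.1f}")
```

With bins this narrow, each density height is the probability divided by 0.01, so a bar holding 15% of the observations is drawn at a height of about 15. That is where y-axis values like 15 and 20 come from.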
What I do think makes sense is the graphic above. One can read it to see that about 42% of rating areas in 2014 required the 60-year-old in question to pay about 13% of their income to purchase the second-cheapest Silver plan, and that in about 10% of the rating areas in 2017 that same 60-year-old is required to pay about 23% of their income to do the same thing. Regardless of what one may think about the politics of it all, those numbers communicate the point I am trying to make.
To generalize, I believe having the bars sum to 1 for each distribution makes sense when one is comparing two discrete distributions, particularly ones that have different-sized domains and particularly ones in which the number of values from which each distribution is derived differs. Also, a traditional bar graph may not work because the x-axis should be numeric, not categorical. (Maybe there is some option to the bar graph plotting routines that I did not notice?)
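A minimal sketch of that kind of comparison, in plain Python with made-up numbers (the two samples are hypothetical stand-ins for two groups with different numbers of observations, and none of this is Plots code): each group is binned on the same numeric edges and divided by its own total, so each group's bars sum to 1.

```python
import math

# Illustrative only: hypothetical helper and made-up data.

def probability_histogram(values, start, width, nbins):
    """Return (counts, fractions) over equal-width bins; fractions sum to 1.

    Assumes at least one value falls inside the binned range.
    """
    counts = [0] * nbins
    for v in values:
        i = math.floor((v - start) / width)
        if 0 <= i < nbins:
            counts[i] += 1
    total = sum(counts)
    return counts, [c / total for c in counts]

# Two made-up groups of different sizes (50 and 20 observations).
groups = {
    "A": [0.125] * 10 + [0.135] * 25 + [0.145] * 15,
    "B": [0.135] * 2 + [0.145] * 8 + [0.155] * 6 + [0.165] * 4,
}

# Shared numeric bins, so the x-axis stays numeric and comparable.
start, width, nbins = 0.12, 0.01, 5
result = {
    name: probability_histogram(vals, start, width, nbins)
    for name, vals in groups.items()
}

for name, (counts, fractions) in result.items():
    print(name, counts, [round(f, 2) for f in fractions])
```

In the bin from 0.14 to 0.15, group A has the larger count simply because it has more observations, while group B has the larger fraction. The per-group fractions are the quantity I would want on the y-axis.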
As I said, I am new to Julia and there may well already be a way of doing what I want. If not, though, it seems (to me) a sensible thing to want. And, given my admiration for the Plots package and its cousins, that would seem a helpful place in which to put the functionality.
Thanks.