Hi there,
Consider the following data.csv file:
Class_Name Grade
Turma 6 4.2
Turma 4 3.5
Turma 6 0.2
Turma 3 Especial 1.6
Turma 2 Piloto 7.8
Turma 4 1.4
Turma 5 1.6
Turma 6 3.8000000000000003
Turma 6 1.5
Turma 6 5.800000000000001
Turma 6 7.8
Turma 2 Piloto 3.3
Turma 2 Piloto 0.8
Turma 3 Especial 0.0
Turma 6 8.9
Turma 4 3.0
Turma 2 Piloto 5.0
Turma 6 1.1
Turma 5 5.2
Turma 6 4.2
Turma 6 7.1
Turma 2 Piloto 5.7
Turma 2 Piloto 0.8
Turma 5 4.0
Turma 6 3.5999999999999996
Turma 6 0.1
Turma 6 3.8000000000000003
Turma 1 3.3
Turma 6 4.0
Turma 2 Piloto 1.6
Turma 4 8.5
Turma 3 Especial 0.9
Turma 6 2.5
Turma 1 3.5
Turma 4 4.1
Turma 4 0.8
Turma 6 2.2
Turma 2 Piloto 1.7000000000000002
Turma 5 2.4
Turma 6 3.6
Turma 6 3.0
Turma 5 0.8
Turma 1 2.2
Turma 2 Piloto 1.6
Turma 4 1.6
Turma 5 2.1
Turma 3 Especial 1.7000000000000002
Turma 6 8.2
Turma 5 2.6
Turma 6 3.4000000000000004
Turma 4 2.7
Turma 6 4.800000000000001
Turma 2 Piloto 3.0999999999999996
Turma 5 2.9
Turma 6 3.5
Turma 5 1.8
Turma 6 1.6
Turma 1 8.4
Turma 2 Piloto 4.4
Turma 1 1.9000000000000001
Turma 6 2.5
Turma 6 0.0
Turma 6 4.9
Turma 4 3.6
Turma 6 3.9
Turma 4 0.8
Turma 6 1.2000000000000002
Turma 6 3.0
Turma 6 6.3
Turma 2 Piloto 5.4
Turma 3 Especial 0.0
Turma 6 1.6
Turma 1 1.7
Turma 5 2.1
Turma 1 5.4
Turma 2 Piloto 1.6
I tried to call the function `groupedhist`, from the package StatsPlots, grouped by the column `:Class_Name` and normalized, so that I could compare the grades of the students in the distinct classes, which have different numbers of students. To that end, I thought the parameter `normalize` would be appropriate (as suggested in the help for the function `histogram`), since it would seem to ensure that the total area for each group (in the corresponding bins) sums to unity. After reading the csv file into a dataframe `df_aux`, I ran:
using StatsPlots, DataFramesMeta
# Grouped histogram of grades by class, with normalization requested
group_hist = @with df_aux groupedhist(:Grade, group=:Class_Name,
    title="Histograms", bins=11, xticks=0:1:10, normalize=true)
To my surprise, the resulting plot looks weird: visually, the area of the red bars (corresponding to “Turma 2 Piloto”) is manifestly smaller than the area of the lighter blue bars (corresponding to “Turma 6”)! Could it be that, for the `groupedhist` function, the parameter `normalize` is incorrectly implemented, if at all? The bar heights look like the raw counts in the bins, despite the numeric labels along the vertical axis, which suggest that some normalization has been applied. Any help is greatly appreciated!
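
For reference, here is a minimal sketch in plain Julia of the per-group normalization I was expecting, where each class's bars have a total area of one regardless of how many students it has. The helper `density_hist` is hypothetical (it is not part of StatsPlots), and the unit-width bin edges from 0 to 10 are an assumption made for illustration, not necessarily the bins that `bins=11` produces. The grades are those of two classes copied from the data above.

```julia
# Grades of two classes, copied from data.csv above.
grades = Dict(
    "Turma 1" => [3.3, 3.5, 2.2, 8.4, 1.9000000000000001, 1.7, 5.4],
    "Turma 3 Especial" => [1.6, 0.0, 0.9, 1.7000000000000002, 0.0],
)

# Hypothetical helper (not a StatsPlots function): density-normalized
# histogram of one group. Bin i covers [edges[i], edges[i+1]); values
# outside the edges are clamped into the first or last bin.
function density_hist(values, edges)
    counts = zeros(Int, length(edges) - 1)
    for v in values
        i = clamp(searchsortedlast(edges, v), 1, length(counts))
        counts[i] += 1
    end
    widths = diff(edges)
    # Divide by group size and bin width so the bar areas sum to one.
    return counts ./ (length(values) .* widths)
end

# Assumed bin edges: unit-width bins covering grades from 0 to 10.
edges = collect(0.0:1.0:10.0)

for (name, vals) in grades
    dens = density_hist(vals, edges)
    area = sum(dens .* diff(edges))
    println(name, ": total bar area = ", area)
end
```

With this kind of normalization, both classes end up with a total bar area of one (up to floating-point rounding), even though one has seven students and the other five. That is the behaviour I expected from `normalize` within each group.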