What is the best way to make a countplot?

Hi everybody!

I’m trying to make a countplot with StatsPlots. The code below works, but I want to know if there is a simpler or more clever way to achieve this.

using StatsPlots

begin
@df df Plots.bar(unique(df.Gender),
	[count(i->i==("Female"), df.Gender), count(i->i==("Male"), df.Gender)])
end


The problem with my code is that when a column has many different categories, like “Geography”, I need to specify each category value manually.

Here’s a sample of my dataframe:

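| Row | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|-----|-----------|------------|---------|-------------|-----------|--------|-----|--------|---------|---------------|-----------|----------------|-----------------|--------|
| | Int64 | Int64 | String31 | Int64 | String7 | String7 | Int64 | Int64 | Float64 | Int64 | Int64 | Int64 | Float64 | Int64 |
| 1 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 1,01E+10 | 1 |
| 2 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.9 | 1 | 0 | 1 | 1,13E+10 | 0 |
| 3 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 1,60E+10 | 3 | 1 | 0 | 1,14E+10 | 1 |
| 4 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.6 | 0 |
| 5 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 1,26E+10 | 1 | 1 | 1 | 79084.1 | 0 |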

Thx!

Oh yeah, this is something I always scratch my head over. I would be careful about doing it this way, because we have to pay special attention to the order the groups come in. On my end at least, unique(df.Gender) returns ["Male", "Female"], while the counts in your code are listed in the order [Female, Male], so the bar heights get swapped.

We can clean this up a bit by replacing the anonymous functions with the one-argument ==(x) directly, and by writing :Gender instead of df.Gender, because the @df macro understands column symbols:

@df df Plots.bar(
	unique(:Gender),
	# counts listed in the order unique(:Gender) returned on my machine: Male, Female
	[count(==("Male"), :Gender), count(==("Female"), :Gender)]
)
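Going one step further, the counts can be derived from the same label vector that goes on the x-axis, so their order can never drift apart and nothing has to be listed by hand, even for a column like Geography. A minimal sketch on a hypothetical stand-in vector (gender below is made up, not your data); ==(g) is the same one-argument comparison used above, returning a predicate that checks for equality with g:

```julia
# Hypothetical stand-in for df.Gender
gender = ["Male", "Female", "Female", "Male", "Female"]

# Labels in first-appearance order, exactly as unique returns them
labels = unique(gender)

# One count per label, computed in the same order as `labels`
counts = [count(==(g), gender) for g in labels]

println(labels)   # ["Male", "Female"]
println(counts)   # [2, 3]
```

With StatsPlots loaded, bar(labels, counts) should then put every bar under its own label, however many categories the column has.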

There is also groupedbar, which handles grouping the bars into categories (as I think you mentioned in your other post), but it requires a fair bit of massaging to get the data into the right form. At this point I would just switch over to AlgebraOfGraphics.jl.

If you haven’t checked it out already, it’s a really handy package for making statistical plots with loads of customizability. Here’s a quick grouped barplot example:

using AlgebraOfGraphics, CairoMakie, DataFrames
# replace CairoMakie with GLMakie if not plotting from a notebook

# "wide" formatted data
df_wide = let
	N = 100
	DataFrame(
		Geography = rand(("France", "Spain", "US"), N),
		Score = rand(1:100, N),
		Group = rand('A':'D', N)
	)
end

df_long = stack(df_wide, :Score)
p = data(df_long) *
	mapping(:Geography=>"xlabel", :value=>"ylabel";
		color = :Group => "Legend title",
		dodge = :Group,
	) *
	visual(BarPlot)

draw(p)
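For comparison, the data massaging that groupedbar needs mostly comes down to turning two categorical columns into a matrix of counts, one row per x category and one column per group. A base-Julia sketch on hypothetical stand-in columns (geography and group below are made up for illustration):

```julia
# Hypothetical stand-ins for df.Geography and a grouping column such as df.Group
geography = ["France", "Spain", "France", "US", "Spain", "France"]
group     = ['A', 'B', 'B', 'A', 'A', 'A']

xs = sort(unique(geography))   # row labels (x-axis categories)
gs = sort(unique(group))       # column labels (one bar per group)

# counts[i, j] = number of rows where geography == xs[i] and group == gs[j];
# the first iterator (x) runs down the rows, the second (g) across the columns
counts = [count(k -> geography[k] == x && group[k] == g, eachindex(geography))
          for x in xs, g in gs]
```

With StatsPlots loaded, something like groupedbar(xs, counts) should then draw one bar per group within each country; the keywords for naming the legend entries are worth checking in the StatsPlots docs.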

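using Plots
using StatsBase

# countmap(v) returns a Dict mapping each distinct value to its number of occurrences;
# bar plots the Dict's keys against its values
bar(countmap(df.Gender))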
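One thing to watch: countmap returns a Dict, and Dict iteration order is not something to rely on, so the bars may not come out in the order you want. A sketch that sorts the categories first; tally below is a hypothetical base-Julia stand-in for StatsBase.countmap, used only so the snippet runs without extra packages:

```julia
# Hypothetical stand-in for StatsBase.countmap: value => number of occurrences
function tally(v)
    d = Dict{eltype(v),Int}()
    for x in v
        d[x] = get(d, x, 0) + 1
    end
    return d
end

gender = ["Male", "Female", "Female", "Male", "Female"]
cm = tally(gender)

# Sort the categories so the bar order is deterministic (alphabetical here)
ks = sort(collect(keys(cm)))
vs = [cm[k] for k in ks]
```

With StatsBase, cm = countmap(df.Gender) gives the same kind of Dict, and bar(ks, vs) would then draw the bars in sorted order.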

Nice, I didn’t know bar could accept dicts. This is definitely the way to go for simple count plots, haha


Nice, @yha and @icweaver!

Thx for that!