How to plot histogram subplots of a DataFrame?

Hello everybody!

I have a question. I'm refactoring my old Python projects to Julia and I'm unsure how to handle one situation.

This is how I plot using Python (pandas with matplotlib):

df.hist(figsize=(12,10));

And this is what I have now using Julia StatsPlots:

begin 	

	p1 = Plots.histogram(df.Age; 
		plot_title = "Age",plot_titlefontsize = 11)
	p2 = Plots.histogram(df.Balance; 
		plot_title = "Balance",plot_titlefontsize = 11)
	p3 = Plots.histogram(df.CreditScore; 
		plot_title = "CreditScore",plot_titlefontsize = 11)
	p4 = Plots.histogram(df.CustomerId; 
		plot_title = "CustomerId",plot_titlefontsize = 11)
	p5 = Plots.histogram(df.EstimatedSalary; 
		plot_title = "EstimatedSalary",plot_titlefontsize = 11)
	p6 = Plots.histogram(df.Exited; 
		plot_title = "Exited",plot_titlefontsize = 11)
	p7 = Plots.histogram(df.HasCrCard; 
		plot_title = "HasCrCard",plot_titlefontsize = 11)
	p8 = Plots.histogram(df.IsActiveMember; 
		plot_title = "IsActiveMember",plot_titlefontsize = 11)
	p9 = Plots.histogram(df.NumOfProducts; 
		plot_title = "NumOfProducts",plot_titlefontsize = 11)
	p10 = Plots.histogram(df.RowNumber; 
		plot_title = "RowNumber",plot_titlefontsize = 11)
	p11 = Plots.histogram(df.Tenure; 
		plot_title = "Tenure",plot_titlefontsize = 11)
	


	@df df Plots.plot(p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11;
		layout = 11,
		size=(1000,600)
	)
end

How can I improve the Julia code to get the same result without writing so much code?

Thanks for the help!

And sorry, I'm a Julia newbie.

Hey! Would using the legend as the column label work instead? That way we could just do something like this:

using DataFrames, StatsPlots

df = DataFrame(randn(100, 3), [:Tenure, :Balance, :CreditScore])

N = ncol(df)

@df df histogram(cols(1:N); layout=N)

Thanks, @icweaver!

I tried that, but it returned an error.

begin
	N = ncol(df)

	@df df Plots.histogram(cols(1:N); layout=N)
end
Cannot convert Matrix{Any} to series data for plotting

    error(::String)@error.jl:33
    _prepare_series_data(::Matrix{Any})@series.jl:8
    _series_data_vector(::Matrix{Any}, ::Dict{Symbol, Any})@series.jl:27
    macro expansion@series.jl:127[inlined]
    apply_recipe(::AbstractDict{Symbol, Any}, ::Type{RecipesPipeline.SliceIt}, ::Any, ::Any, ::Any)@RecipesBase.jl:289
    _process_userrecipes!(::Any, ::Any, ::Any)@user_recipe.jl:36
    recipe_pipeline!(::Any, ::Any, ::Any)@RecipesPipeline.jl:70
    _plot!(::Plots.Plot, ::Any, ::Any)@plot.jl:208
    #plot#135@plot.jl:91[inlined]
    var"#histogram#419"(::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::typeof(Plots.histogram), ::Any)@RecipesBase.jl:410
    var"#add_label#19"(::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:layout,), Tuple{Int64}}}, ::typeof(StatsPlots.add_label), ::Vector{Matrix{Symbol}}, ::Function, ::Matrix{Any})@df.jl:155
    (::var"#405#406"{Module, Colon, Int64})(::DataFrames.DataFrame)@none:0
    top-level scope@Local: 4[inlined]

What's my mistake?

That is very odd. Could you share a MWE, including the df you are using?

Edit: whoops, I linked to the wrong example above. It should be updated now.

Here’s a df sample:

first(df,5)
Row  RowNumber  CustomerId  Surname   CreditScore  Geography  Gender   Age    Tenure  Balance   NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
     Int64      Int64       String31  Int64        String7    String7  Int64  Int64   Float64   Int64          Int64      Int64           Float64          Int64
1    1          15634602    Hargrave  619          France     Female   42     2       0.0       1              1          1               1,01E+10         1
2    2          15647311    Hill      608          Spain      Female   41     1       83807.9   1              0          1               1,13E+10         0
3    3          15619304    Onio      502          France     Female   42     8       1,60E+10  3              1          0               1,14E+10         1
4    4          15701354    Boni      699          France     Female   39     1       0.0       2              0          0               93826.6          0
5    5          15737888    Mitchell  850          Spain      Female   43     2       1,26E+10  1              1          1               79084.1          0

Thanks, but I just meant a copy-and-pasteable code snippet that I can run to try and reproduce your error please =]

The package versions that you are using would also be really helpful! To work from a clean environment you can do the following:

julia> ]
pkg> activate --temp
(jl_jTP8xX) pkg> add <list of packages used here>
julia> <MWE here>

Then you can view the list of packages and the versions that were used by typing st (short for status) into the package REPL (shown by the prompt that starts with pkg>):

julia> ]
(jl_jTP8xX) pkg> st
<list of packages used should display here>

Code used

begin
	N = ncol(df)

	@df df Plots.histogram(cols(1:N); layout=N)
end

Error:

Cannot convert Matrix{Any} to series data for plotting

    error(::String)@error.jl:33
    _prepare_series_data(::Matrix{Any})@series.jl:8
    _series_data_vector(::Matrix{Any}, ::Dict{Symbol, Any})@series.jl:27
    macro expansion@series.jl:127[inlined]
    apply_recipe(::AbstractDict{Symbol, Any}, ::Type{RecipesPipeline.SliceIt}, ::Any, ::Any, ::Any)@RecipesBase.jl:289
    _process_userrecipes!(::Any, ::Any, ::Any)@user_recipe.jl:36
    recipe_pipeline!(::Any, ::Any, ::Any)@RecipesPipeline.jl:70
    _plot!(::Plots.Plot, ::Any, ::Any)@plot.jl:208
    #plot#135@plot.jl:91[inlined]
    var"#histogram#419"(::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::typeof(Plots.histogram), ::Any)@RecipesBase.jl:410
    var"#add_label#19"(::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:layout,), Tuple{Int64}}}, ::typeof(StatsPlots.add_label), ::Vector{Matrix{Symbol}}, ::Function, ::Matrix{Any})@df.jl:155
    (::var"#405#406"{Module, Colon, Int64})(::DataFrames.DataFrame)@none:0
    top-level scope@Local: 4[inlined]

Packages

(jl_tJabzV) pkg> st
      Status `C:\Users\Pedro Henrique\AppData\Local\Temp\jl_tJabzV\Project.toml`
  [336ed68f] CSV v0.10.3
  [a93c6f00] DataFrames v1.3.2
  [91a5bcdd] Plots v1.27.2
  [2913bbd2] StatsBase v0.33.16
  [f3b207a7] StatsPlots v0.14.33

I think I found the problem: it is caused by the non-numerical columns.

begin
	N = ncol(df)

	@df df Plots.histogram(cols(7:N); layout=N)
end

Above, I used only the columns with numerical values as a test, and it works!

Sorry about all the trouble, I don't have a computer science background.

No worries at all, great find!
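
In case it is useful, here is a minimal sketch of selecting the numeric columns by type rather than by a hard-coded range such as 7:N. The small df below is made-up stand-in data, not your dataset, and the variable name numeric is only for illustration:

```julia
using DataFrames, StatsPlots

# Hypothetical stand-in data: two numeric columns and one string column
df = DataFrame(Age = [42, 41, 39],
               Balance = [0.0, 83807.9, 0.0],
               Surname = ["Hargrave", "Hill", "Boni"])

# names(df, Real) returns the names of the columns whose element type is a subtype of Real
numeric = Symbol.(names(df, Real))
N = length(numeric)

# One histogram per numeric column, titled with the column name
@df df histogram(cols(numeric); layout=N, legend=false,
    title=permutedims(String.(numeric)))
```

Note that a column containing missing values has an element type like Union{Missing, Int64}, so it would not be picked up by this selection.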

For the original question, you could also feed the titles to the @df macro to obtain:

using DataFrames, StatsPlots

colnames = [:Age, :Balance, :CreditScore, :CustomerId, :EstimatedSalary, :Exited,
            :HasCrCard, :IsActiveMember, :NumOfProducts, :RowNumber, :Tenure]
N = length(colnames)
df = DataFrame(randn(100, N), colnames)

@df df histogram(cols(1:N); layout=N, legend=false, title=permutedims(colnames),
    size=(1000,600), frame=:box, titlefontsize=11, c=:blues)

It is recommended to start a new topic when a question is unrelated.

As @rafael.guerra mentioned, we would be more than happy to follow up with this in a new thread! It just helps others find the relevant bits of the discussion in the future =]

Nice, @rafael.guerra and @icweaver!

With your help I learned a lot and wrote a very simple function that solves my problem.

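# Plot every numeric column of df in its own subplot, using the given series type.
function plot_df(df::AbstractDataFrame, graph_type::Symbol)
	# names(df, Real) keeps only the columns whose element type is a subtype of Real
	numerical_cols = Symbol.(names(df, Real))

	@df df Plots.plot(cols(numerical_cols);
		layout = length(numerical_cols),
		seriestype = graph_type,
		title = permutedims(numerical_cols),
		size = (1200, 600),
		frame = :box)
end
plot_df(df, :histogram)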

I really appreciate your help, and I will make a new topic for the countplot!

Thx!
