When working in Python with pandas, I find pd.DataFrame.describe a helpful tool for getting quick insight into my DataFrame, particularly when working with large DataFrames in the REPL. It works as follows, from the linked docs:
>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64
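To make it concrete what those eight rows contain, here is a minimal sketch that recomputes them with only the Python standard library. It is not how pandas implements describe; the function name `describe` and its dict return value are my own illustration, and it assumes a sample standard deviation (n - 1) and linearly interpolated quartiles, which is what the output above corresponds to.

```python
import statistics


def describe(values):
    """Return the eight summary statistics shown in the pandas output above.

    Hypothetical helper for illustration only; needs at least two values.
    """
    data = [float(v) for v in values]
    # method="inclusive" interpolates linearly between the sorted data points.
    q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": float(len(data)),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": min(data),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(data),
    }


summary = describe([1, 2, 3])
for name, value in summary.items():
    print(f"{name:<8} {value}")
```

Running it on the same three values should give the same numbers as the REPL session above.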
When I used Python I mainly used pandas, but for my Julia work I tend to use Vectors and Matrices. It's good to know that DataFrames.jl has that functionality, and it's clearly a valid answer. Interestingly, this method does accept vectors, but not matrices.
It seems that StatsBase.jl implements describe and tries to compute quantiles over the entire matrix, for which it doesn't have a method. One solution is:
But it would probably be best for describe to have a dims argument so it can work row-wise or column-wise. Unless you know of this functionality elsewhere, I might open a ticket with StatsBase.
This does give the correct information, but the output is quite ugly, as it's ten separate Vector descriptions. It'd be better if the description could keep the matrix's columns together, as pandas.DataFrame.describe does, while keeping the simple describe(ones(10,10)) syntax.
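To illustrate the kind of output I mean, here is a rough sketch in plain Python (standard library only) of a describe that takes a dims argument and keeps the slices side by side in one table. Everything here is hypothetical: `describe_matrix`, `format_table`, and the `dims` keyword are names I made up, not StatsBase or pandas API. I'm assuming dims=1 means "one summary per column" and dims=2 means "one summary per row", mirroring how Julia reductions such as mean(A, dims=1) behave.

```python
import statistics

STAT_NAMES = ("count", "mean", "std", "min", "25%", "50%", "75%", "max")


def summarize(data):
    """Summary statistics for one slice (needs at least two values)."""
    q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return (float(len(data)), statistics.fmean(data), statistics.stdev(data),
            min(data), q25, q50, q75, max(data))


def describe_matrix(matrix, dims=1):
    """Describe a matrix given as a list of rows.

    dims=1 summarizes each column, dims=2 summarizes each row
    (an assumed convention, not an existing library API).
    """
    if dims == 1:
        slices = [list(col) for col in zip(*matrix)]
    elif dims == 2:
        slices = [list(row) for row in matrix]
    else:
        raise ValueError("dims must be 1 or 2")
    summaries = [summarize(s) for s in slices]
    # Regroup so each statistic maps to one value per slice.
    return {name: [s[i] for s in summaries] for i, name in enumerate(STAT_NAMES)}


def format_table(table):
    """Render one row per statistic, one column per slice."""
    lines = []
    for name, values in table.items():
        cells = "".join(f"{v:>8.2f}" for v in values)
        lines.append(f"{name:<6}{cells}")
    return "\n".join(lines)


matrix = [[1, 10], [2, 20], [3, 30]]
table = describe_matrix(matrix, dims=1)
print(format_table(table))
```

The point of the design is that the result is a single table indexed by statistic, so a 10x10 matrix would print as eight rows and ten columns rather than ten separate summaries.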
True, but I intend to do this a lot, so I'd rather not have to keep creating DataFrames, and I'd prefer to save the keystrokes. As you suggest, implementing it would be rather trivial. If I get time, I think I'll make a PR for this to StatsBase. Thanks for your help.