Pingouin.jl: a simple yet exhaustive statistical package

Hi everyone,

I have always wanted to learn Julia, but I never found a satisfying library for conducting statistical tests. I used Pingouin, a Python stats library made by Raphael Vallat, and always wished for an equivalent package in Julia. So here is my version, written entirely in Julia (a pre-release with only a limited set of features).

As of now, Pingouin.jl 0.1.0 (https://github.com/clementpoiret/Pingouin.jl) supports distribution-related functions such as:

  • Anderson-Darling test of distribution,
  • Geometric standard (Z) score,
  • Levene & Bartlett tests for homoscedasticity,
  • Shapiro-Wilk, Shapiro-Francia and Jarque-Bera tests of normality,
  • Mauchly and JNS tests for sphericity,
  • Epsilon adjustment factor for repeated measures (Greenhouse-Geisser, Huynh-Feldt, lower bound).

It also supports effect-size-related functions:

Effect sizes between two Arrays:

  • Unbiased Cohen d,
  • Hedges g,
  • Glass delta,
  • Pearson correlation coefficient,
  • Eta-square,
  • Odds ratio,
  • Area Under the Curve,
  • Common Language Effect Size.
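To give an idea of what the first two effect sizes in this list compute, here is a minimal sketch using only Julia's `Statistics` standard library. The names `cohen_d` and `hedges_g` are hypothetical and are not taken from the Pingouin.jl API. The sketch assumes the pooled-standard-deviation form of Cohen's d and the usual approximate small-sample correction for Hedges' g.

```julia
using Statistics

# Cohen's d with a pooled standard deviation (hypothetical helper, not Pingouin.jl's API).
function cohen_d(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    nx, ny = length(x), length(y)
    # var() uses the n - 1 denominator by default.
    pooled_var = ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)
end

# Hedges' g: Cohen's d multiplied by an approximate small-sample bias correction.
function hedges_g(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    nx, ny = length(x), length(y)
    correction = 1 - 3 / (4 * (nx + ny) - 9)
    return cohen_d(x, y) * correction
end

# Example usage on two small samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
println("d = ", cohen_d(x, y), ", g = ", hedges_g(x, y))
```

The correction factor shrinks d slightly toward zero, which matters mostly for small samples.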

The conversion of Pearson’s r and Cohen’s d to:

  • Unbiased Cohen d,
  • Hedges g,
  • Eta-square,
  • Odds ratio,
  • Area Under the Curve.
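As an illustration of what such conversions look like, here is a small sketch of two textbook formulas: Pearson's r to Cohen's d, and Cohen's d to eta-square. Both assume two groups of equal size, and the function names are hypothetical rather than Pingouin.jl's actual API.

```julia
# Pearson's r to Cohen's d, assuming two groups of equal size (hypothetical helper).
r_to_d(r::Real) = 2r / sqrt(1 - r^2)

# Cohen's d to eta-square, under the same equal-group-size assumption (hypothetical helper).
function d_to_eta_square(d::Real)
    half = d / 2
    return half^2 / (1 + half^2)
end

# Example usage: convert a correlation to d, then to eta-square.
d = r_to_d(0.6)
println("d = ", d, ", eta-square = ", d_to_eta_square(d))
```

Under these assumptions, chaining the two conversions gives back r squared, which is a handy sanity check.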

It also supports the computation of effect sizes from t-values, parametric confidence intervals around a Cohen d or a correlation coefficient, and bootstrapped confidence intervals of univariate and bivariate functions.
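For readers unfamiliar with these two ideas, here is a rough sketch using only the standard library. It assumes an independent-samples t-value for the conversion and a plain percentile bootstrap for the interval; the package may use a different bootstrap method. The names `t_to_d` and `bootstrap_ci` and their keyword arguments are hypothetical, not Pingouin.jl's API.

```julia
using Random, Statistics

# Cohen's d from an independent-samples t-value (hypothetical helper).
t_to_d(t::Real, nx::Integer, ny::Integer) = t * sqrt(1 / nx + 1 / ny)

# Percentile bootstrap confidence interval for a univariate statistic `f`
# (hypothetical helper; other bootstrap variants exist).
function bootstrap_ci(f, x::AbstractVector; n_boot::Integer=2000,
                      confidence::Real=0.95, rng=Random.default_rng())
    n = length(x)
    # Resample with replacement and evaluate the statistic on each resample.
    stats = [f(x[rand(rng, 1:n, n)]) for _ in 1:n_boot]
    alpha = 1 - confidence
    return quantile(stats, [alpha / 2, 1 - alpha / 2])
end

# Example usage: confidence interval around the mean of a simulated sample.
rng = MersenneTwister(42)
sample = randn(rng, 200) .+ 5
lower, upper = bootstrap_ci(mean, sample; rng=rng)
println("95% CI for the mean: [", lower, ", ", upper, "]")
```

Passing the random number generator explicitly keeps the resampling reproducible.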

The main goal is to provide a really simple API for both simple and advanced statistics. Version 0.1.0 will soon be published to the default Julia package registry.

It is my first real project in Julia, so I really hope you’ll like it. I’m a newbie, so feel free to share any suggestions, contributions, or remarks; I want to improve my Julia skills :slight_smile:

The next release will include paired and unpaired non-parametric tests such as Mann-Whitney U, Wilcoxon Signed Rank, or Friedman.


Have you seen HypothesisTests.jl? I haven’t looked at your package in detail so I might be speaking out of turn here, but it seems to me that your work is closely related and might be worth contributing to the established package to prevent fragmentation?


@nilshg raises a great point.
You’ve already filled a few gaps in Goodness-of-fit tests for the Julia ecosystem.

It would be convenient if these tests were all in one place.

Suppose a Julia user wants to test a hypothesis H_0. There are often many different tests, each with different properties (some have better power, others better size, etc.). Having more tests together allows for easier discovery, maintainability, and comparisons across tests.

For example, if I want to test whether my sample is normally distributed in Mathematica, it automatically returns all relevant goodness-of-fit tests (along with statistics and p-values):
[image: screenshot of the Mathematica output]


I agree that it would be better to file a PR to the HypothesisTests.jl package if possible. I believe the work will have a bigger impact there.


Thank you for your comments! I agree with all of you, especially regarding the Shapiro-Wilk test, which is fairly common but not yet implemented in Julia (except here, haha). I’ll open some PRs, but the end goal of Pingouin (as you can see in the readme or in the original Python package) is not only hypothesis tests. E.g., it’ll include some plotting methods like QQ-plots, or even estimation statistics (which are not really hypothesis testing): https://www.estimationstats.com#/background and I don’t think that’s the goal of HypothesisTests.jl; maybe it’s out of scope?

For now I started with hypothesis tests because they are what I use the most, but the package could be a wrapper around HypothesisTests.jl (I already use it for Jarque-Bera, for example) :slight_smile:

I’ll work on the PR when I have some time.


For plotting, it might be worth adding to Plots.jl or StatsPlots.jl.

The overall package idea seems good to me. I agree that it would be better to add the hypothesis tests to the existing HypothesisTests.jl and use it internally in your package. Then you can add the additional features that you have planned and still contribute to the common ecosystem infrastructure.


QQ-Plots are implemented in both StatsPlots and AbstractPlotting, but if you have other statistical visualizations that are not covered by those packages, a PR to integrate them would definitely be welcome!


Thanks to all of you for your kind advice. I’ll be happy to submit PRs and then use them in my package :slight_smile:

Thank you, @Clement_POIRET, it is a nice package. I see that all functions are nicely documented, even with examples. I suggest you use Documenter.jl or similar to document the package; it is very simple, can be done in minutes, and would be useful for users.


I’ll take a look at Documenter.jl, thanks for the tip @dmolina :slight_smile: