[ANN] CircStats.jl, Julia version of MATLAB CircStat toolbox

Hi,

Circular statistics are important in many areas, and in neuroscience the MATLAB CircStat toolbox is probably the best-known and most widely used implementation. Since I needed it for my own research, I’ve been gradually rewriting and translating it to Julia. With the kind permission of the original author, I’ve translated the whole toolbox into CircStats.jl, and it’s now hosted under the circstat organization, which hosts the same toolbox for MATLAB, Python and R.

It will be registered in a few days, and I’ll set up CI and docs later. I hope others find it useful too.

Cheers,
Alex

12 Likes

There is this repository with the same name but not registered. Would you have any comments on the main differences?

Also note a package of mine: DirectionalStatistics.jl, registered about a year ago, I believe. It includes circular statistic functions, with a neat feature that all of them can operate on arbitrary ranges: Circular.mean(array, 0..π) for a half-circle periodicity, Circular.mean(array, -180..+180) for degrees, etc.
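
To illustrate what "arbitrary ranges" means here, below is a minimal sketch of a circular mean over a custom interval. It is not the DirectionalStatistics.jl implementation: the function name `circmean_on` and its `lo`/`hi` arguments are hypothetical, chosen only for this example. The idea is to rescale the data to angles on the full circle, average the unit vectors, and map the result back.

```julia
# Hypothetical helper, NOT the DirectionalStatistics.jl API.
# Circular mean of `xs` for data that is periodic on the interval [lo, hi).
function circmean_on(xs, lo, hi)
    period = hi - lo
    # Rescale each value to an angle on the full circle.
    angles = [2π * (x - lo) / period for x in xs]
    # Average the unit vectors and take the direction of the resultant.
    s = sum(sin, angles) / length(angles)
    c = sum(cos, angles) / length(angles)
    theta = mod(atan(s, c), 2π)
    # Map the mean angle back onto the original interval.
    return lo + theta * period / (2π)
end

circmean_on([10.0, 30.0], 0.0, 360.0)        # approximately 20 (degrees)
circmean_on([-170.0, 170.0], -180.0, 180.0)  # close to ±180, whereas the plain mean is 0
circmean_on([0.1, π - 0.1], 0.0, π)          # close to 0 or π (the same point on a half-circle)
```

The last two calls show why the interval matters: values near both ends of the range are neighbours on the circle, so their mean sits at the wrap-around point rather than in the middle of the range.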

2 Likes

I wasn’t aware of the package you mentioned. From a quick peek at the repository, it seems to have less functionality than the MATLAB CircStat toolbox. The main reason I translated the MATLAB CircStat toolbox is that it’s been used in a number of publications and seems to be the first choice in, as far as I know, neuroscience research, so it would be nice to use the same toolbox for consistency.

1 Like

I opened several issues for Distributions and HypothesisTests, and from my own perspective, I would love to see directional statistics incorporated into these fundamental packages. But maybe a separate package like DirectionalStatistics.jl would be a better place to start.

As I said in my previous reply, the point of the current CircStats.jl is its rich feature set and maturity, so I kept the changes as minimal as possible, so that it feels like just a translated version of the MATLAB toolbox.

I hope the story ends at Distributions and HypothesisTests, or a separate DirectionalStatistics.

3 Likes

Indeed, circular distributions would be best in Distributions (I think some already are?), and circular tests in HypothesisTests. It’s less obvious where descriptive statistics like mean, std, median should end up. Not sure if StatsBase considers circular statistics potentially in scope. My DirectionalStatistics.jl package specifically focuses on descriptive statistics.

Of course, there is nothing wrong with having a package like yours that copies the interface/implementation from another language. As you say, it can definitely be useful for those coming from a MATLAB background.

1 Like

I totally agree with your point. It would be hard to extend mean and std in StatsBase without changing their function signatures or semantics, so the options are either to use other names, such as circ_mean/circmean/cmean etc., or to put all descriptive statistics in another package or submodule.

It is possible to extend regular functions like mean to work on circular data based on types. In a sense, it’s more natural: circular structure is first and foremost a property of data, so a reasonable approach is to encode this in a type.

Like, mean(circular(X)) instead of Circular.mean(X). A practical advantage is that one creates a circular dataset X = circular(...), and then applies regular statistical functions mean(X), std(X) without explicitly specifying the circularity each time.
Another possibility is to make values themselves aware of the circular structure. As in, a = circular(1), b = circular(2), ...: it becomes impossible to mess up and directly apply non-circular statistics to them. All of distance(a, b), mean([a, b]) and so on would work automatically.
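
A rough sketch of the first approach, just to make the idea concrete. The `CircularData` type and the `circular` constructor are hypothetical names, not taken from any existing package, and the data is assumed to be in radians. Only `Statistics.mean` is a real function being extended here.

```julia
using Statistics

# Hypothetical wrapper that marks a vector of angles (radians) as circular.
struct CircularData{V<:AbstractVector{<:Real}}
    values::V
end

# Hypothetical convenience constructor.
circular(xs::AbstractVector{<:Real}) = CircularData(xs)

# Dispatch on the wrapper type: the mean is the direction of the summed unit vectors.
Statistics.mean(x::CircularData) = atan(sum(sin, x.values), sum(cos, x.values))

raw = [0.1, 2π - 0.1]   # two angles on either side of zero
X = circular(raw)

mean(raw)   # plain arithmetic mean, approximately π
mean(X)     # circular mean, approximately 0
```

The circularity is declared once when `X` is created, and the call site stays the ordinary `mean(X)`. Other statistics could be given methods for the wrapper in the same way.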

The first mean(circular(X)) approach is similar to what was discussed in StatsBase regarding the future of weighted statistics: mean(weighted(X, ws)) vs current mean(X, weights(ws)). Not sure what happened to that discussion though.

2 Likes

I like the idea of a circular type that is just a wrapper around a Number or AbstractArray, so that circular(X; deg=false) could handle radians and degrees easily, but it may require all circular-related Distributions and HypothesisTests functionality to be built on this type. Nevertheless, this only solves the syntax consistency issue: the underlying functionality would still be based on Number and AbstractArray, and it’s debatable whether to export the underlying functions that work directly on raw circular data.
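
Something like the following is what I have in mind, as a sketch only. The names `Circular`, `circular` and `circ_mean` are placeholders for illustration (`circ_mean` follows the MATLAB-style naming but is defined locally here, not imported from any package), and the design assumes values are stored in radians internally.

```julia
using Statistics

# Underlying function working directly on raw radians.
circ_mean(alpha::AbstractArray{<:Real}) = atan(sum(sin, alpha), sum(cos, alpha))

# Hypothetical wrapper: always stores radians, remembers the unit the user supplied.
struct Circular{T}
    rad::T      # a Real or an AbstractArray of Reals, in radians
    deg::Bool   # true if the input was given in degrees
end

circular(x::Real; deg=false) = Circular(deg ? deg2rad(x) : float(x), deg)
circular(x::AbstractArray{<:Real}; deg=false) =
    Circular(deg ? deg2rad.(x) : float.(x), deg)

# The typed method is only a thin layer over the raw function:
# it handles units and delegates the actual computation.
function Statistics.mean(x::Circular{<:AbstractArray})
    m = circ_mean(x.rad)
    return x.deg ? rad2deg(m) : m
end

mean(circular([350.0, 10.0, 30.0]; deg=true))   # approximately 10 (degrees)
circ_mean(deg2rad.([350.0, 10.0, 30.0]))        # the same direction, in radians
```

Whether `circ_mean` itself should be exported alongside the typed `mean` method is exactly the open question.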