StatsBase vs Statistics

I am wondering whether the naming of StatsBase and Statistics is entirely logical. StatsBase, despite its name, is the “more optional” package: Statistics ships with Julia, whereas StatsBase has to be installed with Pkg.add. Maybe they are misnamed?

StatsBase precedes the reorganization of Base code to Statistics. It is still the basic statistics package for things that go beyond the standard library.


StatsBase is most likely going to be moved in part to Statistics, and in part to other packages (like StatsModels). See e.g. the pull request “move statistics out of Base” by fredrikekre (JuliaLang/julia #27152).


If I understand correctly, StatsBase will eventually be dissolved, so its misleading name will disappear. This is a good choice IMHO.


Almost 3 years later, I wonder if there are any updates on the future of StatsBase and Statistics. It was very confusing to me when I first started using Julia last year: for example, if I only imported StatsBase, I needed to write StatsBase.std([1,2,3]) to compute the standard deviation, and the reason is of course to avoid a conflict with Statistics. I imagine it will also be a point of confusion for many other newcomers.


Not if they come from Python :rofl:

For people coming from Python, there is probably never a need to load Statistics at all, as it appears that all of its functionality can be found in StatsBase or Distributions.

Statistics is too small for their taste. :rofl:

This is not strictly correct. StatsBase.std is Statistics.std. The reason is that StatsBase imports Statistics (but does not re-export it), so std is in the namespace of StatsBase.

julia> using StatsBase

julia> @which StatsBase.std
Statistics

So there is no need to distinguish between StatsBase.std and Statistics.std. They are the same function.
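A small sketch of how one might confirm this programmatically, assuming StatsBase is installed in the active environment. It relies only on Base's `===` and `parentmodule`; the variable names are just for illustration.

```julia
using Statistics
import StatsBase   # assumption: StatsBase has been installed with Pkg.add

# Check whether both qualified names are bound to the same generic function object.
same_function = StatsBase.std === Statistics.std

# Ask which module the generic function was originally defined in.
owner = parentmodule(StatsBase.std)

println("same function: ", same_function)
println("defined in:    ", owner)
```

If the two names really refer to one function, any method added to one is automatically visible through the other.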

There is no deep reason why StatsBase still exists in its current form and we haven’t put more things into Statistics. It’s simply a very tedious process that requires a lot of time and effort.


I think one of the reasons is the implicit cost of committing to maintenance and to a specific API, which comes with a standard library's version being tied to Julia's. See this summary here.

While arguably some functionality could be moved from StatsBase to Statistics, it is unlikely that the API of the whole package is finalized enough for that, or that all of it belongs in a standard library. On the other hand, there is no pressing need to move anything; after all, the package exists and people can just use it.

Regarding “So there is no need to distinguish between StatsBase.std and Statistics.std. They are the same function.”: this doesn’t seem to be the case with Julia v1.5.3 (see the output). I am new to Julia and got stuck here, wondering why median isn’t available from StatsBase.

julia> using StatsBase

julia> mean
mean (generic function with 8 methods)

julia> median
ERROR: UndefVarError: median not defined

julia> using Statistics

julia> median
median (generic function with 4 methods)
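One possible way to sidestep the difference, sketched here using only the Statistics standard library: import the needed names from Statistics explicitly, which should not depend on which StatsBase version happens to be installed. The sample vector is made up for illustration.

```julia
# Bring only the required names into scope from the standard library.
using Statistics: mean, median, std

x = [1, 2, 3]   # illustrative data

m = median(x)   # middle value of the sample
s = std(x)      # sample standard deviation (n - 1 in the denominator)

println("median = ", m, ", std = ", s)
```

Because StatsBase builds on the same Statistics functions, loading StatsBase alongside this should not introduce a name conflict.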

You’re probably using an old version of StatsBase. Run ]st StatsBase in the REPL to check whether you are using 0.33.16.
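The same check can also be done from ordinary Julia code instead of the Pkg REPL mode. A minimal sketch, assuming the active environment is the one where StatsBase was added:

```julia
using Pkg

# Print the StatsBase entry (including its version) for the active environment.
Pkg.status("StatsBase")

# To move to a newer release, one could then run the following
# (left commented out here because it modifies the environment):
# Pkg.update("StatsBase")
```

Whether an update actually reaches a newer release depends on the compatibility constraints of the other packages in the environment.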

Yes! I am running v0.32.2.