ANN: StatsKit, new meta-package for statistics

We hadn’t made an announcement until now, but a new meta-package called StatsKit was registered some time ago. It is just a convenience package that automatically installs and loads the most common packages needed to work with statistics.

using StatsKit currently loads the Statistics standard library module, and the following packages:

This package is intended for users of statistics packages. Packages themselves should continue to list individual packages in their dependencies rather than StatsKit as a whole.
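To illustrate that last point, here is a minimal sketch. `MyStatsTools` and `standardize` are hypothetical names made up for this example, and the module only assumes the Statistics standard library. The idea is that a package names exactly what it needs, while `using StatsKit` stays in end-user code.

```julia
# Hypothetical package module: it names the one dependency it needs
# instead of pulling in all of StatsKit.
module MyStatsTools

using Statistics: mean, std

export standardize

# Center `x` on its mean and scale by its sample standard deviation.
standardize(x::AbstractVector{<:Real}) = (x .- mean(x)) ./ std(x)

end # module

# End-user code (REPL, script, notebook) is where `using StatsKit` belongs;
# here the module above is loaded directly so the sketch is self-contained.
using .MyStatsTools

z = standardize([1.0, 2.0, 3.0])
println(z)
```

A real package would also declare Statistics in the `[deps]` section of its Project.toml, which keeps its load time and compatibility bounds independent of whatever StatsKit bundles.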


Nice! Thank you.

Just one question that came to mind after reading the package list: is there any specific reason why CSV.jl is included instead of CSVFiles.jl?


Can the name be changed to just Statistics? That might be easier for people just starting with Julia.

This name is already taken by a module in stdlib.

Great work on StatsKit. It would be very nice to have a blog post for newcomers describing how to combine it with PackageCompiler to build these packages into a system image. Hopefully I will have some time in the future to write it up, unless someone else does :smile: (@sdanisch made some great efforts with this recently). This would be a real killer feature, as it would start instantly, and all past R users would love it.


Since this package is so small, is it possible to just have this code in stdlib?

I am not sure if you are joking: the code loaded by this package rivals Base, or all of the standard libraries put together, in complexity.


:slight_smile: Loading StatsKit.jl in a clean session takes:

julia> @time using StatsKit
 10.360681 seconds (22.37 M allocations: 1.157 GiB, 4.46% gc time)

compared with, e.g., StatsPlots.jl (which is known to be slow to load), also in a clean session:

julia> @time using StatsPlots
  7.833295 seconds (16.74 M allocations: 910.977 MiB, 5.22% gc time)
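For anyone who wants to repeat this kind of measurement, a minimal sketch follows. It starts a fresh Julia process for each package so that modules already loaded in the current session do not distort the result. `load_timing_cmd` is a name made up for this example; `Base.julia_cmd()` and `--startup-file=no` are standard Julia.

```julia
# Build a command that times `using <pkg>` in a fresh Julia process.
# `load_timing_cmd` is a hypothetical helper, not part of any package.
function load_timing_cmd(pkg::AbstractString)
    code = "@time using $pkg"
    return `$(Base.julia_cmd()) --startup-file=no -e $code`
end

# `Statistics` ships with Julia, so this runs without installing anything;
# pass "StatsKit" or "StatsPlots" once those packages are installed.
run(load_timing_cmd("Statistics"))
```

The first run after installing or updating a package also includes precompilation, so it is worth running the command twice and comparing the second result.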

Yes, that’s because the design of CSV.jl is more powerful in the long term: it supports loading data on the fly via the Tables.jl streaming API. In contrast, CSVFiles.jl/TextParse.jl loads the full dataset into vectors (even though it supports streaming from them).
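The difference between the two designs can be sketched with Base functions alone. This is only an analogy and does not use the CSV.jl, CSVFiles.jl, or Tables.jl APIs: `readlines` stands in for loading the full dataset into vectors, and `eachline` stands in for processing rows on the fly. The function names and the toy data are made up for this example.

```julia
# Toy CSV data held in memory; a file opened with `open` would work the same way.
data = "a,b\n1,10\n2,20\n3,30\n"

# Eager: materialize every row in a vector before doing any work.
function sum_b_eager(io::IO)
    lines = readlines(io)                       # whole dataset in memory
    rows = [split(line, ',') for line in lines[2:end]]
    return sum(parse(Int, row[2]) for row in rows)
end

# Streaming: handle one row at a time, so memory use does not grow
# with the number of rows.
function sum_b_streaming(io::IO)
    total = 0
    for (i, line) in enumerate(eachline(io))
        i == 1 && continue                      # skip the header line
        total += parse(Int, split(line, ',')[2])
    end
    return total
end

println(sum_b_eager(IOBuffer(data)))
println(sum_b_streaming(IOBuffer(data)))
```

Both functions return the same sum of column `b`; only the memory profile differs, which is the property that matters for data sets too large to hold in memory.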


I don’t see any unified documentation or examples of using all of these together.


There’s no common documentation at this point. Unfortunately, I’m not sure it’s possible to automatically merge the Documenter manuals from separate packages into a single reference. Of course, we can still write general tutorials, but they won’t be specific to StatsKit.