ANN: StatsKit, new meta-package for statistics

nalimilan · February 11, 2019, 2:34pm

We hadn’t made an announcement until now, but a new meta-package called StatsKit was registered some time ago. It is just a convenience package that automatically installs and loads the most common packages needed to work with statistics.

using StatsKit currently loads the Statistics standard library module, and the following packages:

This package is intended for users of statistics packages. Packages themselves should continue to list individual packages in their dependencies rather than StatsKit as a whole.
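As a rough illustration of the intended workflow, here is a minimal sketch of installing and loading the meta-package in a user session. It only calls `mean` and `std` from the Statistics standard library, which the announcement says is loaded; the assumption is that `using StatsKit` brings those names into scope.

```julia
import Pkg
Pkg.add("StatsKit")      # one-time install; also installs the bundled packages

using StatsKit           # assumed to bring Statistics (mean, std, ...) into scope

x = [1.0, 2.0, 3.0, 4.0] # toy data, purely illustrative
avg = mean(x)            # from the Statistics standard library
s = std(x)               # sample standard deviation
println("mean = ", avg, ", std = ", s)
```

A package, by contrast, would keep `using Statistics` (or whichever individual packages it needs) and declare only those as dependencies.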

juius · February 11, 2019, 5:40pm

Nice! Thank you.

Just one question that came to mind after reading the package list: is there any specific reason why CSV.jl is included instead of CSVFiles.jl?

affans · February 11, 2019, 6:09pm

Can the name be changed to just Statistics? Might be easier for people just starting Julia.

bkamins · February 11, 2019, 6:11pm

This name is already taken by a module in stdlib.

bkamins · February 11, 2019, 6:14pm

Great work on StatsKit. Actually, it would be very nice to have a blog post for newcomers describing how to combine it with PackageCompiler to build these packages into a system image - hopefully I will have some time in the future to write it up unless someone else does (@sdanisch made some great efforts with this recently). This would be a real killer feature, as it would start instantly - all former R users would love it.
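For readers wondering what this could look like, here is a hedged sketch using the `create_sysimage` function from current PackageCompiler releases (the API available when this thread was written was different). It assumes PackageCompiler and StatsKit are both installed in the active environment; the output file name is hypothetical.

```julia
using PackageCompiler   # assumed to be installed in the active environment

# Bake StatsKit and the packages it loads into a custom system image.
# "StatsKitSysimage.so" is a hypothetical name; the usual extension is
# .dll on Windows and .dylib on macOS.
create_sysimage(["StatsKit"]; sysimage_path = "StatsKitSysimage.so")
```

Julia can then be started with `julia --sysimage StatsKitSysimage.so`, after which `using StatsKit` should load almost instantly. The build itself can take several minutes, and the image has to be rebuilt after package updates.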

affans · February 11, 2019, 6:26pm

Since this package is so small, is it possible to just have this code in stdlib?

Tamas_Papp · February 11, 2019, 6:32pm

I am not sure if you are joking — code used by this package rivals Base in complexity, or all of the standard libraries put together.

bkamins · February 11, 2019, 7:50pm

The load time of StatsKit.jl in a clean session is:

julia> @time using StatsKit
 10.360681 seconds (22.37 M allocations: 1.157 GiB, 4.46% gc time)

vs. e.g. StatsPlots.jl (which is known to be slow to load) in a clean session:

julia> @time using StatsPlots
  7.833295 seconds (16.74 M allocations: 910.977 MiB, 5.22% gc time)

nalimilan · February 11, 2019, 9:03pm

Yes, that’s because the design of CSV.jl is more powerful in the long term, since it supports loading data on the fly via the Tables.jl streaming API. By contrast, CSVFiles/TextParse loads the full dataset as vectors (even if it supports streaming from those vectors afterwards).
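To make "loading data on the fly" concrete, here is a small sketch using `CSV.Rows` from current CSV.jl releases, which iterates a file row by row instead of materializing whole columns. The file contents, the column name `value`, and the helper `sum_column` are made up for illustration, and this API may differ from what CSV.jl offered when this was posted.

```julia
using CSV

# Write a tiny hypothetical file so the example is self-contained.
path = tempname() * ".csv"
write(path, "id,value\n1,10\n2,20\n3,30\n")

# Hypothetical helper: sum the `value` column while streaming over rows.
# CSV.Rows yields one row at a time; values arrive as strings by default,
# so each one is parsed explicitly.
function sum_column(path)
    total = 0
    for row in CSV.Rows(path)
        total += parse(Int, row.value)
    end
    return total
end

println(sum_column(path))
```

Only one row is held at a time here, which is the property that matters for files too large to fit in memory.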

ChrisRackauckas · February 12, 2019, 1:27pm

I don’t see a unified documentation or examples of using these all together.

nalimilan · February 12, 2019, 4:23pm

There’s no common documentation at this point. Unfortunately I’m not sure it’s possible to automatically merge all Documenter manuals from separate packages into a single reference. Of course we can still write general tutorials, but they won’t be specific to StatsKit.
