ANN: StatsKit, new meta-package for statistics

announcement
statistics

#1

We hadn’t made an announcement until now, but a new meta-package called StatsKit was registered some time ago. It is just a convenience package which automatically installs and loads the most common packages needed to work with statistics.

using StatsKit currently loads the Statistics standard library module, and the following packages:

This package is intended for users of statistics packages. Packages themselves should continue to list individual packages in their dependencies rather than StatsKit as a whole.
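
As a rough sketch of the intended usage, assuming StatsKit has already been added to the active environment, an interactive session might start like this. Only functions from the Statistics standard library are called, since that is the one module named explicitly above; the sample data is made up for illustration.

```julia
# Assumes StatsKit is installed, e.g. via `import Pkg; Pkg.add("StatsKit")`.
using StatsKit

# Made-up sample data, for illustration only.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# `mean` and `std` come from the Statistics stdlib, which StatsKit loads.
m = mean(x)
s = std(x)   # sample standard deviation (n - 1 denominator)
println("mean = ", m, ", std = ", s)
```

A package, by contrast, would keep depending on only the individual packages it actually uses.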


#2

Nice! Thank you.

Just one question that came to my mind after reading the package list: is there any specific reason why CSV.jl is included instead of CSVFiles.jl?


#3

Can the name be changed to just Statistics? That might be easier for people just starting with Julia.


#4

This name is already taken by a module in stdlib.


#5

Great work on StatsKit. Actually it would be very nice to have a blog post for newcomers describing how to combine it with PackageCompiler to build these packages into a system image - hopefully I will have some time in the future to write it up unless someone else does :smile: (@sdanisch made some great efforts with this recently). This would be a killer feature, as it would start instantly - all former R users would love it.
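
A minimal sketch of what this could look like, assuming a recent PackageCompiler release that provides `create_sysimage` (older releases had a different API), and assuming both PackageCompiler and StatsKit are installed in the active environment. The output file name is a hypothetical choice.

```julia
# Assumes PackageCompiler and StatsKit are installed in the active environment.
using PackageCompiler

# Hypothetical output name; use a .dll extension on Windows or .dylib on macOS.
sysimage = "StatsKitSysimage.so"

# Bake StatsKit and its dependencies into a custom system image.
# This step can take several minutes.
create_sysimage([:StatsKit]; sysimage_path = sysimage)

# Afterwards, start Julia with the custom image:
#   julia --sysimage StatsKitSysimage.so
```

In a session started this way, `using StatsKit` is still needed, but it should return almost immediately because the code is already part of the image.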


#6

Since this package is so small, is it possible to just have this code in stdlib?


#7

I am not sure if you are joking: the code used by this package rivals Base in complexity, or even all of the standard libraries put together.


#8

:slight_smile: Loading StatsKit.jl in a clean session takes:

julia> @time using StatsKit
 10.360681 seconds (22.37 M allocations: 1.157 GiB, 4.46% gc time)

vs. e.g. StatsPlots.jl (which is known to be slow to load) in a clean session:

julia> @time using StatsPlots
  7.833295 seconds (16.74 M allocations: 910.977 MiB, 5.22% gc time)

#9

Yes, that’s because the design of CSV.jl is more powerful in the long term, since it supports loading data on the fly via the Tables.jl streaming API. By contrast, CSVFiles/TextParse loads the full dataset as vectors (even though it supports streaming from them).
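
To illustrate the idea of reading data on the fly, here is a small sketch using `CSV.Rows`, which iterates over a file row by row rather than materializing whole columns. It is only one way to get this behaviour and not necessarily the exact mechanism meant above; the file contents and the `sum_values` helper are made up for the example.

```julia
using CSV  # also loaded by `using StatsKit`

# Write a small throwaway file so the sketch is self-contained.
path = joinpath(mktempdir(), "example.csv")
write(path, "id,value\n1,0.5\n2,1.5\n3,2.5\n")

# Hypothetical helper: sum the `value` column without loading the whole file.
function sum_values(path)
    total = 0.0
    # CSV.Rows yields one row at a time; values are strings by default,
    # hence the explicit conversion and parse.
    for row in CSV.Rows(path)
        total += parse(Float64, String(row.value))
    end
    return total
end

total = sum_values(path)
println(total)
```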


#10

I don’t see any unified documentation or examples of using all of these together.


#11

There’s no common documentation at this point. Unfortunately I’m not sure it’s possible to automatically merge all Documenter manuals from separate packages into a single reference. Of course we can still write general tutorials, but they won’t be specific to StatsKit.