Vectorized statistics.jl

djholiver · June 29, 2022, 8:10am

Hi,

I came across VectorizedStatistics.jl but can’t see it mentioned in package announcements (although it’s available in the General registry). It appears to be an absolutely incredible leap in performance (for some use cases) for percentiles, so I am eager to make use of it.

Sorry to @ you, but is it ready for general usage, @brenhinkeller?

Further (apologies in advance for the lack of an MWE, as I’m on my phone), in the few examples I ran I saw the “classic” performance degradation (10x+ in some cases) of the quickselect algorithm (which is used for percentiles) when the target vector contained many duplicates. Is there any plan to mitigate this, or have you observed it yourself?

Regards,

Storopoli · June 29, 2022, 10:30pm

Been using it and teaching with it in a Scientific Computing course.
Congratulations on the package!

brenhinkeller · July 3, 2022, 7:35pm

Hi @djholiver, good question. I wasn’t very active on Discourse when we registered it, so I never made an announcement, but I hope VectorizedStatistics.jl (and the NaN-ignoring equivalents in NaNStatistics.jl) are generally usable, albeit with a particular set of tradeoffs: compilation time on first use may be significant, it has more dependencies than some other ways of implementing these statistics, and both packages are relatively new, so we may not have found all the bugs yet. If I get tenure they’ll at least be maintained for a good long while though :).

As you noted, the sorting implementation that underlies vmedian/vpercentile/vquantile is relatively naive: fast for some cases, but potentially slow for many others. More generally, a major improvement would be a more explicitly SIMD’d sorting algorithm altogether (a relatively major undertaking that I haven’t had time for). PRs would be very welcome on this front from anyone interested.
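
For anyone following along on the duplicates question, here is a minimal sketch of one common mitigation: quickselect with a three-way ("Dutch national flag") partition, which groups all elements equal to the pivot in a single pass so that runs of duplicates are not partitioned again and again. This is only an illustration in Python, not the code VectorizedStatistics.jl actually uses; the function name `quickselect` and its `seed` parameter are hypothetical, and it makes no attempt at SIMD.

```python
import random


def quickselect(values, k, seed=0):
    """Return the k-th smallest element (0-based) of `values`.

    Illustrative sketch only: uses a random pivot and a three-way
    partition so that many equal elements do not degrade performance.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")

    a = list(values)           # work on a copy; the input is left untouched
    rng = random.Random(seed)  # seeded so runs are reproducible
    lo, hi = 0, len(a) - 1

    while True:
        if lo == hi:
            return a[lo]

        pivot = a[rng.randint(lo, hi)]

        # Three-way partition of a[lo..hi]:
        #   a[lo..lt-1]  < pivot
        #   a[lt..gt]   == pivot
        #   a[gt+1..hi]  > pivot
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1

        if k < lt:
            hi = lt - 1        # target lies among the smaller elements
        elif k > gt:
            lo = gt + 1        # target lies among the larger elements
        else:
            return pivot       # target falls inside the block of equal elements
```

With an input such as `[5] * 1000 + [1, 9]`, whichever pivot is chosen the search ends after at most two partition passes, because once 5 is the pivot the whole block of 5s is resolved at once. A two-way partition may instead peel off only a few elements per pass on such data, which is the usual source of the slowdown described above.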

brenhinkeller · July 3, 2022, 7:36pm

Oh awesome, thanks!
