When should a package be added to JuliaStats

colintbowers · April 17, 2018, 2:03am

Hi all,

I was wondering what the procedure is for deciding if a package should be added to JuliaStats, and, if so, how it is done.

I’ve got two packages that are possibly suitable, DependentBootstrap, and ForecastEval.

Of the two, DependentBootstrap is probably of greater interest, since bootstrapping time-series is a fairly common operation, and that package is, AFAIK, more complete than just about anything else available in any language (for example, it includes some of the latest techniques in automatic block-length selection, as well as support for multivariate datasets etc). It is also typically faster than any of the comparable R packages (including the one that is implemented in Fortran - the name currently escapes me).

ForecastEval contains code for a Diebold-Mariano test, which is fairly well-known, but the other routines (Reality Check, SPA test, and Model Confidence Set) are a little more niche (although very interesting to someone like me!). EDIT: just noticed the build status on ForecastEval is failing. I’m not very fluent in GitHub/Travis, so I’ve obviously stuffed something up there; on the Travis page it looks to me like the last build passed. Weird. The package itself works just fine on v0.6 on my machine with all tests passing.

Cheers,

Colin

Nosferican · April 17, 2018, 9:26pm

Moving a package to an organization is done mostly for two reasons: (1) development involves multiple people, or (2) it would be a considerable burden for a single maintainer to keep the project up to date. If either is true, the package may be a good candidate to be moved to an organization. Registered packages can be moved using GitHub’s tool for transferring ownership of a repository, followed by updating the repository address in METADATA. It would be best to check with the organization members and see how the package fits within the ecosystem. For example, would DependentBootstrap be best called from JuliaStats/TimeSeries? Would it make sense to make an organization for time-series tools and move the related packages out of JuliaStats?

Reach out to the authors of related packages and the maintainers of the organization. Try to get your code to match the organization’s development guidelines. From a quick look at DependentBootstrap, it would be good to host the documentation using Documenter and GitHub Pages. Another good practice is to report the code coverage of your tests. See for example NCEI.

colintbowers · April 17, 2018, 11:09pm

Ah. I think I have misunderstood the purpose of JuliaStats. I thought it was designed to be an umbrella for mostly complete statistical packages that makes it easy for new users to browse the available options.

Neither of your reasons for moving applies to my packages (development is/was done entirely by me, and at this point they are mostly complete with very few dependencies, so the maintenance burden is low), so perhaps it is best to keep them separate for now. I’ll have another look in a year’s time, and if a general framework for time-series is emerging, I’ll see if they should be incorporated then.

My understanding is that most of the work at the moment is focused on more core issues, like dataframes etc.

Cheers, and thanks for responding.

Colin

ps thanks also for the tips regarding the packages. Getting Documenter and a measure of code coverage have both been on my to-do list for a while now

Nosferican · April 17, 2018, 11:20pm

As for the issue of discoverability, Pkg3, the soon-to-be package manager for Julia v1.y.z, will allow the use of keywords and descriptions in the metadata of packages to make them easily discoverable.

nalimilan · April 18, 2018, 9:46am

Yes, currently organizations are mostly used to allow cooperation. As long as there’s a single developer, or a single maintainer with people doing smaller contributions, it’s not really useful to move to JuliaStats. OTOH it would make sense if the need to coordinate multiple core developers arises at some point.

We need a different solution to highlight packages. We already have Julia Observer, but maybe something more organized like the CRAN Task Views for R would be useful.

felix · April 18, 2018, 9:57am

Congratulations Colin on a very cool-looking package!

What is a dependent bootstrap?

Does it have special relevance for financial time series, eg spot price or volatility processes?

colintbowers · April 18, 2018, 10:13am

Yes, this makes sense, now that it has been pointed out to me.

I definitely agree that some method other than “whatever Google deems most suitable” would be useful for new users attempting to discover Julia packages.

Cheers,

Colin

colintbowers · April 18, 2018, 10:21am

If you know what a statistical bootstrap is, eg the IID bootstrap of Efron (1979), then a dependent bootstrap is just a generalization of that concept to time-series. That is, dependent bootstraps relax the independent and identically distributed assumption to something allowing some form of dependence between observations, such as serial correlation.

The underlying purpose of the dependent bootstrap and iid bootstrap is exactly the same, ie to non-parametrically say something about the parameters of the distribution of some estimator.
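To make the idea concrete, here is a minimal sketch of one common dependent bootstrap, the moving block bootstrap, written in Python purely for illustration. It is not how DependentBootstrap is implemented: the function names (`moving_block_indices`, `bootstrap_std_error`), the fixed block length, and the seed are all assumptions made for this example. Resampling whole blocks of consecutive observations preserves the dependence within each block, and setting `block_length=1` reduces it to the IID bootstrap.

```python
import random
import statistics


def moving_block_indices(n, block_length, rng):
    """Return n resampled indices built from randomly chosen overlapping blocks.

    Hypothetical helper for illustration: each block is a run of
    `block_length` consecutive indices, so short-range dependence
    between neighbouring observations is kept intact.
    """
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and n")
    indices = []
    while len(indices) < n:
        # Pick a block start uniformly among all valid starting positions.
        start = rng.randrange(n - block_length + 1)
        indices.extend(range(start, start + block_length))
    # The last block may overshoot, so trim back to exactly n indices.
    return indices[:n]


def bootstrap_std_error(data, statistic, block_length, num_resample=1000, seed=0):
    """Estimate the standard error of `statistic` via the moving block bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(num_resample):
        idx = moving_block_indices(n, block_length, rng)
        draws.append(statistic([data[i] for i in idx]))
    # The spread of the statistic across resamples approximates its sampling spread.
    return statistics.stdev(draws)
```

A call such as `bootstrap_std_error(returns, statistics.mean, block_length=5)` would then give a non-parametric estimate of the standard error of the sample mean. Choosing the block length is the hard part in practice, which is where automatic block-length selection comes in.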

Yes, it gets used a lot in Finance, especially for asset volatility processes, which arguably are weakly dependent. You probably wouldn’t use one of the traditional dependent bootstraps for prices, since these most definitely exhibit dependence at infinite time-lags (eg the random walk model), but they definitely get used for differences in prices, or returns.
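As a small illustration of that last point, the sketch below converts a price series into log returns, which is the kind of series one would typically hand to a dependent bootstrap instead of the raw prices. `log_returns` is a hypothetical helper name and the prices are made up for the example.

```python
import math


def log_returns(prices):
    """Return log(p[t] / p[t-1]) for each consecutive pair of prices.

    Hypothetical helper for illustration; assumes all prices are positive.
    """
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


# Made-up prices: the resulting series has one fewer observation than the input.
prices = [100.0, 101.5, 100.8, 102.3, 103.0]
returns = log_returns(prices)
```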

Cheers,

Colin

Tamas_Papp · April 18, 2018, 11:11am

In addition to @colintbowers’s explanation, if you want to know more about the bootstrap from a modern perspective I would recommend

@book{efron2016computer,
  title={Computer age statistical inference},
  author={Efron, Bradley and Hastie, Trevor},
  year={2016},
  publisher={Cambridge University Press}
}

colintbowers · April 19, 2018, 4:06am

A dangerous topic to get me started on

Needless to say, it sounds like we see eye-to-eye on this.

pasha · April 19, 2018, 11:53am

Good point, I deleted my post. Meanwhile I have patented mean() and am forming a business plan
