When should a package be added to JuliaStats?


#1

Hi all,

I was wondering what the procedure is for deciding if a package should be added to JuliaStats, and, if so, how it is done.

I’ve got two packages that are possibly suitable, DependentBootstrap, and ForecastEval.

Of the two, DependentBootstrap is probably of greater interest, since bootstrapping time-series is a fairly common operation, and that package is, AFAIK, more complete than just about anything else available in any language (for example, it includes some of the latest techniques in automatic block-length selection, as well as support for multivariate datasets etc). It is also typically faster than any of the comparable R packages (including the one that is implemented in Fortran; the name currently escapes me).

ForecastEval contains code for a Diebold-Mariano test, which is fairly well-known, but the other routines (Reality Check, SPA test, and Model Confidence Set) are a little more niche (although very interesting to someone like me!). EDIT: just noticed the build-status on ForecastEval is failing. I’m not very fluent at Github/Travis, so I’ve obviously stuffed something up there - on the Travis page it looks like the last build passed to me. Weird. The package itself works just fine on v0.6 on my machine with all tests passing.

Cheers,

Colin


#2

Moving a package to an organization is done mostly for two reasons: (1) development involves multiple people, or (2) it would be a considerable burden for a single maintainer to keep the project up to date. If either is true, the package may be a good candidate to be moved to an organization. Registered packages can be moved by using GitHub’s tool for transferring ownership of the repo and then updating the repo address in Metadata. It would be best to check with the organization members and see how it fits within the ecosystem. For example, would DependentBootstrap be best called from JuliaStats/TimeSeries? Would it make sense to make an organization for TimeSeries tools and move those related packages outside JuliaStats?

Reach out to the authors of related packages and the maintainers of the organization. Try to get your code to match the organization’s development guidelines. From a quick look at DependentBootstrap, it would be good to host the documentation using Documenter and GitHub Pages. Another good practice is to report the code coverage of your tests. See for example NCEI.


#3

Ah. I think I have misunderstood the purpose of JuliaStats. I thought it was designed to be an umbrella for mostly complete statistical packages that makes it easy for new users to browse the available options.

Neither of your reasons for moving applies to my packages (development is/was done entirely by me, and at this point they are mostly complete with very few dependencies, so the maintenance burden is low), so perhaps it is best to keep them separate for now. I’ll have another look in a year’s time, and if a general framework for time-series is emerging, I’ll see if they should be incorporated then.

My understanding is that most of the work at the moment is focused on more core issues, like dataframes etc.

Cheers, and thanks for responding.

Colin

ps thanks also for the tips regarding the packages. Getting Documenter and a measure of code coverage have both been on my to-do list for a while now :slight_smile:


#4

As for the issue of discoverability, Pkg3, the soon-to-be package manager for Julia v1.y.z, will allow the use of keywords and descriptions in the metadata of packages to make them easily discoverable.


#5

Yes, currently organizations are mostly used to allow cooperation. As long as there’s a single developer, or a single maintainer with people doing smaller contributions, it’s not really useful to move to JuliaStats. OTOH it would make sense if the need to coordinate multiple core developers arises at some point.

We need a different solution to highlight packages. We already have Julia Observer, but maybe something more organized, like the CRAN Task Views for R, would be useful.


#6

Congratulations Colin on a very cool-looking package!

What is a dependent bootstrap? :slight_smile:

Does it have special relevance for financial time series, eg spot price or volatility processes?


#7

Yes, this makes sense, now that it has been pointed out to me.

I definitely agree that some method other than “whatever google deems most suitable” would be useful for new users attempting to discover julia packages.

Cheers,

Colin


#8

If you know what a statistical bootstrap is, eg the IID bootstrap of Efron (1979), then a dependent bootstrap is just a generalization of that concept to time-series. That is, dependent bootstraps relax the independent and identically distributed assumption to something allowing some form of dependence between observations, such as serial correlation.

The underlying purpose of the dependent bootstrap and iid bootstrap is exactly the same, ie to non-parametrically say something about the parameters of the distribution of some estimator.

Yes, it gets used a lot in Finance, especially for asset volatility processes, which arguably are weakly dependent. You probably wouldn’t use one of the traditional dependent bootstraps for prices, since these most definitely exhibit dependence at infinite time-lags (eg the random walk model), but they definitely get used for differences in prices, or returns.
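
To make the idea a bit more concrete, here is a minimal sketch of one of the simplest dependent bootstraps, the moving block bootstrap, written in plain Python. It is only an illustration of the concept and is not the DependentBootstrap API: the function names `moving_block_resample` and `bootstrap_std_error`, the fixed block length of 20, and the toy AR(1) series are all assumptions made up for this example. The idea is to resample whole blocks of consecutive observations rather than single observations, so that short-range dependence survives inside each block.

```python
import random
import statistics


def moving_block_resample(x, block_length, rng):
    """Build one pseudo-series of len(x) from randomly chosen overlapping blocks.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(x)")
    out = []
    while len(out) < n:
        # Pick a block start uniformly among all valid starting positions.
        start = rng.randrange(n - block_length + 1)
        # Copy the block intact so within-block dependence is preserved.
        out.extend(x[start:start + block_length])
    # The last block may overshoot, so trim back to the original length.
    return out[:n]


def bootstrap_std_error(x, statistic, block_length, num_resample=1000, seed=0):
    """Estimate the standard error of `statistic` from block-bootstrap resamples."""
    rng = random.Random(seed)
    draws = [
        statistic(moving_block_resample(x, block_length, rng))
        for _ in range(num_resample)
    ]
    return statistics.stdev(draws)


# Toy serially correlated data: an AR(1) process with coefficient 0.7 (assumed).
data_rng = random.Random(42)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + data_rng.gauss(0.0, 1.0))

# Block length 20 is an arbitrary choice here, not an automatically selected one.
se_block = bootstrap_std_error(series, statistics.fmean, block_length=20)
# With block_length=1 the procedure reduces to the ordinary iid bootstrap.
se_iid = bootstrap_std_error(series, statistics.fmean, block_length=1)
print("block bootstrap SE of the mean:", se_block)
print("iid bootstrap SE of the mean:  ", se_iid)
```

For a positively autocorrelated series like this one, the block version will typically report a larger standard error for the mean than the iid version, because the iid bootstrap ignores the serial correlation. Choosing the block length well is the hard part in practice, which is why automatic block-length selection matters.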

Cheers,

Colin


#9

In addition to @colintbowers’s explanation, if you want to know more about the bootstrap from a modern perspective I would recommend

@book{efron2016computer,
  title={Computer age statistical inference},
  author={Efron, Bradley and Hastie, Trevor},
  year={2016},
  publisher={Cambridge University Press}
}

#11

A dangerous topic to get me started on :slight_smile:

Needless to say, it sounds like we see eye-to-eye on this.


#12

Good point, I deleted my post. Meanwhile I have patented mean() and am forming a business plan :wink: