Another post on package compilation time

@Tamas_Papp Turns out this is not all that costly at runtime. I think I’ve found a good compromise: when doing model selection (and only then), I now estimate each model as a subset-GARCH model nested within a larger model determined by a maxlags parameter. This effectively “lowers” the type parameters into the value domain temporarily.
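The idea can be sketched roughly like this (illustrative names only, not the actual ARCHModels.jl internals): every candidate model shares one coefficient layout of size determined by `maxlags`, and a smaller model simply pins its unused coefficients at zero, so only the largest model ever needs to be compiled.

```julia
# Hypothetical sketch of the subset-nesting idea. A free-coefficient mask
# selects which entries of the shared maxlags-sized layout are estimated;
# the rest are fixed at zero.
function subset_mask(p::Int, q::Int, maxlags::Int)
    # true marks a free coefficient; false entries stay pinned at zero
    vcat(true,                          # ω (intercept)
         [i <= p for i in 1:maxlags],   # β₁, …, β_maxlags (GARCH lags)
         [i <= q for i in 1:maxlags])   # α₁, …, α_maxlags (ARCH lags)
end

mask = subset_mask(1, 1, 5)   # a GARCH(1, 1) inside a GARCH(5, 5) layout
@assert length(mask) == 11    # one shared 11-coefficient layout
@assert count(mask) == 3      # but only ω, β₁, α₁ are actually estimated
```

Since the mask lives entirely in the value domain, trying a different (p, q) during model selection changes no types and triggers no recompilation.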

For a GARCH(1, 1) nested in a TGARCH{5, 5, 5}, for example, this increases the runtime on a dataset with ~2k observations from around 3 milliseconds to 12 milliseconds. It does, however, avoid the ~500 milliseconds of compilation time, so this is a good tradeoff when estimating many different models, as happens in model selection. In a fresh session, we now have

julia> @time selectmodel(TGARCH, BG96; minlags=0, maxlags=3);
  9.977057 seconds (35.13 M allocations: 1.707 GiB, 22.18% gc time)

julia> @time selectmodel(TGARCH, BG96; minlags=0, maxlags=3);
  0.110632 seconds (876.17 k allocations: 108.719 MiB)

compared to

julia> @time selectmodel(TGARCH, BG96; minlags=0, maxlags=3);
 91.165097 seconds (394.59 M allocations: 17.032 GiB, 4.62% gc time)

julia> @time selectmodel(TGARCH, BG96; minlags=0, maxlags=3);
  0.062485 seconds (866.18 k allocations: 82.316 MiB)

before. For estimation outside of a model selection context, I still specialize on the type parameters: one may want to fit the same specification thousands of times (e.g., when backtesting, or when fitting a high-dimensional DCC model that uses univariate models in the marginals), so it makes sense to squeeze out every last bit of performance there.
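The tradeoff between the two designs can be illustrated with a minimal sketch (assumed names, not the package API): encoding the lag order as a type parameter lets the compiler specialize each method per order, at the cost of one compilation per distinct order, while storing it as a plain field compiles once and covers every order.

```julia
# Type-domain version: each order P is a distinct type, so methods on it
# (here `order`, in the package a likelihood) are compiled per order.
struct TypeDomain{P} end
order(::TypeDomain{P}) where {P} = P

# Value-domain version: the order is an ordinary field, so a single
# compiled method handles every order, with a small runtime cost.
struct ValueDomain
    p::Int
end
order(m::ValueDomain) = m.p

@assert order(TypeDomain{2}()) == 2
@assert order(ValueDomain(2)) == 2
```

Model selection sweeps over many orders, each fitted once, so the value-domain version wins there; backtesting fits one order many times, so the type-domain version pays for its compilation quickly.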

@Albert_Zevelev Thanks a lot for making me re-evaluate this. This really improves the user experience quite substantially. If you want to give it a spin, it’s part of version 1.2.0 which will be available as soon as https://github.com/JuliaRegistries/General/pull/17176 gets merged.
