Custom XGBoost Loss function w/ Zygote (Julia Computing blog post)

I probably wasn’t being clear about my objective.

Consider a Lasso model, w/ one hyper-parameter λ.
Suppose I have a grid of possible values for λ: G=[0.0, 0.1, …, 1.0]
(Sometimes sklearn includes a default grid, sometimes I make my own grid, sometimes I use another tuning method…)

My objective: find the model w/ the best out-of-sample (OOS) predictive power, where I define “best” as minimum RMSE.
Q: How do I select the optimal hyper-parameter?

  1. Partition the rows into train & test samples.
  2. For each λ in G, compute the average CV RMSE within X[train,:] using K-fold CV or some other resampling technique (see the first sketch after this list).
    Note: this means splitting X[train,:] into K folds.
    For each λ I will have a score (CV RMSE).
  3. Select the optimal λ according to some method.
    Many select the λ w/ the lowest CV RMSE.
    Tibshirani et al. recommend the λ giving the most parsimonious model (fewest nonzero predictors) whose CV RMSE is within 1 standard error of the minimum (the “one-standard-error rule”; see the second sketch below).
    A good ML interface allows the user to specify a technique for selecting the optimal λ.
  4. Refit on X[train,:] w/ the optimal λ, predict y from X[test,:], & compute OOS RMSE(y[test], pred).
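
To make steps 1–2 concrete, here's a rough sketch in Julia. Everything in it is illustrative: the synthetic data, the seed, and the hand-rolled coordinate-descent `lasso_fit` are just stand-ins for real data and whatever lasso solver you'd actually use (GLMNet.jl, an MLJ model, etc.):

```julia
using LinearAlgebra, Random, Statistics

# Soft-thresholding operator used by coordinate-descent lasso.
soft(z, γ) = sign(z) * max(abs(z) - γ, 0.0)

# Minimal coordinate-descent lasso: minimize (1/2n)‖y - Xβ‖² + λ‖β‖₁.
# A stand-in for whatever lasso solver you actually use.
function lasso_fit(X, y, λ; sweeps = 100)
    n, p = size(X)
    β = zeros(p)
    for _ in 1:sweeps, j in 1:p
        xj = view(X, :, j)
        r  = y .- X * β .+ xj .* β[j]          # partial residual excluding feature j
        β[j] = soft(dot(xj, r) / n, λ) / (dot(xj, xj) / n)
    end
    return β
end

rmse(y, ŷ) = sqrt(mean(abs2, y .- ŷ))

# Step 1: partition the rows into train & test samples.
rng   = MersenneTwister(42)
n, p  = 200, 10
X     = randn(rng, n, p)
y     = X[:, 1] .- 2 .* X[:, 2] .+ 0.5 .* randn(rng, n)   # toy ground truth
perm  = shuffle(rng, 1:n)
test, train = perm[1:40], perm[41:end]

# Step 2: for each λ in G, average K-fold CV RMSE using only X[train,:].
G, K  = collect(0.0:0.1:1.0), 5
folds = [train[k:K:end] for k in 1:K]          # K disjoint folds of the train rows
cv_mean, cv_se = zeros(length(G)), zeros(length(G))
for (i, λ) in enumerate(G)
    scores = map(folds) do holdout
        fit_rows = setdiff(train, holdout)
        β = lasso_fit(X[fit_rows, :], y[fit_rows], λ)
        rmse(y[holdout], X[holdout, :] * β)
    end
    cv_mean[i] = mean(scores)
    cv_se[i]   = std(scores) / sqrt(K)         # standard error across folds
end
```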
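
And a sketch of step 3 w/ both selection rules, starting from hypothetical (made-up) `cv_mean` / `cv_se` values of the kind step 2 produces; step 4 is the two commented lines at the end, reusing `lasso_fit` and `rmse` from the sketch above:

```julia
# Hypothetical output of step 2: per-λ mean CV RMSE and its standard error.
G       = collect(0.0:0.1:1.0)
cv_mean = [0.86, 0.74, 0.70, 0.69, 0.70, 0.72, 0.75, 0.79, 0.84, 0.90, 0.97]
cv_se   = fill(0.03, length(G))

# "Min rule": the λ with the lowest mean CV RMSE.
imin  = argmin(cv_mean)
λ_min = G[imin]                                # here 0.3

# "One-standard-error rule": most parsimonious model whose mean CV RMSE is
# within one SE of the minimum. For lasso, larger λ ⇒ fewer nonzero
# predictors, so "most parsimonious" means the largest qualifying λ.
cutoff = cv_mean[imin] + cv_se[imin]
λ_1se  = maximum(λ for (λ, m) in zip(G, cv_mean) if m <= cutoff)   # here 0.5

# Step 4 (reusing the helpers from the previous sketch): refit on all of
# X[train,:] with the chosen λ, then score once on the held-out test rows.
# β   = lasso_fit(X[train, :], y[train], λ_1se)
# oos = rmse(y[test], X[test, :] * β)
```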

It might seem a bit redundant, but the two layers guard against different things:
CV RMSE is there to avoid overfitting during model training.
The 1-SE method (for example) is there to avoid overfitting during model selection.