I probably wasn’t being clear about my objective.
Consider a Lasso model w/ one hyper-parameter, λ.
Suppose I have a grid of possible values for λ: G=[0.0, 0.1, …, 1.0]
(Sometimes sklearn includes a default grid, sometimes I make my own grid, sometimes I use another tuning method…)
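For concreteness, a grid like that could be built w/ numpy; note sklearn's Lasso calls the penalty `alpha`, and `LassoCV` will build its own default grid if you don't pass one:

```python
import numpy as np
from sklearn.linear_model import LassoCV

G = np.linspace(0.0, 1.0, 11)  # the hand-made grid [0.0, 0.1, ..., 1.0]
lcv = LassoCV(alphas=G, cv=5)  # or omit `alphas`: LassoCV then picks a default grid
```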
My objective: find the model w/ the best out-of-sample (OOS) predictive power (where I define “best” as minimum RMSE).
Q: how do I select the optimal hyper-parameter?
- Partition the rows into train & test samples.
- For each λ in G, compute the average CV RMSE within X[train,:] using K-fold CV or some other resampling technique. Note: this means splitting X[train,:] into K folds, so each λ ends up w/ a score (its CV RMSE).
- Select the optimal λ according to some method. Many select the λ w/ the lowest CV RMSE. Tibshirani et al. recommend the λ corresponding to the most parsimonious model (fewest nonzero predictors) whose CV RMSE is within one standard error of the minimum (the “1-SE rule”). A good ML interface allows the user to specify the technique for selecting the optimal λ.
- Using the optimal λ, refit on all of X[train,:], predict y w/ X[test,:], & compute OOS RMSE(y[test], pred) (see the sketch below).
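Here’s a minimal sketch of the whole procedure in sklearn. The dataset, K=5, and the exact grid are placeholders (I’ve dropped λ=0 since sklearn warns against `alpha=0` for Lasso):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, train_test_split

# Placeholder data; swap in your own X, y.
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# Step 1: partition the rows into train & test samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Grid from above, minus λ=0 (sklearn warns against alpha=0 for Lasso).
G = np.linspace(0.1, 1.0, 10)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Step 2: for each λ, average the RMSE across the K folds of X[train,:].
fold_rmse_per_lam = {}
for lam in G:
    scores = []
    for tr_idx, va_idx in kf.split(X_tr):
        m = Lasso(alpha=lam).fit(X_tr[tr_idx], y_tr[tr_idx])
        resid = y_tr[va_idx] - m.predict(X_tr[va_idx])
        scores.append(np.sqrt(np.mean(resid ** 2)))
    fold_rmse_per_lam[lam] = scores

cv_rmse = {lam: np.mean(s) for lam, s in fold_rmse_per_lam.items()}

# Step 3: select the optimal λ (here: simply the lowest CV RMSE).
best_lam = min(cv_rmse, key=cv_rmse.get)

# Step 4: refit on all of X[train,:], then score out of sample.
final = Lasso(alpha=best_lam).fit(X_tr, y_tr)
oos_rmse = np.sqrt(np.mean((y_te - final.predict(X_te)) ** 2))
print(f"optimal λ = {best_lam:.2f}, OOS RMSE = {oos_rmse:.3f}")
```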
It might seem a bit redundant, but the two layers guard against different things: the CV RMSE is there to avoid overfitting during model training, while the 1-SE rule (for example) is there to avoid overfitting during model selection.
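And a minimal sketch of the 1-SE selection itself, assuming the `fold_rmse_per_lam` dict kept in the loop above (the function name is my own):

```python
import numpy as np

def one_se_lambda(fold_rmse_per_lam):
    """Pick the largest λ whose mean CV RMSE is within one standard
    error of the minimum. For Lasso, larger λ means fewer nonzero
    coefficients, i.e. the most parsimonious qualifying model."""
    lams = sorted(fold_rmse_per_lam)
    means = np.array([np.mean(fold_rmse_per_lam[l]) for l in lams])
    ses = np.array([np.std(fold_rmse_per_lam[l], ddof=1)
                    / np.sqrt(len(fold_rmse_per_lam[l])) for l in lams])
    i_min = int(means.argmin())
    threshold = means[i_min] + ses[i_min]
    return max(l for l, m in zip(lams, means) if m <= threshold)
```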