Hello, I’m pretty new to machine learning, and I’ve been using XGBoost.jl for regression. I’m trying to use k-fold cross-validation to reduce overfitting.
I came across the nfold_cv function and was wondering whether anyone knows how it works. I was working from this example:
I watched some videos on how k-fold cross-validation works in theory, but I’m not sure how to use nfold_cv to obtain out-of-fold predictions.
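For reference, here is one way "out-of-fold predictions" can be produced by hand. This is a minimal, language-agnostic sketch in plain Python. It does not use XGBoost.jl or nfold_cv. It assumes a hypothetical `fit_line` least-squares model as a stand-in for the XGBoost regressor. Each sample is predicted exactly once, by a model that never saw that sample during training.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Hypothetical stand-in model: ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def out_of_fold_predictions(xs, ys, k=5, seed=0):
    """Predict every sample with a model trained on the other k-1 folds."""
    oof = [None] * len(xs)
    for test_idx in kfold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:  # these samples were held out of this fit
            oof[i] = a + b * xs[i]
    return oof

if __name__ == "__main__":
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
    oof = out_of_fold_predictions(xs, ys, k=5)
    mse = sum((p - y) ** 2 for p, y in zip(oof, ys)) / len(ys)
    print(f"out-of-fold MSE: {mse:.3f}")
```

The loss computed on these out-of-fold predictions estimates how the model does on unseen data. With XGBoost you would swap `fit_line` for a boosted model trained on the same training indices.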
I think this link explains the idea behind k-fold cross-validation quite well.
I don’t know the specifics of the XGBoost implementation, but the basic idea is this: you partition the same dataset into k folds, and each fold serves once as the test sample while the remaining k-1 folds form the training sample. For a given set of hyperparameters, you train and "test" your model on each of these k splits and average the losses. You repeat this for each candidate set of hyperparameters and choose the one with the lowest average loss.
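The procedure just described can be sketched as follows. This is a minimal illustration in plain Python, not the XGBoost or nfold_cv implementation. It assumes a hypothetical 1-D ridge regression (`fit_ridge`) whose penalty `lam` plays the role of the hyperparameter being tuned. `cv_loss` averages the test-fold MSE over k folds for one value of `lam`. `select_lambda` repeats that for each candidate and keeps the lowest average.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge(xs, ys, lam):
    """Hypothetical stand-in model: y = a + b*x, minimising squared error + lam * b**2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam) if sxx + lam > 0 else 0.0
    return my - b * mx, b

def cv_loss(xs, ys, lam, k=5, seed=0):
    """Mean squared error on the held-out fold, averaged over all k folds."""
    fold_losses = []
    for test_idx in kfold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(xs)) if i not in test_set]
        a, b = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        errs = [(a + b * xs[i] - ys[i]) ** 2 for i in test_idx]
        fold_losses.append(sum(errs) / len(errs))
    return sum(fold_losses) / len(fold_losses)

def select_lambda(xs, ys, candidates, k=5):
    """Score every candidate on the same folds (same seed) and pick the lowest loss."""
    scores = {lam: cv_loss(xs, ys, lam, k) for lam in candidates}
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    xs = [float(i) for i in range(30)]
    ys = [2.0 * x + 1.0 + ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
    best, scores = select_lambda(xs, ys, [0.0, 1.0, 10.0, 100.0])
    for lam, loss in scores.items():
        print(f"lam={lam:>6}: CV MSE {loss:.3f}")
    print("chosen lam:", best)
```

Using the same seed for every candidate means all of them are judged on identical folds, so the comparison is not skewed by different random splits.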
Note that you can separate the problem of finding the best parameters (training the model, in this case XGBoost) from that of finding the best hyperparameters (hyperparameter tuning via cross-validation) by using a machine learning library such as MLJ or BetaML (disclaimer: I am the author of the second one).
Note also that the function seems to do more than cross-validation: it appears to perform hyperparameter tuning as well (I assume through a grid search), looping over the parameter space and applying cross-validation to each combination to judge its value…
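To make "grid search with cross-validation on each combination" concrete, here is a minimal sketch in plain Python. It is not a description of what nfold_cv actually does. It assumes a hypothetical k-nearest-neighbour regressor (`knn_predict`) with two hyperparameters, `n_neighbors` and `weights`. With XGBoost the grid would instead hold parameters such as `max_depth` and `eta`. Every combination in the Cartesian product is scored by k-fold CV, and the lowest average loss wins.

```python
import itertools
import math
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors, weights):
    """Hypothetical stand-in model: 1-D k-nearest-neighbour regression."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:n_neighbors]
    if weights == "uniform":
        return sum(y for _, y in nearest) / len(nearest)
    # "distance": inverse-distance weighting; an exact x match returns its own target
    num = den = 0.0
    for xi, yi in nearest:
        d = abs(xi - x)
        if d == 0:
            return yi
        num += yi / d
        den += 1 / d
    return num / den

def cv_mse(xs, ys, params, k=5, seed=0):
    """Average held-out MSE over k folds for one hyperparameter combination."""
    losses = []
    for test_idx in kfold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        tr_x = [xs[i] for i in range(len(xs)) if i not in test_set]
        tr_y = [ys[i] for i in range(len(xs)) if i not in test_set]
        errs = [(knn_predict(tr_x, tr_y, xs[i], **params) - ys[i]) ** 2 for i in test_idx]
        losses.append(sum(errs) / len(errs))
    return sum(losses) / len(losses)

def grid_search(xs, ys, grid, k=5):
    """Loop over every combination in the grid and keep the lowest CV loss."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cv_mse(xs, ys, params, k), params))
    best_loss, best_params = min(results, key=lambda r: r[0])
    return best_params, best_loss, results

if __name__ == "__main__":
    xs = [i / 10 for i in range(40)]
    ys = [math.sin(x) for x in xs]
    grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
    best_params, best_loss, results = grid_search(xs, ys, grid)
    for loss, params in results:
        print(params, f"CV MSE {loss:.5f}")
    print("best:", best_params, f"{best_loss:.5f}")
```

The cost grows with the product of the grid sizes times k. That cost is one reason libraries also offer random or adaptive search instead of an exhaustive grid.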