XGBoost reduce overfitting | k-fold cross validation

Seth16225 · October 8, 2022, 2:08am

Hello, I’m pretty new to machine learning, and I’ve been using XGBoost.jl for regression. I’m trying to reduce overfitting with k-fold cross-validation.

I came across the `nfold_cv` function and was wondering whether anyone knows how it works. I was working from this example:

github.com/dmlc/XGBoost.jl/blob/master/demo/cross_validation.jl

using XGBoost

# Load data from LibSVM-format text files (a binary buffer saved by xgboost also works)

const DATAPATH = joinpath(@__DIR__, "../data")
dtrain = DMatrix(joinpath(DATAPATH, "agaricus.txt.train"))
dtest = DMatrix(joinpath(DATAPATH, "agaricus.txt.test"))

# Define parameters for xgboost

param = ["max_depth" => 2,
         "eta" => 1,
         "silent" => 1,
         "objective" => "binary:logistic"]
num_round = 2
nfold = 5

println("running cross validation")
# do cross validation, this will print result out as
# [iteration]  metric_name:mean_value+std_value

(This file has been truncated.)

I watched some videos on how k-fold cross-validation works in theory, but I’m not sure how to use this `nfold_cv` function to get out-of-fold predictions.

Thanks for your help.
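For reference, out-of-fold predictions can also be produced by hand, without relying on `nfold_cv`: split the rows into k folds, train on k-1 of them, and predict the held-out fold, so every row gets a prediction from a model that never saw it. The sketch below is a minimal pure-Julia illustration using only the Random, Statistics and LinearAlgebra standard libraries. A closed-form ridge regression stands in for XGBoost, and `fit_ridge`, `predict_ridge`, `kfold_indices` and `oof_predictions` are hypothetical helper names, not part of XGBoost.jl; to use it with XGBoost you would swap in your own training and prediction calls.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical stand-in for XGBoost: ridge regression fitted in closed form.
fit_ridge(X, y; λ=1.0) = (X' * X + λ * I) \ (X' * y)
predict_ridge(β, X) = X * β

# Split 1:n into k shuffled, roughly equal folds (each fold is a vector of row indices).
function kfold_indices(n, k; rng=Random.default_rng())
    perm = randperm(rng, n)
    return [perm[i:k:n] for i in 1:k]
end

# Out-of-fold predictions: each row is predicted by a model trained without it.
function oof_predictions(X, y, k; λ=1.0, rng=Random.default_rng())
    n = size(X, 1)
    oof = zeros(n)
    for test_idx in kfold_indices(n, k; rng=rng)
        train_idx = setdiff(1:n, test_idx)          # all rows not in this fold
        β = fit_ridge(X[train_idx, :], y[train_idx]; λ=λ)
        oof[test_idx] = predict_ridge(β, X[test_idx, :])
    end
    return oof
end

# Usage on synthetic data (assumed, for illustration only).
rng = MersenneTwister(42)
X = randn(rng, 100, 3)
y = X * [1.0, -2.0, 0.5] .+ 0.1 .* randn(rng, 100)
oof = oof_predictions(X, y, 5; rng=rng)
rmse = sqrt(mean((oof .- y) .^ 2))   # out-of-fold error estimate
```

The out-of-fold RMSE is an honest estimate of generalization error, because no prediction comes from a model that was trained on that row.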

sylvaticus · October 8, 2022, 12:25pm

I think this link explains the idea behind k-fold cross-validation quite well.

I don’t know the specific XGBoost implementation, but the basic idea is that you split the same dataset into training and test samples in several different ways, train and “test” your model with a given set of hyperparameters on each of these splits, and average the resulting losses. You then choose the hyperparameters with the lowest average loss.

Note that, with a machine learning library such as MLJ or BetaML (disclaimer: I am the author of the latter), you can separate the problem of finding the best parameters (training the model, in this case XGBoost) from that of finding the best hyperparameters (hyperparameter tuning via cross-validation).

Also note that this function seems to do more than just cross-validation: it appears to perform hyperparameter tuning as well (I assume through a grid search), looping over the whole parameter space and applying cross-validation to each specific combination to judge its value.
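The procedure described in this reply (fixed folds, one averaged loss per hyperparameter candidate, then pick the candidate with the lowest average) can be sketched in plain Julia. This is only an illustration under assumptions: ridge regression stands in for XGBoost and its penalty λ stands in for the hyperparameters, and `fit_ridge`, `cv_loss` and `grid_search` are hypothetical names, not functions from XGBoost.jl, MLJ or BetaML. `cv_loss` returns the mean and standard deviation across folds, similar to the mean+std summary mentioned in the demo's comments.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical stand-in model: closed-form ridge regression with penalty λ.
fit_ridge(X, y, λ) = (X' * X + λ * I) \ (X' * y)

# Mean and standard deviation of the test MSE of ridge(λ) over the given folds.
function cv_loss(X, y, λ, folds)
    n = size(X, 1)
    losses = map(folds) do test_idx
        train_idx = setdiff(1:n, test_idx)
        β = fit_ridge(X[train_idx, :], y[train_idx], λ)
        mean((X[test_idx, :] * β .- y[test_idx]) .^ 2)
    end
    return mean(losses), std(losses)
end

# Grid search: score every candidate on the same folds, keep the lowest mean loss.
function grid_search(X, y, λs; k=5, rng=Random.default_rng())
    perm = randperm(rng, size(X, 1))
    folds = [perm[i:k:end] for i in 1:k]
    scores = [first(cv_loss(X, y, λ, folds)) for λ in λs]
    return λs[argmin(scores)], scores
end

# Usage on synthetic data (assumed, for illustration only).
rng = MersenneTwister(1)
X = randn(rng, 200, 5)
y = X * [3.0, 0.0, -1.0, 0.0, 2.0] .+ randn(rng, 200)
best_λ, scores = grid_search(X, y, [0.01, 1.0, 100.0, 10_000.0]; rng=rng)
```

Reusing the same folds for every candidate keeps the comparison fair; once the best λ is chosen, the final model would be refitted on all the training data. A library such as MLJ or BetaML can take over this bookkeeping, as suggested above.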
