Hi All,
After fixing my one-hot encoding issue, I've split my data into training and test sets, but I'm struggling to get XGBoost's `DMatrix` to run. All of my variables are numeric except the label. I also tried the label as numeric to see whether that was the issue and got the same error (at least in R, the label needed to be categorical when making a DMatrix).
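```julia
using DataFrames, Random, XGBoost   # Random provides shuffle

# train and test set: split the rows of `data` into a training part (fraction `at`) and a test part
function partitionTrainTest(data, at = 0.7)
    n = nrow(data)
    idx = shuffle(1:n)                              # random permutation of the row indices
    train_idx = view(idx, 1:floor(Int, at*n))       # first at*n shuffled rows
    test_idx = view(idx, (floor(Int, at*n)+1):n)    # remaining rows
    data[train_idx, :], data[test_idx, :]
end

train, test = partitionTrainTest(df, 0.7) # 70% train

# Column 1 is the label; all other columns are features
train_X = train[:, Not(1)]
train_Y = train[:, 1]
test_X = test[:, Not(1)]
test_Y = test[:, 1]

#train_Y = DataFrame(label = train_Y)
#test_Y = DataFrame(label = test_Y)

dtrain = DMatrix(train_X, label = train_Y)
```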
I've tried passing the label both as a DataFrame and as an array; neither works, and both give the same error:
```
dtrain = DMatrix(train_X, label = train_Y)
ERROR: MethodError: no method matching DMatrix(::DataFrame; label=CategoricalArrays.CategoricalValue{Float64, UInt32}[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0])
```
I assume I need to convert the DataFrame into a matrix somehow, but I'm not sure exactly how…
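A minimal sketch of that conversion, assuming the installed XGBoost.jl version only has a matrix method for `DMatrix` (which is what the MethodError suggests). Julia keyword arguments do not take part in method dispatch, so the error is triggered by the `::DataFrame` positional argument rather than by the label. Converting the categorical label to plain numbers is probably worth doing as well, since XGBoost stores labels as floating-point values. The small data frame below is a hypothetical stand-in for `train`, and the `DMatrix` call is left as a comment so the snippet runs without XGBoost installed.

```julia
using DataFrames, CategoricalArrays

# Hypothetical stand-in for `train`: a categorical 0/1 label in column 1,
# numeric feature columns after it
train = DataFrame(label = categorical([0.0, 1.0, 1.0]),
                  x1 = [1.0, 2.0, 3.0],
                  x2 = [4, 5, 6])

# Features: drop the label column and materialize the rest as a dense Matrix.
# Matrix{Float32} converts every column (including the Int column x2) to Float32.
train_X = Matrix{Float32}(train[:, Not(1)])

# Label: unwrap each CategoricalValue to its underlying Float64 value,
# giving a plain Vector{Float64} of 0.0/1.0
train_Y = CategoricalArrays.unwrap.(train[:, 1])

# With XGBoost.jl loaded, the matrix-based constructor can then be tried:
# dtrain = DMatrix(train_X, label = train_Y)
```

The same two conversions apply to `test`. As far as I know, XGBoost.jl 2.x also accepts Tables.jl tables such as a DataFrame directly, so checking which version is installed may be worthwhile too.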