I fitted a random forest (RF) regression model to low-dimensional random noise and got surprisingly good values of the Mean Coeff of Determination from cross-validation.
Here is my code, based on the example in https://github.com/bensadeghi/DecisionTree.jl:
using Random
using DecisionTree
Random.seed!(2020)
nsamples = 100
nfeatures = 6
# training features and labels
xTR = rand(nsamples,nfeatures)
yTR = rand(nsamples)
# testing features and labels
xTE = rand(nsamples,nfeatures)
yTE = rand(nsamples)
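# forest hyperparameters
n_subfeatures = round(Int, nfeatures / 2)  # features considered at each split
n_trees = 50
partial_sampling = 0.7                     # fraction of samples used per tree
max_depth = -1                             # -1 means no depth limit
min_samples_leaf = 1
min_samples_split = 2
min_purity_increase = 0.0
seed = 3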
model = build_forest(yTR, xTR,
                     n_subfeatures,
                     n_trees,
                     partial_sampling,
                     max_depth,
                     min_samples_leaf,
                     min_samples_split,
                     min_purity_increase;
                     rng = seed)
n_folds = 3
r2 = nfoldCV_forest(yTR, xTR,
                    n_folds,
                    n_subfeatures,
                    n_trees,
                    partial_sampling,
                    max_depth,
                    min_samples_leaf,
                    min_samples_split,
                    min_purity_increase;
                    verbose = true,
                    rng = seed)
yTE_hat = apply_forest(model, xTE)
yTR_hat = apply_forest(model, xTR)
@info(". Coefficient of determination for training $(DecisionTree.R2(yTR, yTR_hat))")
@info(". Coefficient of determination for testing $(DecisionTree.R2(yTE, yTE_hat))")
This gives me
Fold 1
Mean Squared Error: 0.020002234552075074
Correlation Coeff: 0.9623930625920759
Coeff of Determination: 0.7602402973281025
Fold 2
Mean Squared Error: 0.020702135796228545
Correlation Coeff: 0.9464875388483548
Coeff of Determination: 0.7330648421477141
Fold 3
Mean Squared Error: 0.013866320099408993
Correlation Coeff: 0.9433743891893996
Coeff of Determination: 0.7419187614811039
Mean Coeff of Determination: 0.7450746336523069
[ Info: . Coefficient of determination for training 0.751643802875819
[ Info: . Coefficient of determination for testing -0.09547815687242323
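I would expect the Mean Coeff of Determination across the 3 folds to be about zero, because my data is pure noise. That is roughly what I get on the unseen test data (about -0.10), whereas the cross-validation result (0.745) is almost identical to the training value (0.752).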
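One way to cross-check the library's number is to run the folds by hand and score each model only on rows it never saw. The sketch below is a minimal version under my own assumptions: `manual_r2`, `manual_cv_r2`, `fit_mean`, and `predict_mean` are hypothetical helper names, not part of DecisionTree.jl, and the interleaved fold split is just one simple choice. It is demonstrated with a mean-only baseline so that it runs without any package.

```julia
using Random
using Statistics

# R^2 = 1 - SS_res / SS_tot, computed by hand.
# Hypothetical helper, not the DecisionTree.R2 implementation.
function manual_r2(y, yhat)
    ss_res = sum(abs2, y .- yhat)
    ss_tot = sum(abs2, y .- mean(y))
    return 1 - ss_res / ss_tot
end

# K-fold cross-validation in which each fold is scored only on held-out rows.
# `fit(y, X)` must return a model; `predict(model, X)` must return predictions.
function manual_cv_r2(fit, predict, y, X; n_folds = 3, rng = MersenneTwister(3))
    n = length(y)
    perm = randperm(rng, n)                  # shuffled row indices
    scores = Float64[]
    for k in 1:n_folds
        test_idx = perm[k:n_folds:n]         # every n_folds-th shuffled row
        train_idx = setdiff(perm, test_idx)  # all remaining rows
        model = fit(y[train_idx], X[train_idx, :])
        yhat = predict(model, X[test_idx, :])
        push!(scores, manual_r2(y[test_idx], yhat))
    end
    return scores
end

# Baseline that ignores the features and always predicts the training mean.
fit_mean(y, X) = mean(y)
predict_mean(model, X) = fill(model, size(X, 1))

Random.seed!(2020)
X = rand(100, 6)
y = rand(100)

scores = manual_cv_r2(fit_mean, predict_mean, y, X)
println("Held-out R2 per fold: ", scores)
println("Mean held-out R2: ", mean(scores))
```

The mean-only baseline can never score above zero on held-out rows, which gives a reference point for noise. To check the forest instead, the same function could be called with `(y, X) -> build_forest(y, X, n_subfeatures, n_trees, partial_sampling, max_depth, min_samples_leaf, min_samples_split, min_purity_increase; rng = seed)` as `fit` and `apply_forest` as `predict`. If those held-out scores come out near zero while `nfoldCV_forest` reports about 0.75, the difference would point at how the folds are scored rather than at the data.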
What am I missing in my code?
I am using DecisionTree.jl v0.10.9.
Thanks