Does DecisionTreeClassifier support missing data in the predictors?

I cannot pass missing data to train DecisionTreeClassifier in MLJ. Is this not supported? The machine function gave me a warning, and when I ran fit! anyway it threw an error saying that the scitype of X isn’t supported.

@Rahul Thanks for reporting. Which DecisionTreeClassifier are you using? The one from DecisionTree.jl is not compatible with missing values. It assumes any feature is ordered.

However, I believe the BetaML version does support missing values, although I see there is a bug in its input scitype declaration, which would trigger the warning you are seeing. If you ignore the warning you should be fine, assuming that is the package you are using. It can be loaded with

BetaMLTree = @load DecisionTreeClassifier pkg=BetaML add=true
tree = BetaMLTree()
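
As a minimal sketch of the full workflow, assuming BetaML is already installed: the toy table, its column names `x1` and `x2`, and the labels below are made up for illustration, and depending on the BetaML version the scitype warning may still appear when the machine is constructed.

```julia
using MLJ

# Load the BetaML tree as in the snippet above (assumes BetaML is installed).
BetaMLTree = @load DecisionTreeClassifier pkg=BetaML verbosity=0
tree = BetaMLTree()

# Hypothetical toy data; both feature columns contain missing values.
X = (x1 = [1.0, 2.0, missing, 4.0, 5.0, 6.0],
     x2 = [0.5, missing, 1.5, 2.0, 2.5, 3.0])
y = coerce(["a", "a", "a", "b", "b", "b"], Multiclass)

# Inspect the column scitypes; columns with missings are reported as
# Union{Missing, Continuous}.
schema(X)

# Constructing the machine may emit the scitype warning on older BetaML versions.
mach = machine(tree, X, y)
fit!(mach, verbosity=0)

# The model is probabilistic, so predict_mode is used to get point predictions.
yhat = predict_mode(mach, X)
```

Calling `schema(X)` before building the machine is a quick way to confirm that the warning is about the `Missing` part of the column scitypes rather than some other mismatch.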

I’ve posted an issue here to request fixing the scitype.

cc @sylvaticus

Thank you for your response @ablaom. The BetaML DecisionTree works as expected!

I have one more question. I get the same error about missing values when fitting XGBoostClassifier() from XGBoost. I checked, and BetaML does not provide an XGBoostClassifier. Do you know of any other packages that allow training XGBoost on a dataset with missing values?

I implemented the MLJ interface fix as suggested by ablaom, so it should now work without the warning.

However, be aware that the algorithm is much slower with missing data. You may want to impute the missing values first with MissingImputator (also in BetaML) and then run the DecisionTree algorithm on the completed data.
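
A rough sketch of the impute-first approach follows. Note that it swaps in MLJ's built-in `FillImputer` rather than the BetaML imputer mentioned above, only because its interface is simple to show; the toy table and column names are hypothetical.

```julia
using MLJ

# Hypothetical toy table with missing entries.
X = (x1 = [1.0, 2.0, missing, 4.0],
     x2 = [0.5, missing, 1.5, 2.0])

# With default settings, FillImputer replaces missings in Continuous
# columns with the median of the observed values in that column.
imputer = FillImputer()
mach = machine(imputer, X)
fit!(mach, verbosity=0)
Xfilled = transform(mach, X)

# Xfilled contains no missing values and can be passed to a tree model.
```

A median fill is a much cruder imputation than a model-based one, so this is a baseline for comparing speed and accuracy against training the tree on the missing data directly.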
