Strange situation with RF classification: perfect training accuracy, test predictions all in one category

I am puzzled by an issue with a boolean classification task using a Random Forest on a high-dimensional dataset (1680 observations × 110 features) with moderate class imbalance (431 vs 1249).
The train/test partition is random (0.8/0.2).

When I train the RF I get an almost perfect accuracy/confusion matrix on the training set, but when I predict on the test set almost all predictions fall in the same class.

I initially thought ok, it's just overfitting, but first, RFs don't usually overfit too badly, and second, the errors are not "random" (as they would be if there were no connection between X and Y): instead, all predictions collapse onto the same class.
But there isn't anything specific distinguishing the train and test sets, so what the hell could it be?

Note that I have this issue using my own RF implementation, but the student originally got the same problem using an RF in R…

Here is the exact code…

using Dates, Random, Pipe, HTTP, CSV, DelimitedFiles, DataFrames, Plots, BetaML


dataURL = "" # ~ 3 MB

data             = @pipe HTTP.get(dataURL).body     |> CSV.File(_, missingstring="NA") |> DataFrame
(n,d)            = size(data)
ycat             = data.class_boolean
fields_toremove1 =  ["class_boolean"]
data             = data[:,Not(fields_toremove1)]
X                = Matrix(data)
((x_train,x_test),(ycat_train,ycat_test)) = partition([X,ycat],[0.8,0.2], shuffle=true)

m = RandomForestEstimator(n_trees=30, force_classification=true, oob=true)
ŷ_train = fit!(m,x_train,ycat_train) # fit! returns the predictions on the training data
ŷ_train = mode(ŷ_train)              # pick the most likely class from the returned probabilities
ŷ_test  = predict(m,x_test)
ŷ_test  = mode(ŷ_test)

train_acc         = accuracy(ycat_train, ŷ_train) # 0.998
test_accuracy_est = 1-info(m)["oob_errors"]       # 0.710
test_acc          = accuracy(ycat_test, ŷ_test)   # 0.767

# Share of the positive class in labels vs predictions:
sum(ycat_train)/length(ycat_train) # 0.2619047619047619
sum(ycat_test)/length(ycat_test)   # 0.23511904761904762
sum(ŷ_train)/length(ŷ_train)       # 0.2604166666666667
sum(ŷ_test)/length(ŷ_test)         # 0.044642857142857144
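To make the collapse explicit beyond the positive-class shares above, a confusion matrix on the test set can be tabulated by hand (a plain-Julia sketch, no extra packages; the function name is mine):

```julia
# Count true/false positives/negatives from two boolean (0/1) vectors.
function confusion_counts(y, ŷ)
    tp = sum((y .== 1) .& (ŷ .== 1))
    tn = sum((y .== 0) .& (ŷ .== 0))
    fp = sum((y .== 0) .& (ŷ .== 1))
    fn = sum((y .== 1) .& (ŷ .== 0))
    return (tp=tp, tn=tn, fp=fp, fn=fn)
end

confusion_counts(Bool[1,0,1,0], Bool[1,0,0,0]) # (tp = 1, tn = 2, fp = 0, fn = 1)
```

Called as `confusion_counts(ycat_test, ŷ_test)` on the data above, it should show nearly all predictions landing in the negative column (tn + fn ≈ everything).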

Very odd… I can get top scores in both training and testing if I rebalance the data with random sampling on the whole dataset (the one that I then partition into training/testing), but if I rebalance only the training set, the test-set predictions again collapse onto one category…
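For concreteness, this is the kind of training-set-only rebalancing I mean (a minimal plain-Julia sketch of random oversampling with replacement; the function name is mine, not a BetaML API):

```julia
using Random

# Randomly oversample the minority class until the two classes are balanced.
# Applied only AFTER the train/test split, so no duplicated rows can end up
# shared between training and test data.
function oversample_minority(X, y; rng=Random.GLOBAL_RNG)
    pos = findall(==(1), y)
    neg = findall(==(0), y)
    (minor, major) = length(pos) < length(neg) ? (pos, neg) : (neg, pos)
    extra = rand(rng, minor, length(major) - length(minor)) # duplicated minority rows
    idx   = shuffle(rng, vcat(major, minor, extra))
    return X[idx, :], y[idx]
end

X = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
Xb, yb = oversample_minority(X, [1, 0, 0, 0, 1])
sum(yb .== 1) == sum(yb .== 0) # true: classes are now balanced
```

In the code above it would be applied as `x_bal, y_bal = oversample_minority(x_train, ycat_train)`, leaving `x_test`/`ycat_test` untouched.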