Strange situation with RF classification: perfect training accuracy, test predictions all in one category

I am puzzled by an issue with a boolean classification task using a Random Forest on a high-dimensional dataset (1680 observations × 110 features) with moderate class imbalance (431 vs 1249).
The train/test partition is random (0.8/0.2).

When I train the RF I get an almost perfect accuracy/confusion matrix on the training set, but when I predict on the test set almost all predictions fall in the same class.

I initially thought ok, it's just overfitting, but first, RFs don't usually overfit too badly, and second, the errors are not "random" (as they would be if there were no connection between X and Y): instead, all predictions collapse onto the same class.
But there isn't anything specific distinguishing the train and test sets, so what the hell could it be?

Note that I have this issue using my own RF implementation, but the student originally got the same problem using an RF in R…

Here is the exact code…

using Dates, Random, Pipe, HTTP, CSV, DelimitedFiles, DataFrames, Plots, BetaML


dataURL = "" # ~ 3 MB

data             = @pipe HTTP.get(dataURL).body     |> CSV.File(_, missingstring="NA") |> DataFrame
(n,d)            = size(data)
ycat             = data.class_boolean
fields_toremove1 =  ["class_boolean"]
data             = data[:,Not(fields_toremove1)]
X                = Matrix(data)
((x_train,x_test),(ycat_train,ycat_test)) = partition([X,ycat],[0.8,0.2], shuffle=true)

m = RandomForestEstimator(n_trees=30, force_classification=true, oob=true)
ŷ_train = fit!(m,x_train,ycat_train) # fit! returns the predictions on the training data
ŷ_train = mode(ŷ_train)              # pick the most likely class from the returned probabilities
ŷ_test  = predict(m,x_test)
ŷ_test  = mode(ŷ_test)

train_acc         = accuracy(ycat_train, ŷ_train) # 0.998
test_accuracy_est = 1-info(m)["oob_errors"]       # 0.710
test_acc          = accuracy(ycat_test, ŷ_test)   # 0.767

# Share of the positive class in labels vs predictions:
sum(ycat_train)/length(ycat_train) # 0.2619047619047619
sum(ycat_test)/length(ycat_test)   # 0.23511904761904762
sum(ŷ_train)/length(ŷ_train)       # 0.2604166666666667
sum(ŷ_test)/length(ŷ_test)         # 0.044642857142857144
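To make the collapse explicit beyond the positive-class shares above, a confusion matrix on the test set can be tabulated by hand (a plain-Julia sketch, no extra packages; the function name is mine):

```julia
# Count true/false positives/negatives from two boolean (0/1) vectors.
function confusion_counts(y, ŷ)
    tp = sum((y .== 1) .& (ŷ .== 1))
    tn = sum((y .== 0) .& (ŷ .== 0))
    fp = sum((y .== 0) .& (ŷ .== 1))
    fn = sum((y .== 1) .& (ŷ .== 0))
    return (tp=tp, tn=tn, fp=fp, fn=fn)
end

confusion_counts(Bool[1,0,1,0], Bool[1,0,0,0]) # (tp = 1, tn = 2, fp = 0, fn = 1)
```

Called as `confusion_counts(ycat_test, ŷ_test)` on the data above, it should show nearly all predictions landing in the negative column (tn + fn ≈ everything).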

Very odd… I can get top scores in both training and testing if I rebalance the data with random sampling on the whole dataset (the one that I then partition into training/testing), but if I rebalance only the training set, the test-set predictions again collapse onto one category…
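For concreteness, this is the kind of training-set-only rebalancing I mean (a minimal plain-Julia sketch of random oversampling with replacement; the function name is mine, not a BetaML API):

```julia
using Random

# Randomly oversample the minority class until the two classes are balanced.
# Applied only AFTER the train/test split, so no duplicated rows can end up
# shared between training and test data.
function oversample_minority(X, y; rng=Random.GLOBAL_RNG)
    pos = findall(==(1), y)
    neg = findall(==(0), y)
    (minor, major) = length(pos) < length(neg) ? (pos, neg) : (neg, pos)
    extra = rand(rng, minor, length(major) - length(minor)) # duplicated minority rows
    idx   = shuffle(rng, vcat(major, minor, extra))
    return X[idx, :], y[idx]
end

X = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0; 9.0 10.0]
Xb, yb = oversample_minority(X, [1, 0, 0, 0, 1])
sum(yb .== 1) == sum(yb .== 0) # true: classes are now balanced
```

In the code above it would be applied as `x_bal, y_bal = oversample_minority(x_train, ycat_train)`, leaving `x_test`/`ycat_test` untouched.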