I am trying to implement a group k-fold function for my problem, where I need to split the dataset according to some label.
To my knowledge there is no such function in MLDataUtils or MLBase, so I am trying to implement it manually, but it is a mess because I can't find any equivalent of the pandas or NumPy isin() functions…
(How do we actually select all values in one array that also appear in another in Julia?)
So far I have managed to write the following function:
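    using MLDataUtils  # provides kfolds and shuffleobs

    function gkfolds(X_, idx_label, k = 5)
        # Split the shuffled unique labels (row idx_label of X_) into k folds
        dd = kfolds(shuffleobs(unique(X_[idx_label, :])), k = k)
        out = []
        for j = 1:k
            train_lab, vald_lab = dd[j]
            train_idx = Int64[]
            valid_idx = Int64[]
            for i = 1:size(X_, 2)
                # Column i goes to training if its label is among train_lab
                if findall(X_[idx_label, i] .== train_lab) != []
                    push!(train_idx, i)
                else
                    push!(valid_idx, i)
                end
            end
            push!(out, (X_[:, train_idx], X_[:, valid_idx]))
        end
        return out
    end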
I use two loops: the outer one goes over the folds, and the inner one checks whether each column's label is found in train_lab and collects the column indices accordingly. The final list contains the folds as (train, validation) tuples. It seems to work, but it is a bit long…
Does anyone have a clean implementation of such a function, or a suggestion for how to do it?
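For what it's worth, one candidate for the isin() part might be broadcasting Base's `in` over the labels, with the lookup collection wrapped in `Ref` so that it is treated as a single value rather than broadcast over. Below is a minimal sketch of a group k-fold built on that idea, using only Base and Random. The name `group_kfolds`, the `rng` keyword, and the round-robin assignment of groups to folds are my own assumptions for illustration, not anything taken from MLDataUtils.

```julia
using Random

# Hypothetical helper: split the columns of X into k folds so that all columns
# sharing a label (stored in row idx_label) end up on the same side of a fold.
function group_kfolds(X::AbstractMatrix, idx_label::Integer, k::Integer = 5;
                      rng::AbstractRNG = Random.default_rng())
    labels = X[idx_label, :]
    groups = shuffle(rng, unique(labels))
    k <= length(groups) || throw(ArgumentError("k exceeds the number of groups"))
    folds = []
    for j in 1:k
        # Round-robin assignment (an assumption): every k-th group validates fold j.
        valid_groups = groups[j:k:end]
        # isin-style mask: true where the column's label is in valid_groups.
        is_valid = in.(labels, Ref(valid_groups))
        push!(folds, (X[:, .!is_valid], X[:, is_valid]))
    end
    return folds
end

# Usage: row 1 holds the group label, row 2 holds some feature values.
X = vcat([1 1 2 2 3 3], [10 20 30 40 50 60])
folds = group_kfolds(X, 1, 3)
train, valid = folds[1]
```

The boolean mask replaces the inner loop over columns, and `labels .∈ Ref(valid_groups)` should be an equivalent spelling of the same broadcast.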