Hello, I am new to Julia and I am trying to figure out how to use StratifiedKfold. I am using Julia 0.6 and currently just use kfolds, like this:
Xdata=readdlm("data.txt")
Ytarget=readdlm("taget.txt")
folds = kfolds((Xdata,Ytarget),k=6)
(Xtrain1,Ytrain1),(Xtest1,Ytest1)=folds[1] # the first fold and so on…
My data is a matrix of numbers that falls into two patterns (classes), but it has no labels or numbers that say which pattern each row belongs to. What I know is that rows 1 to 50 are class 1 and rows 51 to 90 are class 2.
But I don't understand how to use this:
julia> collect(StratifiedKfold([:a, :a, :a, :b, :b, :c, :c, :a, :b, :c], 3))
What are a, b and c? How do I apply this to my data, and where do I pass Xdata and Ytarget?
A bit unrelated, but could I ask why you want to use stratified k-fold here? It seems you have fairly balanced data.
With ScikitLearn.jl, I think this does the job:
f(i) = ifelse(i<51, 1, 2)
y = [f(i) for i in 1:90]
using ScikitLearn
folds = ScikitLearn.CrossValidation.StratifiedKFold(y, n_folds=10)
X = randn(90, 20) # say 20 features
fold_1 = X[folds[1][1], :] # rows of X selected by the first index set of fold 1
Hi! Yes, it is almost balanced, but I have other cases where it is not. Either way, kfolds does not give me balanced folds. If I don't shuffle my data, kfolds gives me folds containing only class 1; if I shuffle, I sometimes get 30% class 2 and 70% class 1, and so on.
I have another question: I need to pass both Xdata and Ytarget, because Xdata is my dataset and Ytarget holds the label for each sample. Can I pass two matrices to StratifiedKfold, as I do with kfolds? Thanks!
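The imbalance described above for unshuffled data can be reproduced without any package. This is only a rough sketch: it assumes the folds are contiguous blocks of rows, which is an assumption about the layout and not a call into kfolds itself, and the names `shares` and `block` are made up for illustration.

```julia
# Labels as described in the question: rows 1-50 are class 1, rows 51-90 are class 2.
y = [i <= 50 ? 1 : 2 for i in 1:90]
k = 6
foldsize = div(length(y), k)   # 15 rows per block

# Share of class 1 in each contiguous block of rows (assumed layout of unshuffled folds).
shares = Float64[]
for f in 1:k
    block = ((f - 1) * foldsize + 1):(f * foldsize)
    push!(shares, count(y[block] .== 1) / foldsize)
end
println(shares)
```

The first three blocks contain only class 1 and the last two contain none, while the overall share of class 1 is 50/90. Stratification is meant to keep each fold close to that overall share.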
I’ve just started learning Julia, so here is my attempt; my answer may not be entirely accurate.
The symbols (:a, :b and :c) are just class labels, so any values will do, but the label vector must have the same length as the data. StratifiedKfold runs with any proportion of :a to :b, as long as each label appears at least k times. For example:
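using MLBase # provides StratifiedKfold
# houses is a matrix with one observation per row
n = size(houses, 1)
index_a = fill(:a, div(n, 2)) # label the first half of the rows :a
index_b = fill(:b, n - div(n, 2)) # label the remaining rows :b
index = vcat(index_a, index_b)
rows = collect(StratifiedKfold(index, 10)) # training row indices, one vector per fold
# pick the training data of the first fold
row = rows[1]
houses[row, :]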
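For the data in the original question, the same idea might look like the sketch below. It assumes StratifiedKfold comes from MLBase and that iterating it yields training row indices only, so the test rows are taken as the complement. `Xdata` is filled with random numbers as a stand-in for the real file, and `train_sets`, `train_idx` and `test_idx` are names invented here.

```julia
using MLBase  # assumed source of StratifiedKfold

# Stand-ins for the data in the question: 90 samples (rows), 20 made-up features.
Xdata = randn(90, 20)
# Rows 1-50 are class 1, rows 51-90 are class 2, as described in the question.
Ytarget = vcat(fill(1, 50), fill(2, 40))

k = 6
# One vector of training row indices per fold.
train_sets = collect(StratifiedKfold(Ytarget, k))

# First fold: the test rows are the rows that are not in the training set.
train_idx = train_sets[1]
test_idx = setdiff(1:length(Ytarget), train_idx)

# The same indices select rows from both the data and the labels.
Xtrain1, Ytrain1 = Xdata[train_idx, :], Ytarget[train_idx]
Xtest1, Ytest1 = Xdata[test_idx, :], Ytarget[test_idx]
```

So StratifiedKfold only needs the labels; Xdata and Ytarget are both indexed afterwards with the rows it returns. If Ytarget comes back from readdlm as a one-column matrix, passing vec(Ytarget) is probably the safer choice.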