How can I sample a DataFrame?

**Like Python's `data.sample()` method**

**NB: I need sampling with replacement**

You can just use random row indices, like this:

```
julia> using DataFrames, Random
julia> df = DataFrame(a = 1:10, b = rand(10))
10×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 1     │ 0.180922 │
│ 2   │ 2     │ 0.726072 │
│ 3   │ 3     │ 0.802304 │
│ 4   │ 4     │ 0.769662 │
│ 5   │ 5     │ 0.705299 │
│ 6   │ 6     │ 0.266686 │
│ 7   │ 7     │ 0.332831 │
│ 8   │ 8     │ 0.393075 │
│ 9   │ 9     │ 0.1936   │
│ 10  │ 10    │ 0.830922 │

julia> df[shuffle(1:nrow(df))[1:5], :]
5×2 DataFrame
│ Row │ a     │ b        │
│     │ Int64 │ Float64  │
├─────┼───────┼──────────┤
│ 1   │ 7     │ 0.332831 │
│ 2   │ 8     │ 0.393075 │
│ 3   │ 1     │ 0.180922 │
│ 4   │ 5     │ 0.705299 │
│ 5   │ 9     │ 0.1936   │
```

The `shuffle` function returns a random ordering of the range from 1 to the number of rows of your dataframe, which you can then index with `[1:x]`, where `x` is the number of samples you want.
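To make the shuffle approach reusable, it could be wrapped in a small helper. This is only a sketch: the name `shuffle_sample` and its `rng` keyword are made up for illustration and are not part of DataFrames. Passing a seeded RNG makes the draw reproducible.

```julia
using DataFrames, Random

# Hypothetical helper (not part of DataFrames): draw `n` distinct rows
# by shuffling the row indices and keeping the first `n` of them.
function shuffle_sample(df::AbstractDataFrame, n::Integer;
                        rng::AbstractRNG = Random.default_rng())
    n <= nrow(df) || throw(ArgumentError("cannot draw $n distinct rows from $(nrow(df))"))
    idx = shuffle(rng, 1:nrow(df))[1:n]
    return df[idx, :]
end

df = DataFrame(a = 1:10, b = rand(10))
s = shuffle_sample(df, 5; rng = MersenneTwister(42))
```

Because the indices come from a permutation, no row can appear twice, so this is sampling without replacement.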

Alternatively, there are ML/stats packages that implement their own way of splitting data into train and test data, like MLJ or Turing - check their docs if that's of interest.

I need to go from 100 rows of data to 1,000 samples.

I'm not sure I understand - do you want to sample 100 rows from a 1,000 row `DataFrame`? Or do you want to draw 1,000 samples of length 100 from a larger data set? My suggestion above can work in both cases. Can you clarify what you're looking for (and what isn't working for you), ideally by way of a minimal working example?

Yes, I want 1,000 samples from a data set of length 100.

Okay, to adapt my example from above, say you have a length 100 data set:

```
df = DataFrame(a = 1:100, b = rand(100))
```

Now we can get 1,000 random samples from this. I'm assuming each sample has length 10 here:

```
samples = [df[shuffle(1:nrow(df))[1:10], :] for _ in 1:1_000]
```

`samples` is now a vector of length 1,000 which holds a 10-row random sample of your original data set in each location.
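As a usage sketch, a statistic can then be computed per sample with a comprehension over that vector. The choice of the mean of column `b` is just an assumption for illustration:

```julia
using DataFrames, Random, Statistics

df = DataFrame(a = 1:100, b = rand(100))

# 1,000 samples of 10 distinct rows each, as described above
samples = [df[shuffle(1:nrow(df))[1:10], :] for _ in 1:1_000]

# One summary value per sample: here the mean of column `b`
sample_means = [mean(s.b) for s in samples]
```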

Or if you'd like to sample 1,000 rows with replacement:

```
df[rand(1:nrow(df),1000),:]
```
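Since the question asks for something like pandas' `sample`, here is a rough sketch of a helper with a `replace` switch, built on `sample` from StatsBase. The helper name `sample_rows` is hypothetical, and it only mimics the row-count and replacement options, not the rest of the pandas method.

```julia
using DataFrames, StatsBase

# Hypothetical helper (not a DataFrames or StatsBase function):
# draw `n` rows, with or without replacement.
function sample_rows(df::AbstractDataFrame, n::Integer; replace::Bool = true)
    idx = sample(1:nrow(df), n; replace = replace)
    return df[idx, :]
end

df = DataFrame(a = 1:100, b = rand(100))
boot = sample_rows(df, 1_000)                  # 1,000 rows, duplicates allowed
sub  = sample_rows(df, 10; replace = false)    # 10 distinct rows
```

With `replace = false`, `n` cannot exceed the number of rows; StatsBase raises an error in that case.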

I imagine you are trying to bootstrap data. In addition to the solutions given here, see if `Bootstrap.jl` is a package that works for you.

DependentBootstrap will also work here. One of the options is an iid bootstrap, which will do what the OP wants, i.e.:

```
using DependentBootstrap
dbootdata(mydataframe, numresample=1000, bootmethod=:iid)
```

will return a vector of length `1000` where each element is a resampled `DataFrame`.

This is how I split my DataFrame into "training" and "testing":

```
using DataFrames, Random

function createTrainTest(df::DataFrame, prop=0.5, randomseed=1234)
    # Empty DataFrames with the same columns as df
    df_training = similar(df, 0)
    df_testing = similar(df, 0)
    # Number of rows that go into the training set
    df_size = size(df, 1)
    trainingsize = round(Int, df_size * prop)
    # Create a random permutation of the row indices
    randvec = randperm!(MersenneTwister(randomseed),
                        Vector{Int64}(undef, df_size))
    # The first trainingsize shuffled rows go to training, the rest to testing
    for k in axes(df, 1)
        push!(k ≤ trainingsize ? df_training : df_testing,
              df[randvec[k], :])
    end
    return (df_training, df_testing)
end
```

If you want 1,000 samples of 100 rows each, just change `trainingsize` to a fixed value of 100 and call the above function 1,000 times.

PS: do not forget to use a different `randomseed` each time!
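As an alternative to passing a different seed on every call, a single seeded RNG can be reused, since it advances between draws. The sketch below does not call the function above; it draws the row indices directly, and the 1,000-row `df` and the seed are placeholders.

```julia
using DataFrames, Random

df = DataFrame(a = 1:1000, b = rand(1000))

# One RNG for all draws: each call to randperm advances its state,
# so the samples differ while the whole run stays reproducible.
rng = MersenneTwister(1234)
samples = [df[randperm(rng, nrow(df))[1:100], :] for _ in 1:1_000]
```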

```
using StatsBase: sample
using DataFrames
df = DataFrame(a = 1:1000)
# Draw 100 distinct row indices (without replacement)
sample_rows = sample(1:nrow(df), 100, replace=false)
df_sample = df[sample_rows, :]
# The remaining rows form the test set
test_rows = setdiff(1:nrow(df), sample_rows)
df_test = df[test_rows, :]
```