How to sample a clustered group from a DataFrame?

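The BootstrapDf = vcat(BootstrapDf, Bootstrap_sample) pattern is generally better written as append!(BootstrapDf, Bootstrap_sample). vcat builds a new data frame on every iteration, copying all the rows accumulated so far, whereas append! adds the new rows in place. That avoids a lot of copying (internally, when a vector grows, Julia anticipates the next call by allocating more space than needed).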
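As a rough sketch of the append! version, assuming the data frame has a pid column as in the question (the toy data and the loop below are made up for illustration, not taken from the original code):

```julia
using DataFrames, StatsBase

# Toy data: the column name `pid` follows the question; the values are invented.
df = DataFrame(pid = [1, 1, 2, 2, 2, 3], x = 1:6)
groupedDF = groupby(df, :pid)
n_pid = length(groupedDF)

# Start from an empty data frame with the same columns and element types as df.
BootstrapDf = empty(df)

for i in sample(1:n_pid, n_pid; replace = true)
    # append! grows BootstrapDf in place, so rows already stored are not copied again.
    append!(BootstrapDf, groupedDF[i])
end
```

Starting from empty(df) keeps the column names and types fixed, so every append! call only has to push rows onto the existing columns.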

Generally it’s also faster to use reduce(vcat, list_of_dfs), as it allows allocating the final data frame upfront. That requires storing a temporary list_of_dfs, but here its elements are SubDataFrame views, so they are cheap.

Finally, it’s faster and simpler to draw a sample of group indices from groupedDF than to draw a sample of pid values. In the end something like this should be enough:

reduce(vcat, [groupedDF[i] for i in sample(1:length(groupedDF), n_pid; replace = true, ordered = false)])
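For completeness, here is a self-contained sketch of that one-liner on toy data. The data frame, its y column, and the choice n_pid = number of groups are assumptions made for illustration; only groupedDF, n_pid, and pid come from the question.

```julia
using DataFrames, StatsBase

# Toy data: 4 clusters (pid = 1..4) with 3 rows each; the values are invented.
df = DataFrame(pid = repeat(1:4, inner = 3), y = rand(12))
groupedDF = groupby(df, :pid)
n_pid = length(groupedDF)  # assumption: resample as many clusters as there are

# Draw group indices with replacement, then stack the selected groups.
idx = sample(1:length(groupedDF), n_pid; replace = true, ordered = false)
BootstrapDf = reduce(vcat, [groupedDF[i] for i in idx])

# Every drawn cluster contributes all of its rows, so with 4 draws of
# 3-row clusters the result always has 12 rows.
```

Because whole groups are stacked, a cluster that is drawn twice appears twice in BootstrapDf, which is what a cluster bootstrap needs.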