Hello Julia users!
I have been trying to implement a bootstrap for panel data stored in a DataFrame. I think I got it working, but I am doing it in a very inefficient way.
For example, once I draw a vector of row indices, say keys = [1, 2, 1, 3, 4], I can do df[keys, :] to make a DataFrame in which the first row is automatically duplicated. Is there any way to do the same with a grouped DataFrame? Below is my attempt, which does not do so:
    using DataFrames, CSV, Pipe, Parameters
    using LinearAlgebra, Statistics, Distributions, Random
    using StatsBase: sample

    # inner = 4 gives 4 rows per pid, i.e. 16 rows, matching the lengths of a and b
    dfex = DataFrame(pid = repeat([1:4;], inner = 4), a = randn(16), b = rand(16))
    groupedDF = groupby(dfex, :pid)
    length(dfex.pid)
    unique_pid = unique(dfex.pid)
    n_pid = length(unique_pid)

    # With each pid, pick all the rows with that pid.
    Bstrap = unique_pid[sample(axes(unique_pid, 1), n_pid; replace = true, ordered = false), 1]
    length(unique(Bstrap))

    function GenBootstrapDF(groupedDF, Bstrap)
        BootstrapDf = DataFrame()
        for i in Bstrap
            Bootstrap_sample = groupedDF[(pid = i,)]
            BootstrapDf = vcat(BootstrapDf, Bootstrap_sample)
        end
        return BootstrapDf
    end

    GenBootstrapDF(groupedDF, Bstrap)
This works the way I wanted, i.e. it samples the pids with replacement and then collects the SubDataFrame of each sampled pid from dfex. However, growing the result with vcat inside the loop copies the accumulated rows on every iteration, so I would like to improve the performance. Can you give me any ideas, please?
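One possible direction, as a minimal sketch rather than a definitive answer: look up the row indices of every pid once, concatenate the index vectors of the sampled pids, and index the original DataFrame a single time instead of calling vcat on DataFrames in a loop. The helper names rows_by_pid and bootstrap_df and the key keyword are hypothetical, introduced only for illustration, and the sketch assumes the panel identifier lives in a single column.

```julia
using DataFrames, Random
using StatsBase: sample

# Toy panel: 4 individuals (pid) with 4 observations each.
dfex = DataFrame(pid = repeat(1:4, inner = 4), a = randn(16), b = rand(16))

# Hypothetical helper: map each pid to the row numbers where it occurs.
# This is computed once and reused for every bootstrap replication.
function rows_by_pid(df::AbstractDataFrame; key::Symbol = :pid)
    col = df[!, key]
    idx = Dict{eltype(col), Vector{Int}}()
    for (row, p) in enumerate(col)
        rows = get!(() -> Int[], idx, p)  # create the vector on first sight of p
        push!(rows, row)
    end
    return idx
end

# Hypothetical helper: build one bootstrap sample with a single row-indexing call.
# A pid drawn twice contributes its rows twice.
function bootstrap_df(df::AbstractDataFrame, idx::AbstractDict, drawn_pids)
    rows = reduce(vcat, [idx[p] for p in drawn_pids])
    return df[rows, :]
end

idx = rows_by_pid(dfex)
unique_pid = unique(dfex.pid)

# Draw pids with replacement, then materialize the sample.
Bstrap = sample(unique_pid, length(unique_pid); replace = true)
boot = bootstrap_df(dfex, idx, Bstrap)
```

The design choice here is to move all the work onto integer row indices, which are cheap to concatenate, and to allocate the resulting DataFrame only once per replication. Whether this is faster in practice depends on the panel size, so it would be worth benchmarking against the loop version.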
Thank you for your input!