Balanced Repeated Replication in Julia

Hi All,

I am working with a public data source and I’m having trouble understanding how to compute replicate weights that are needed for calculating the standard errors for my estimates. The data source is here, if you’re interested in reading more about it. I’ve worked before with the American Community Survey from the Census Bureau and they actually compute 80 replicate weights and include them in the data files so I’ve never had to do this step myself.

I’m hoping someone will be kind enough to point me in the right direction as far as how to do this in Julia.

For example, if I want to compute from the survey an estimate of the total number of males in the population, it’s pretty straightforward - I would just filter the data by those identified as males and then take the sum of the weights. But how can I compute replicate weights that will allow me to compute a standard error for the estimate?
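As a minimal sketch of the point estimate described above, assuming a handful of made-up records (the `records` list and the `weighted_total` helper are hypothetical, not part of the survey files), the weighted total is just a filtered sum. It is written in Python for illustration; the same few lines should carry over to Julia almost directly.

```python
# Hypothetical toy records: (sex, full-sample survey weight).
# Real values would come from the survey data file.
records = [
    ("M", 120.0),
    ("F", 95.5),
    ("M", 80.25),
    ("F", 110.0),
]


def weighted_total(rows, sex):
    """Estimated population count: sum of weights over rows matching `sex`."""
    return sum(weight for s, weight in rows if s == sex)


male_total = weighted_total(records, "M")
print(male_total)
```

The standard error is the hard part, because it requires repeating this same sum once per set of replicate weights.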

The documentation from the data source reads as follows (I’m hoping to figure out how to do this in Julia):

Using the Replicate Weights

Using BRR weights, design-corrected standard errors can be calculated using standard statistical software packages, including SAS, Stata, and R. The statistical procedure involves generating replicates, then calculating the point estimates for each replicate and the variance of the replicate estimates. This variance is the estimated sampling variance of the statistic of interest.
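The procedure described above can be sketched as follows. With Fay's method and R replicates, the usual variance estimator is the sum of squared deviations of the replicate estimates from the full-sample estimate, divided by R(1 - fay)^2; with fay = 0.5 and 80 replicates that divisor is 20. The function names and the toy numbers below are hypothetical, and the sketch is in Python for illustration rather than taken from the survey documentation.

```python
import math


def weighted_total(indicator, weights):
    """Weighted total of a 0/1 indicator (or any numeric variable)."""
    return sum(x * w for x, w in zip(indicator, weights))


def fay_brr_variance(full_estimate, replicate_estimates, fay=0.5):
    """Fay's BRR variance: squared deviations of the replicate estimates
    from the full-sample estimate, divided by R * (1 - fay)**2."""
    n_reps = len(replicate_estimates)
    squared_devs = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return squared_devs / (n_reps * (1.0 - fay) ** 2)


# Hypothetical toy data: 4 respondents, 1 = male, and 4 sets of replicate weights.
is_male = [1, 0, 1, 0]
full_weights = [100.0, 100.0, 100.0, 100.0]
replicate_weights = [
    [150.0, 50.0, 150.0, 50.0],
    [50.0, 150.0, 150.0, 50.0],
    [150.0, 50.0, 50.0, 150.0],
    [50.0, 150.0, 50.0, 150.0],
]

# Same statistic, computed once with the full weight and once per replicate.
full_estimate = weighted_total(is_male, full_weights)
replicate_estimates = [weighted_total(is_male, rw) for rw in replicate_weights]

variance = fay_brr_variance(full_estimate, replicate_estimates)
standard_error = math.sqrt(variance)
print(full_estimate, standard_error)
```

When the data file already ships with replicate weight columns, this is all that is needed: no new weights have to be generated, only one estimate per existing column.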

The documentation goes on to show some SAS/Stata/R code that apparently computes the replicate weights:

SAS Users

libname naws './naws/data';

data naws;
  set naws.naws_all;
run; /* this database contains the full set of analysis and weight variables */

proc surveymeans data=naws varmethod=BRR(fay=.5);
  repweights fay01 fay02 fay03 fay04 fay05 fay06 fay07 fay08 fay09 fay10 fay11 fay12 fay13 fay14 fay15 fay16 fay17 fay18 fay19 fay20 fay21 fay22 fay23 fay24 fay25 fay26 fay27 fay28 fay29 fay30 fay31 fay32 fay33 fay34 fay35 fay36 fay37 fay38 fay39 fay40 fay41 fay42 fay43 fay44 fay45 fay46 fay47 fay48 fay49 fay50 fay51 fay52 fay53 fay54 fay55 fay56 fay57 fay58 fay59 fay60 fay61 fay62 fay63 fay64 fay65 fay66 fay67 fay68 fay69 fay70 fay71 fay72 fay73 fay74 fay75 fay76 fay77 fay78 fay79 fay80;
  weight pwtycrd;
  var waget1;
  where faywtyrs=20152016;
run;


R Users

(The foreign library in R can read Stata files of older format but not SAS files, so we recommend that you first convert the files to Stata format; see the Stata instructions above for a method using SAS tools. If your Stata installation is version 13 or later, use the saveold command instead of save in order to facilitate compatibility. We assume that the current working directory contains Stata files named naws_all.dta.)

library(foreign)
library(survey)

dim(naws_all.dta) # confirm observation and variable counts
type="Fay",rho=.5,data= naws_all.dta)
summary(naws.svr) # check that 80 replicates are specified
svymean(~waget1,subset(naws.svr,faywtyrs==20152016), deff="replace",na.rm=TRUE)
svymean(~factor(migtype2),subset(naws.svr,faywtyrs==20152016), deff="replace",na.rm=TRUE)
svyby(~age,~gender,subset(naws.svr,faywtyrs==20152016),svymean, deff="replace",na.rm=TRUE)

It is not clear if this is a Julia problem (implementing or finding a known algorithm), because what you want to do is not clear from your description. In any case, StatsBase has some support for various kinds of weights.

My R is a bit rusty, but from the code it looks like they already have weights, and just use them.

I left out a critical piece of the documentation that shows what it is I need to implement in Julia. It’s the re-weighted rth half-sample replicate that I don’t know how to compute…
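The formula from the documentation did not come through in the post above, so what follows is only a sketch of Fay's method as it is commonly described, not necessarily the survey's exact definition. It assumes every stratum contains exactly two variance units: replicate r multiplies the weights in one unit by 2 - fay and in the other by fay, with the choice of unit driven by a row of a Hadamard matrix. All names (`sylvester_hadamard`, `fay_replicate_weights`, the `strata` and `half` arguments) are hypothetical, and the Sylvester construction only yields orders that are powers of 2, so it would not by itself reproduce an 80-replicate design. Python is used for illustration.

```python
def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix of +1/-1 entries (n a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        # Sylvester doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def fay_replicate_weights(weights, strata, half, n_reps, fay=0.5):
    """Build n_reps sets of Fay replicate weights.

    weights[i]: full-sample weight of row i
    strata[i]:  0-based stratum index of row i
    half[i]:    0 or 1, which of the stratum's two variance units row i is in
    """
    if max(strata) + 1 >= n_reps:
        raise ValueError("need more replicates than strata")
    h = sylvester_hadamard(n_reps)
    replicates = []
    for r in range(n_reps):
        rep = []
        for w, s, p in zip(weights, strata, half):
            # Column s + 1 skips the all-ones first column of the matrix.
            sign = h[r][s + 1]
            # +1 selects unit 0 of the stratum, -1 selects unit 1.
            selected = (sign == 1) == (p == 0)
            rep.append(w * ((2.0 - fay) if selected else fay))
        replicates.append(rep)
    return replicates


# Hypothetical toy design: 2 strata with 2 variance units each, weight 100 per row.
reps = fay_replicate_weights(
    weights=[100.0, 100.0, 100.0, 100.0],
    strata=[0, 0, 1, 1],
    half=[0, 1, 0, 1],
    n_reps=4,
)
for rep in reps:
    print(rep)
```

Note that the SAS code quoted above lists fay01 through fay80 as existing variables, so for this particular data source the replicate weights may not need to be generated at all.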


It is still unclear if you know the algorithm/formulas for this and just need help with the Julia implementation; or you need help with the concept itself. In the latter case, you could look at textbooks and/or open source software (in other languages) that does this.

Unfortunately there’s nothing to handle complex surveys in Julia at this point AFAIK. The weights we support in StatsBase correspond to simpler situations. Your best bet is probably to use the survey R package via RCall.jl.


I think that’s what I’m going to have to do. Thanks for the input!