One thing that is necessary when working with survey data is being able to create new variables by standardizing or capping existing variables. One workflow in Stata might look like this:
sysuse auto
local variables_to_cap price mpg headroom trunk weight
foreach variable in `variables_to_cap' {
    gen `variable'_std = `variable' // generate a new variable but with a suffix added
    sum `variable'_std, detail
    replace `variable'_std = r(p99) if `variable'_std > r(p99) & !missing(`variable'_std)
}
Above, the suffix _std indicates to the user that we are working with the standardized variable. The benefit of the above code is that you can add the suffix extremely easily and use the exact same code to generate many standardized variables at once. (Ideally, this would be done using a function.)
Using DataFrames, we can't do this, because an existing variable is referred to with a symbol, while a new variable is created with a bare name on the left-hand side, so there is no obvious place to attach a suffix:
@transform(df, newCol = :oldCol) # Command(?) = Symbol works
@transform(df, symbol("newCol") = :oldCol) # error
@transform(df, :newCol = :oldCol) # error
Is there a way to do this using DataFrames? Are there any proposals in the works to add this kind of functionality?
There may have been a thread about this recently but I couldn’t find it.
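For reference, here is a rough sketch of the behavior I am after, written with plain column indexing instead of `@transform`. It assumes that `df[!, name] = values` can create a column from a symbol built at run time and that `quantile` from the Statistics standard library is an acceptable stand-in for Stata's `r(p99)` (the two may compute percentiles slightly differently). The toy data and column names are made up for illustration.

```julia
using DataFrames, Statistics

# Toy data standing in for the survey variables (hypothetical values).
df = DataFrame(price = [1.0, 2.0, 3.0, 100.0],
               mpg   = [10.0, 20.0, 30.0, 400.0])

variables_to_cap = [:price, :mpg]

for variable in variables_to_cap
    # Build the suffixed column name at run time, e.g. :price_std
    capped = Symbol(variable, "_std")

    # 99th percentile of the non-missing values
    p99 = quantile(skipmissing(df[!, variable]), 0.99)

    # Cap values above the 99th percentile; missing values stay missing
    df[!, capped] = min.(df[!, variable], p99)
end
```

This gets the suffix behavior from the Stata loop, but it sidesteps the macro entirely, so the question of how to do it inside `@transform` still stands.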