Suggestions for the design of Survey.jl?

Hello,

We are building Survey.jl, a package for statistical estimates from surveys. The inspiration for the package comes from the corresponding R survey package created by Thomas Lumley.

We started out imitating the R API to make the transition from R to Julia easier. Along the way we decided to switch to a more Julia-specific design, while still addressing the transition problem through wrappers that provide an R-like API or through notes in the documentation. The Julia way, emphasising multiple dispatch, yields clean code and opens the way for bazaar-style development of sub-components by multiple contributors.

In the R world there is an entire ecosystem that has arisen on top of the survey package, such as srvyr or svrepmisc. We hope to accomplish the same kind of growth with our package. This is why we bring forward this discussion regarding the design that should be adopted for Survey.jl. The key questions are:

  • How can we make the transition from R to Julia easier?
  • How can the current design be improved to provide a good user experience?
  • Can we improve on the R package in terms of functionality?

Any involvement and suggestion would be of great help, particularly if you have experience with statistics and/or survey analysis.


Nice to hear that you are working on designing a Julian API for that nice package!

My general recommendation would be to try to move as much as possible from using svy-prefixed functions to using generic Julia functions dispatching on Survey.jl objects. For example, svyglm could be replaced with a special glm method when data is a survey design object.
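To make the dispatch idea concrete, here is a minimal sketch. `ToyDesign` is a hypothetical stand-in invented for this example, not Survey.jl's actual design type, and the two-argument `mean(:column, design)` signature is just one possible choice. The only real API used is the generic `mean` from the Statistics standard library.

```julia
using Statistics  # standard library; provides the generic `mean`

# Hypothetical stand-in for a survey design object (NOT Survey.jl's real type):
# a named tuple of columns plus one sampling weight per row.
struct ToyDesign{T<:NamedTuple}
    data::T
    weights::Vector{Float64}
end

# Add a method to the existing generic `mean` instead of defining `svymean`.
# It returns the weighted mean of column `var` under the design weights.
function Statistics.mean(var::Symbol, design::ToyDesign)
    x = getproperty(design.data, var)
    return sum(design.weights .* x) / sum(design.weights)
end

design = ToyDesign((income = [10.0, 20.0, 40.0],), [1.0, 1.0, 2.0])

unweighted = mean(design.data.income)  # ordinary method for vectors
weighted = mean(:income, design)       # design-aware method, picked by dispatch (27.5)
```

The same pattern would presumably carry over to `glm` or `fit`: the user keeps calling the function they already know, and the survey-specific behaviour is selected by the type of the design argument.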

I had also mentioned some ideas about replacing svyby with combine(groupby(...), ...) at https://github.com/xKDR/Survey.jl/issues/4.

One area where you can probably improve on the R package quite easily, at least for designs with replicate weights, is model fitting. Thanks to the StatisticalModel/RegressionModel interface from StatsAPI, you could support fitting any custom model type defined in a package: call fit on it with each set of weights, then compute standard errors for the coefficients automatically from the coefficients obtained with each set. IIUC this would offer the features of svrepmisc, but also extend automatically to any new model family, without Survey.jl having to support it explicitly. This is an area where R is often lacking (everything is hardcoded, making it hard to extend to new use cases).
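A rough sketch of that replicate-weights idea, using only Base Julia. `replicate_se` is a hypothetical helper, not an existing Survey.jl or StatsAPI function. The `fitcoef` argument stands for any function that takes a weight vector and returns a coefficient vector; in practice it would presumably wrap calls to `fit` and `coef`, but how weights are passed to `fit` differs between model packages, so that part is left out. The `scale` factor depends on the replication scheme; the example assumes a delete-one jackknife.

```julia
# Hypothetical helper (not part of Survey.jl or StatsAPI).
# `fitcoef(weights)` fits some model with the given weights and returns its
# coefficient vector; `scale` depends on the replication method.
function replicate_se(fitcoef, fullweights, repweights; scale)
    theta = fitcoef(fullweights)                    # full-sample estimate
    thetas = [fitcoef(w) for w in repweights]       # one estimate per replicate
    variance = scale .* sum(abs2.(t .- theta) for t in thetas)
    return theta, sqrt.(variance)
end

# Toy "model" with a single coefficient: the weighted mean of x.
x = [1.0, 2.0, 3.0, 4.0]
n = length(x)
wmean(w) = [sum(w .* x) / sum(w)]

full = ones(n)
# Delete-one jackknife replicates: unit i gets weight 0, the others n/(n-1).
reps = [[j == i ? 0.0 : n / (n - 1) for j in 1:n] for i in 1:n]

est, se = replicate_se(wmean, full, reps; scale = (n - 1) / n)
```

For this equal-weight toy case the result matches the textbook standard error of the mean, `sqrt(var(x) / n)`. Because `replicate_se` only needs a function from weights to coefficients, nothing in it is specific to a particular model family.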


Hi! I’m not currently a Survey.jl user (and I don’t use R either), but when I opened the documentation, I found it hard to understand exactly what the package does without consulting the R documentation. I think it would be useful to add documentation about the package’s purpose and functionality for Julia-only users.


For those coming from SAS, it would be useful to have some links to the main procedures (packages in the Julia/R parlance). I don’t think fully worked-out code examples are necessary, but high-level directions would help, like “if you were doing regression with surveyreg, then look into XYZ”.
The main SAS procedures are here.
