Julia as a universal platform for statistical software development

I just posted a paper presenting a vision of using Julia as a universal back end for statistical software development. Do things once in Julia–optimize, maintain, add features–then call from R or Python or Matlab or Stata.

The motivating example is my pair of packages for Stata, a commercial statistics program that is popular in the social sciences and elsewhere. The first, julia, provides a C-based bridge from Stata to Julia, like JuliaCall for R. It has routines for fast data copying and provides an ersatz Julia REPL inside of Stata. Running on top of it is reghdfejl, which mimics a popular Stata program, reghdfe, for fitting linear models with many fixed-effect dummies. The new version presents nearly the same user and programmatic interface but runs ~10X faster on hard problems by calling FixedEffectModels.jl.

I also present the example of WildBootTests.jl, which is used as a back end for both a Stata and an R package.

I think it will be tough for Julia to catch up with R and the like as a home for end users doing statistical analysis. But I think it's a great environment for the core work of implementing numerical methods. Why code things separately for R, Python, etc., when we can just do it once, in Julia?

25 Likes

That’s brilliant, thanks for sharing.

I’ve come to the same conclusion independently. As the author (pretty much the sole one right now, unfortunately) of SynthControl.jl, I’ve gone through various recent synthetic control method implementations in R, Stata, Matlab, and C++ (or Rcpp) and, biased as I am, am convinced that Julia is the natural home for developing these methods. They are often pushed forward by domain experts who prefer coding in interactive languages, but their computational complexity means that writing a loop in R just isn’t an option.

Unfortunately I think statistics is one of the biggest laggards in Julia adoption since I started using Julia (~2013), with most applied econometrics people seemingly only considering R when looking outside of Stata.

As usual there’s not much use whining about it though, we can just continue to build and do good work, so thanks for doing what you’re doing!

12 Likes

Looks interesting. (I’ll have a careful look tomorrow.)

While I haven’t published any econometrics packages, my financial econometrics tutorial (using a module of functions) isn’t too far from one. Maybe of some interest.

3 Likes

I tried this approach: writing R and Python wrappers for one of my Julia packages (following the example of DifferentialEquations.jl). But I feel this is currently not viable. For example, JuliaCall (which calls Julia from R) appears to be abandoned, and even basic installation issues will likely never get resolved. As a result, half the time my collaborators cannot successfully install these wrappers without my help. Also, I feel our community has not documented well how to write your Julia code to make wrapping in R/Python/whatever easy, and I actually had to restructure a lot of my originally working Julia code to make the wrappers work.

I also thought it would be cool to suggest R/Julia rather than R/C++, but in practice, I feel this is more of a dream than reality.

5 Likes

This seems to be more about an annoying Apple macOS “feature” than anything else.

https://discussions.apple.com/thread/253714860?sortBy=best

2 Likes

It looks like JuliaCall is downloading the .dmg version of Julia on macOS. They could try downloading the .tar.gz instead, which might resolve that one specific issue.

Edit: Install Julia from tarball on macOS by DilumAluthge · Pull Request #227 · Non-Contradiction/JuliaCall · GitHub

2 Likes

@Non-Contradiction does seem to be quite busy as of late. The last commit there was nine months ago. Maybe @kdpsingh's group might have an interest in continuing maintenance?

Returning to the original topic, we did have to work through some issues with embedding Julia so there are still some rough points. That said, it was not incredibly difficult either.

The package I helped develop, Bigsimr.jl, is another example of a “backend” for R and Python, and it was developed with R in mind such that passing in real-valued scalars or vectors just works like in R. The thing to keep in mind is that in R everything is a vector and all* numbers are doubles (even integers unless explicitly cast as an integer).
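To illustrate the point about R's conventions, here is a minimal sketch (in Python, one of the front ends mentioned in this thread) of the kind of input normalization a wrapper might do so that scalars and vectors are treated alike and every number becomes a double. The helper name `as_double_vector` is hypothetical; this is not Bigsimr.jl's actual code.

```python
from numbers import Real
from typing import Iterable, List, Union


def as_double_vector(x: Union[Real, Iterable[Real]]) -> List[float]:
    """Coerce a scalar or an iterable of numbers to a list of floats.

    Hypothetical helper: mirrors R's convention that a scalar is just a
    length-1 vector and that numbers are doubles unless cast otherwise.
    """
    if isinstance(x, Real):
        # A bare scalar becomes a length-1 vector.
        return [float(x)]
    # Anything iterable is converted element by element to double.
    return [float(v) for v in x]
```

A back end that accepts only this normalized form does not need separate code paths for `3`, `3.0`, `[3]`, or `[3.0]`.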

Proof-reading myself, I was basically saying exactly the same thing as you, so let me scratch what I wrote and simply +1 your post.

It is much, much easier to communicate with an inferior R process from a Julia process than the other way around. The RCall.jl package is 100% Julia code which calls the C API for R and models the R data structures in Julia. Given that all the internal data structures in R are based on a C struct called SEXPREC (which is a union struct) and most of the C API for R passes and receives pointers to such structs (called SEXP), it is reasonably straightforward to model these structs as Julia types and to interface with R’s C API.

To try to work in the other direction, starting an inferior Julia process from an R process and communicating with it from R, is much more difficult. You need to write the glue code in C/C++ and even with that I think the JuliaCall package for R still requires RCall.jl to be installed on the Julia side for some of the interfacing of internal types.

The @R_str macro and the “R mode” toggle for the Julia REPL provide the ability to do something like writing R code in the Julia REPL.

I know it is more difficult to convince users to start the Julia REPL and connect to R than to start the R REPL, but I still think it is worthwhile encouraging them to do so. Having said that, I will note that I haven’t been as successful as I had hoped in convincing users of the lme4 package for R to migrate to the MixedModels.jl package for Julia.

12 Likes

Does JuliaConnectoR work any better? I believe that’s what fwildclusterboot in R uses to call WildBootTests.jl.

3 Likes

It’s a great vision, one that I share (for selfish reasons: it helps add or improve Julia packages). I would say there’s nothing about the vision that limits it to stats, so you could amend the title, or keep it as you wish. Thanks for adding Stata to the list. Even if it’s only viable for Stata or one other language, e.g. Python, it’s worthwhile. I know you can easily call to Python (I did, back then with PyCall; also to MATLAB and Octave) and from Python, and I believe it’s very solid with PythonCall; hopefully it’s as good for Stata (does it download Julia for you, and take care of Julia package dependencies?). As explained above, you can call Julia from R and R from Julia (with temporary maintenance issues that should be easily fixed).

1 Like

I completely agree it’s not limited to stats. I forgot to say that in my post here. (I threw “statistical” in there as a signal of relevance to the professional community that is closest to being my home.)

No, the Stata package doesn’t download Julia. That’s a great idea if I can reliably automate it. The barrier I see is the Windows Store installation for juliaup. It can be done with winget install julia -s msstore as a shell command. But when I run that I get prompted to “agree to all the source agreements terms,” which is enough to impede automatic installation. Right now the documentation just tells people to go to the Windows Store and install Julia, which maybe is good enough.
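One possible way around the prompt, sketched here in Python purely for illustration: winget has `--accept-source-agreements` and `--accept-package-agreements` options for non-interactive use. Whether they fully suppress the prompt for the msstore source on every machine is an assumption I have not verified, and the function names below are hypothetical.

```python
import shutil
import subprocess
from typing import List


def winget_julia_command(accept_agreements: bool = True) -> List[str]:
    """Build the winget command line for installing Julia from the Store."""
    cmd = ["winget", "install", "julia", "-s", "msstore"]
    if accept_agreements:
        # Assumption: these flags are enough to skip the agreement prompt.
        cmd += ["--accept-source-agreements", "--accept-package-agreements"]
    return cmd


def install_julia() -> bool:
    """Try the install; return False if winget is missing or exits nonzero."""
    if shutil.which("winget") is None:
        return False
    return subprocess.run(winget_julia_command()).returncode == 0
```

If `install_julia()` returns `False`, the caller could fall back to the current behavior of pointing the user at the Windows Store.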

2 Likes

Just wanted to say that I actually love JuliaConnectoR and provision my package, OMOPCDMCohortCreator.jl as well through the wrapper.

In fact, here is my documentation for how JuliaConnectoR works with my package: Using OMOPCDMCohortCreator with R 🏴‍☠️ · OMOPCDMCohortCreator.jl

3 Likes

Maybe you could take a look at how JuliaConnectoR and/or JuliaCall (to call Julia from R) are doing the installation part?

2 Likes